The UNIX and Linux Forums  

#1 - tukuyomi - 10-29-2009
Encoding troubles

Hello All

I have a set of files, each containing lines that should match this regex:

Code:
regex='disabled\,.*\,\".*\"'

and here is what the file command reports for each of them:

Code:
file <random file>
<random file> ASCII text, with CRLF line terminators

As an example, here is the content of one file (named "Daffy Duck - The Marvin Missions (USA).cht"):

Code:
disabled,C283-3D6F,"Invincibility" 
disabled,DFBD-1DA4,"Start with 1 life" 
disabled,DBBD-1DA4,"Start with 9 lives (don’t set lives in options menu)" 
disabled,49BD-1DA4,"Start with 25 lives (don’t set lives in options menu)" 
disabled,9FBD-1DA4,"Start with 51 lives (don’t set lives in options menu)" 
disabled,DDB3-3404,"Infinite lives" 
disabled,DDA8-4466,"Extra lives cost $500" 
disabled,DFA8-4466,"Extra lives cost $1,500"

It's not visible on this forum, but there is a character encoding problem with the ' in "don't" on lines 3-5.
In order to check the syntax of each file, I wrote a small bash script (see below) that checks each line against the regex above. But because of this encoding problem, my script reports those lines even though they should match the regex.
My script:

Code:
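#!/bin/bash

# Report every line of each .cht file that does not match the expected format.
regex='disabled\,.*\,\".*\"'
for f in *cht; do
    # IFS= and -r keep leading whitespace and backslashes intact.
    while IFS= read -r line; do
        if [[ ! "${line}" =~ ${regex} ]]; then
            echo "$f - $line"
        fi
    done < "$f"
done

exit 0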

stdout:

Code:
Daffy Duck - The Marvin Missions (USA).cht - disabled,DBBD-1DA4,"Start with 9 lives (don�t set lives in options menu)"
Daffy Duck - The Marvin Missions (USA).cht - disabled,49BD-1DA4,"Start with 25 lives (don�t set lives in options menu)"
Daffy Duck - The Marvin Missions (USA).cht - disabled,9FBD-1DA4,"Start with 51 lives (don�t set lives in options menu)"

Any advice on how to get rid of those � (replacing is not an option)? Thank you for reading.
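
One possible explanation, offered as an assumption rather than a diagnosis: in a UTF-8 locale, a stray byte such as octal 222 is not a valid character, so .* may refuse to match it. A minimal sketch that compares bytes instead of characters by switching to the C locale for the test; the function name check_line and the sample line are hypothetical:

```bash
#!/bin/bash

# Same pattern as in the script above.
regex='disabled\,.*\,\".*\"'

# check_line is a hypothetical helper: it succeeds if the line matches the
# regex when compared byte by byte (C locale), whatever the text encoding.
check_line() {
    local LC_ALL=C
    [[ "$1" =~ ${regex} ]]
}

# Sample line with a raw octal-222 byte in place of the apostrophe
# (an assumption about what the problematic files contain).
sample=$'disabled,DBBD-1DA4,"Start with 9 lives (don\222t set lives)"'

if check_line "$sample"; then
    echo "match"
else
    echo "no match"
fi
```

Because the locale is changed only inside the function, the rest of the script keeps its normal settings and the files themselves are left untouched.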
#2 - rdcwayx - 10-30-2009
The ' in your cht file is not the plain ASCII apostrophe; it is a multi-byte character.

You may need to replace it with a real ' using sed first. You can see the difference below:


Code:
don’t
don't

#3 - tukuyomi - 10-30-2009
Ah, thank you for pointing that out, I hadn't noticed it at all.
But the problem remains: I don't know how to specify this character to sed:

(bigger font to see the [0092]) or this one (�).
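
A sketch for locating such bytes without having to type them, assuming a grep that supports -a (GNU and BSD grep do) and files that are otherwise plain ASCII with CRLF line endings. The helper name find_non_ascii and the demo file content are hypothetical:

```bash
#!/bin/bash

# find_non_ascii is a hypothetical helper: print, with line numbers, every
# line of the given file that contains a byte other than tab, carriage
# return, or printable ASCII (space through tilde).
find_non_ascii() {
    # LC_ALL=C makes grep work on bytes; -a keeps it from treating the
    # file as binary.
    LC_ALL=C grep -a -n $'[^\t\r -~]' "$1"
}

# Demo on a temporary file; the raw octal-222 byte stands in for the odd
# apostrophe (an assumption about the real files).
tmp=$(mktemp)
printf 'disabled,DFBD-1DA4,"Start with 1 life"\r\n' > "$tmp"
printf 'disabled,DBBD-1DA4,"Start with 9 lives (don\222t set lives)"\r\n' >> "$tmp"
find_non_ascii "$tmp"
rm -f "$tmp"
```

Piping a reported line through od -c then shows exactly which bytes are involved.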
#4 - tukuyomi - 4 Weeks Ago
OK, I found a way to tell sed about that [0092] character.
As an example, let's take this line:

Code:
disabled,DBBD-1DA4,"Start with 9 lives (don[0092]t set lives in options menu)"

(as seen on the screenshot above.)
Let's use the od command to see which bytes make up this character:

Code:
echo 'disabled,DBBD-1DA4,"Start with 9 lives (don[0092]t set lives in options menu)"' | od -c
0000000   d   i   s   a   b   l   e   d   ,   D   B   B   D   -   1   D
0000020   A   4   ,   "   S   t   a   r   t       w   i   t   h       9
0000040       l   i   v   e   s       (   d   o   n 302 222   t       s
0000060   e   t       l   i   v   e   s       i   n       o   p   t   i
0000100   o   n   s       m   e   n   u   )   "  \n
0000113

We can clearly see the octal bytes 302 and 222, which together make up our ’ (hex C2 92, the UTF-8 encoding of U+0092).
Using this, we can then write:

Code:
$ echo 'disabled,DBBD-1DA4,"Start with 9 lives (don[0092]t set lives in options menu)"' | sed 's/'$(echo $'\302'$'\222')'/'$(echo $'\'')'/'

(works at least in bash)
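
The same substitution can be sketched a little more compactly by letting bash's $'...' quoting build the whole sed expression. This is an alternative spelling, not a different technique. The function name fix_apostrophe is hypothetical, and the trailing g (replace every occurrence on a line) and the LC_ALL=C setting are additions not present in the command above:

```bash
#!/bin/bash

# fix_apostrophe is a hypothetical helper: read text on stdin and replace
# every octal 302 222 byte pair with a plain ASCII apostrophe.
fix_apostrophe() {
    # $'...' expands \302\222 to the two raw bytes and \' to a single quote.
    # LC_ALL=C makes sed treat its input as bytes rather than characters.
    LC_ALL=C sed $'s/\302\222/\'/g'
}

printf 'disabled,DBBD-1DA4,"Start with 9 lives (don\302\222t set lives in options menu)"\n' | fix_apostrophe
```

To convert a whole file, the helper can be fed through redirection, for example fix_apostrophe < input.cht > output.cht (both file names are placeholders).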