Encoding troubles
Hello all,
I have a set of files in which every line is supposed to match this regex: Code:
regex='disabled\,.*\,\".*\"'
Here is what file reports for each of them: Code:
file <random file>
<random file> ASCII text, with CRLF line terminators
As an example, here is the content of one file, named "Daffy Duck - The Marvin Missions (USA).cht": Code:
disabled,C283-3D6F,"Invincibility"
disabled,DFBD-1DA4,"Start with 1 life"
disabled,DBBD-1DA4,"Start with 9 lives (don’t set lives in options menu)"
disabled,49BD-1DA4,"Start with 25 lives (don’t set lives in options menu)"
disabled,9FBD-1DA4,"Start with 51 lives (don’t set lives in options menu)"
disabled,DDB3-3404,"Infinite lives"
disabled,DDA8-4466,"Extra lives cost $500"
disabled,DFA8-4466,"Extra lives cost $1,500"
It's not visible on this forum, but there is a character encoding problem with the `'` on lines 3-5.
To check the syntax of each file, I wrote a small bash script (see below) that checks each line against the regex above. Because of this encoding problem, the script echoes those lines even though they should match the regex. My script: Code:
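#!/bin/bash
# Report every line of every .cht file that does not match the expected syntax.
regex='disabled\,.*\,\".*\"'
for f in *cht; do
    # IFS= and -r keep leading whitespace and backslashes in each line intact
    while IFS= read -r line; do
        if [[ ! "${line}" =~ ${regex} ]]; then
            echo "$f - $line"
        fi
    done < "$f"
done
exit 0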
stdout: Code:
Daffy Duck - The Marvin Missions (USA).cht - disabled,DBBD-1DA4,"Start with 9 lives (don�t set lives in options menu)"
Daffy Duck - The Marvin Missions (USA).cht - disabled,49BD-1DA4,"Start with 25 lives (don�t set lives in options menu)"
Daffy Duck - The Marvin Missions (USA).cht - disabled,9FBD-1DA4,"Start with 51 lives (don�t set lives in options menu)"
Any advice on how to get rid of those � characters (replacing them is not an option)? Thank you for reading.
OK, I found a way to tell sed about that [0092] char. As an example, let's take this line: Code:
disabled,DBBD-1DA4,"Start with 9 lives (don[0092]t set lives in options menu)"
(as seen on the screenshot above). Let's use the od command to see what this char is made of: Code:
echo 'disabled,DBBD-1DA4,"Start with 9 lives (don[0092]t set lives in options menu)"' | od -c
0000000   d   i   s   a   b   l   e   d   ,   D   B   B   D   -   1   D
0000020   A   4   ,   "   S   t   a   r   t       w   i   t   h       9
0000040       l   i   v   e   s       (   d   o   n 302 222   t       s
0000060   e   t       l   i   v   e   s       i   n       o   p   t   i
0000100   o   n   s       m   e   n   u   )   "  \n
0000113
We can clearly see the octal bytes 302 and 222 (hex C2 92, the UTF-8 encoding of U+0092), which together seem to make up our ’. Using this, we can then write: Code:
echo 'disabled,DBBD-1DA4,"Start with 9 lives (don[0092]t set lives in options menu)"' | sed 's/'$(echo $'\302'$'\222')'/'$(echo $'\'')'/'
(works at least in bash)
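Since the original question says replacing the character is not an option, here is a rough sketch of an alternative that leaves the files untouched and only changes how the check is done. It rests on two assumptions that the thread does not confirm: that the files on disk contain the single raw byte 0x92 (octal 222, the Windows-1252 ’), and that the shell runs in a UTF-8 locale, where such a lone byte is not a valid character and `.*` may refuse to match it. The function name `matches_bytewise` and the sample line are made up for illustration, and the regex is the same pattern written without the backslashes, which are not needed before `,` or `"` in an extended regex.

```shell
#!/bin/bash
# Sketch only: compare lines byte by byte so that stray non-UTF-8 bytes
# cannot make the regex fail. The files themselves are not modified.

regex='disabled,.*,".*"'

# matches_bytewise LINE
# Hypothetical helper: returns 0 if LINE matches $regex in the C locale.
matches_bytewise() {
    # Scoped to this function: in the C locale every byte is one character,
    # so '.' also matches bytes such as 0x92.
    local LC_ALL=C
    [[ $1 =~ $regex ]]
}

# Sample line built with printf so the raw byte is explicit (\222 = 0x92).
# Assumption: this is what the .cht file really contains.
sample=$(printf 'disabled,DBBD-1DA4,"Start with 9 lives (don\222t set lives in options menu)"')

if matches_bytewise "$sample"; then
    echo "match"
else
    echo "no match"
fi
```

In the script from the first post, the same idea would amount to calling `matches_bytewise "$line"` in place of the `[[ ... =~ ... ]]` test, so only lines with a real syntax problem get echoed.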