Encoding troubles


 
# 1  
Old 10-29-2009

Hello All

I have a set of files, each containing lines that should match this regex:
Code:
regex='disabled\,.*\,\".*\"'

and here is what the file command reports for each of them:
Code:
file <random file>
<random file> ASCII text, with CRLF line terminators

As an example, here is the content of one file, named "Daffy Duck - The Marvin Missions (USA).cht":
Code:
disabled,C283-3D6F,"Invincibility" 
disabled,DFBD-1DA4,"Start with 1 life" 
disabled,DBBD-1DA4,"Start with 9 lives (don't set lives in options menu)" 
disabled,49BD-1DA4,"Start with 25 lives (don't set lives in options menu)" 
disabled,9FBD-1DA4,"Start with 51 lives (don't set lives in options menu)" 
disabled,DDB3-3404,"Infinite lives" 
disabled,DDA8-4466,"Extra lives cost $500" 
disabled,DFA8-4466,"Extra lives cost $1,500"

It's not visible on this forum, but there is a character encoding problem with the `'` on lines 3-5.
To check the syntax of each file, I wrote a small bash script (see below) that checks each line against the regex above. Because of this encoding problem, my script echoes those lines even though they should match the regex.
My script:
Code:
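#!/bin/bash

# Print every line of every .cht file that does not match the expected format.
regex='disabled\,.*\,\".*\"'
for f in *.cht; do
    # IFS= and -r keep leading whitespace and backslashes intact
    while IFS= read -r line; do
        if [[ ! "${line}" =~ ${regex} ]]; then
            echo "$f - $line"
        fi
    done < "$f"
done

exit 0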

stdout:
Code:
Daffy Duck - The Marvin Missions (USA).cht - disabled,DBBD-1DA4,"Start with 9 lives (don�t set lives in options menu)"
Daffy Duck - The Marvin Missions (USA).cht - disabled,49BD-1DA4,"Start with 25 lives (don�t set lives in options menu)"
Daffy Duck - The Marvin Missions (USA).cht - disabled,9FBD-1DA4,"Start with 51 lives (don�t set lives in options menu)"

Any advice on how to get rid of those � (replacing is not an option)? Thank you for reading.
# 2  
Old 10-30-2009
The ' in your .cht file is not a plain ASCII apostrophe; it is a double-byte character (possibly from an Asian character set).

You may need to replace it with a real ' using sed first. Compare the two below:

Code:
dont 
don't
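
To find out which lines actually contain the odd character, a byte-oriented search may help. The following is only a sketch: find_non_ascii is a made-up helper name, the demo file and its byte 222 (octal) are invented for illustration, and the -a and -H options are GNU/BSD grep extensions rather than POSIX.

```shell
#!/bin/bash
# find_non_ascii is a hypothetical helper, not something from the thread.
# It prints file:line:content for each line holding a byte that is neither
# printable ASCII nor whitespace (the CR of a CRLF ending counts as whitespace).
find_non_ascii() {
    # -a: treat input as text, -n: show line numbers, -H: show the file name
    LC_ALL=C grep -anH '[^[:print:][:space:]]' "$@"
}

# Demo on a throwaway file; the byte 222 (octal) stands in for the odd char.
tmp=$(mktemp)
printf 'disabled,C283-3D6F,"Invincibility"\r\n' >  "$tmp"
printf 'disabled,DBBD-1DA4,"don\222t set lives"\r\n' >> "$tmp"

# Keep only the line-number field of the grep output.
hits=$(find_non_ascii "$tmp" | LC_ALL=C cut -d: -f2)
echo "lines with non-ASCII bytes: $hits"
rm -f "$tmp"
```

On the real files this would be called as find_non_ascii *.cht; piping a reported line through od -c then shows the exact bytes involved.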

# 3  
Old 10-30-2009
Ah, thank you for pointing that out; I hadn't noticed it at all.
But the problem remains the same: I don't know how to tell sed about this char:
[Image: screenshot in a bigger font, showing the character rendered as [0092]]
or this one (�).
# 4  
Old 11-04-2009
OK, I found a way to tell sed about that [0092] char.
As an example, let's take this line:
Code:
disabled,DBBD-1DA4,"Start with 9 lives (don[0092]t set lives in options menu)"

(as seen in the screenshot above).
Let's use the od command to see which bytes make up this char:
Code:
echo 'disabled,DBBD-1DA4,"Start with 9 lives (don[0092]t set lives in options menu)"' | od -c
0000000   d   i   s   a   b   l   e   d   ,   D   B   B   D   -   1   D
0000020   A   4   ,   "   S   t   a   r   t       w   i   t   h       9
0000040       l   i   v   e   s       (   d   o   n 302 222   t       s
0000060   e   t       l   i   v   e   s       i   n       o   p   t   i
0000100   o   n   s       m   e   n   u   )   "  \n
0000113

The octal values 302 and 222 clearly stand out: they are the bytes 0xC2 0x92, the UTF-8 encoding of U+0092, and together they make up our '.
Using these two bytes, we can then write:
Code:
echo 'disabled,DBBD-1DA4,"Start with 9 lives (don[0092]t set lives in options menu)"' | sed 's/'$(echo $'\302'$'\222')'/'$(echo $'\'')'/'

(works at least in bash)
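
The same substitution can probably be written more compactly with bash's $'...' quoting, which expands octal escapes directly. The sketch below also shows an alternative that leaves the files untouched: testing the regex in the C locale, where "." matches any single byte. It rests on assumptions the thread does not confirm: that the mismatch comes from the UTF-8 locale rejecting the stored bytes, and that the file may hold a lone byte 222 (as the � in the script output hints) rather than the 302 222 pair. The helper names fix_apostrophe and matches_bytewise and the sample lines are made up for illustration.

```shell
#!/bin/bash
# Sketch only: helper names and sample lines are made up for illustration.

regex='disabled,.*,".*"'

# Replace every 302 222 byte pair with a plain ASCII apostrophe.
# $'...' is bash ANSI-C quoting: \302 and \222 are octal bytes, \' is a quote.
fix_apostrophe() {
    LC_ALL=C sed $'s/\302\222/\'/g'
}

# Test a line against the regex in the C locale, where "." matches any byte.
# The ( ) body runs in a subshell, so the locale change does not leak out.
matches_bytewise() (
    LC_ALL=C
    [[ $1 =~ $regex ]]
)

# Option 1: rewrite the character (reads stdin, writes stdout).
fixed=$(printf 'disabled,DBBD-1DA4,"don\302\222t set lives"\n' | fix_apostrophe)
echo "$fixed"

# Option 2: leave the data alone and match byte-wise.
# Assumed raw content: a lone byte 222 (0x92), not valid UTF-8 on its own.
raw=$(printf 'disabled,DBBD-1DA4,"don\222t set lives"')
if matches_bytewise "$raw"; then verdict="match"; else verdict="no match"; fi
echo "$verdict"
```

If od -c on the file itself (rather than on pasted text) shows only 222, the sed pattern would need to be $'s/\222/\'/g' instead. For the original script, the second option amounts to setting LC_ALL=C before the loop.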