Cleaning up incorrect/unknown characters


 
# 1  
Old 07-11-2013

Hi,

Some XML files that need to be parsed by a script may contain unknown characters in random positions. I can't copy and paste one to show an example, because the characters disappear that way, but they do show up in a text editor. Below is an example (in context, this character is supposed to be an apostrophe).

[Image: the unknown character as displayed in a text editor]

Is there a way to identify, and possibly replace, those characters using sed, awk, or something else?

Thanks for your help
# 2  
Old 07-11-2013
Looks more like a problem with the charset or locale (LANG?) to me... On what system was the XML file produced?
# 3  
Old 07-11-2013
The XML files are copied from an Apple iPod touch.
# 4  
Old 07-11-2013
Found something for you to read:
Character encoding for iOS developers. Or UTF-8 what now? - Matt Galloway

http://www.unicode.org/charts/PDF/U0000.pdf

Last edited by vbe; 07-11-2013 at 11:26 AM.. Reason: added unicod url...
# 5  
Old 07-11-2013
Thanks. I've read through these and found them quite informative. Now I'm wondering how I can put this information to use.

I suppose I'd need to run some tests and try out different wildcard combinations.
# 6  
Old 07-11-2013
It's more a question of knowing the input format (seems to be UTF) and the format needed on your Unix/Linux system, and then translating the character codes...
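A quick way to test the "seems to be UTF" guess is to ask iconv to decode the file as UTF-8 and see whether it complains. This is only a sketch: the function name `is_utf8` is made up for illustration, and it assumes an `iconv` that rejects invalid input with a non-zero exit status, as the common implementations do.

```shell
# is_utf8 FILE: succeed (exit status 0) if FILE decodes cleanly as UTF-8.
# Hypothetical helper name; it relies on iconv failing on invalid byte sequences.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

For example, `is_utf8 inputfile && echo "valid UTF-8"`. Pure ASCII files also pass, since ASCII is a subset of UTF-8.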
# 7  
Old 07-11-2013
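No wildcards needed. Look at how UTF-8 works: every byte of a non-ASCII character has a value >= 128, so you can simply remove all bytes >= 128. Note that this deletes the characters rather than replacing them.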

Code:
tr -d '\200-\377' < inputfile > outputfile
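If the goal is to identify the characters and replace them instead of deleting them, something along these lines might work. It is a sketch under a big assumption: that the stray character is the typographic apostrophe U+2019, which UTF-8 encodes as the bytes E2 80 99 (octal 342 200 231). The thread never confirms which character it is, so dump the real bytes first. The function names `dump_non_ascii` and `fix_quotes` are made up for illustration.

```shell
# dump_non_ascii FILE: print, in hex, only the bytes of FILE that are >= 128.
# This is the inverse of the tr command above: it deletes the ASCII bytes instead.
dump_non_ascii() {
    LC_ALL=C tr -d '\000-\177' < "$1" | od -An -tx1
}

# fix_quotes FILE: write FILE to stdout with every U+2019 replaced by a plain '.
# ASSUMPTION: the unknown character is U+2019 (bytes E2 80 99); adjust the
# octal escapes below to whatever dump_non_ascii actually reports.
fix_quotes() {
    quote=$(printf '\342\200\231')
    LC_ALL=C sed "s/${quote}/'/g" "$1"
}
```

Usage would be `fix_quotes inputfile > outputfile`, followed by `dump_non_ascii outputfile` to see which non-ASCII bytes, if any, are still left.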
