Removing these non-ASCII characters from a file


 
# 1  
Old 07-02-2013

Hi,
I have many text files that contain some non-ASCII characters. I have attached screenshots of one of the files. The issue is that even after running the non-ASCII removal commands, one of the characters does not go away. The character that is removed is the black one with a question mark in it, whereas the square character remains. I also show how that square character looks when viewed with the "more" command in Konsole on a CentOS distribution.

The code which helps remove the black symbol with a question mark is this:
Code:
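# Loop over the .dat files directly instead of parsing ls output
for page in *.dat
do
    # Delete control bytes (keeping newline and carriage return) and all bytes 0x80-0xFF
    tr -d '\001-\011\013\014\016-\037\200-\377' < "$page" > "$page.txt"
done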

But the above code does not get rid of the square characters. I have searched this forum and the internet and found many other commands for removing non-ASCII characters. None of them removes this square-like character from the file.
[Attached screenshots: screen.png, screen2.png]

Last edited by shoaibjameel123; 07-02-2013 at 12:20 AM.. Reason: Code tags edit
# 2  
Old 07-02-2013
Run octal dump and check what these characters are:
Code:
od -c < file
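
As a minimal sketch of what to look for, the snippet below builds a small sample and dumps it with od. The sample bytes are made up for illustration and are not taken from the poster's files, and the function name `dump_bytes` is hypothetical. In `od -c` output, bytes that have no printable form (control characters and, in the C locale, bytes above 0x7F) show up as three-digit octal numbers.

```shell
# Hypothetical helper: dump a file (or stdin) byte by byte.
# LC_ALL=C makes od treat every byte individually instead of
# decoding multi-byte characters.
dump_bytes() {
    LC_ALL=C od -c "$@"
}

# Sample input: 'a', byte 0x01 (control), 'b', byte 0xE9 (non-ASCII), newline.
printf 'a\001b\351\n' | dump_bytes
```

Any octal value seen in the dump can then be added to the set passed to `tr -d`.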

If these are control characters:
Code:
awk '{gsub(/[[:cntrl:]]/,X)}1' file > out

This User Gave Thanks to Yoda For This Post:
# 3  
Old 07-02-2013
Thanks.
Code:
awk '{gsub(/[[:cntrl:]]/,X)}1' file > out

The above command gets rid of the square ones. So running this command together with the one I posted above removes all the unwanted characters, both the non-ASCII bytes and the control characters. As for od, I will study its output. This is something new for me.
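
The two commands can probably be folded into a single pass. This is a minimal sketch, assuming the goal is to keep only newlines and printable ASCII (space through tilde), which is what remains after running both commands from this thread. The function name `strip_non_ascii` is hypothetical; the `.txt` output suffix follows the first post.

```shell
# Hypothetical helper: keep only newline and printable ASCII
# (space through tilde); delete every other byte.
# -c complements the set, -d deletes; LC_ALL=C forces byte-wise handling.
strip_non_ascii() {
    LC_ALL=C tr -cd '\012\040-\176'
}

# Apply to every .dat file, writing the result next to it as in the first post.
for page in *.dat
do
    [ -e "$page" ] || continue   # skip when no .dat files match
    strip_non_ascii < "$page" > "$page.txt"
done
```

Note that this also deletes tabs and carriage returns; to keep tabs, add `\011` to the set.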