Redirecting records with non-printable characters


 
# 1  
10-13-2015

Hi,

I have a huge file (50 million rows) that contains certain non-printable ASCII characters. I am cleaning the file by deleting those characters with the following command:

Code:
tr -cd '\11\12\15\40-\176' < unclean_file > clean_file

Please note that the command keeps (does not delete) the following: tab, linefeed, carriage return, and all printable keyboard characters (space through ~).
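A quick way to see what those octal ranges keep is to run the command on a made-up line and dump the result with od -c. The sample bytes and file names below are hypothetical, and LC_ALL=C is an added assumption so that the ranges mean plain byte values:

```shell
#!/bin/sh
# Hypothetical sample line: "a", Ctrl-A (octal 001), "b", tab, "c", CR, LF
printf 'a\001b\tc\r\n' > unclean_file

# \11 = tab, \12 = linefeed, \15 = carriage return, \40-\176 = space through '~'.
# -c takes the complement of that set and -d deletes it, so only Ctrl-A goes.
LC_ALL=C tr -cd '\11\12\15\40-\176' < unclean_file > clean_file

# Dump the result byte by byte; expected: a b \t c \r \n
od -c clean_file
```

Only the Ctrl-A byte is removed; tab, CR and LF survive, matching the list above.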

However, besides cleaning the file with the above command, I also need to identify the rows that contain these non-printable ASCII characters and redirect them to another file.

Can anyone please advise how I can capture these rows (the ones with non-printable characters) in another file?

Thanks
# 2  
10-13-2015
Code:
grep -v '[[:print:]]' myFile >nonPrintFile
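One thing worth checking here: grep -v '[[:print:]]' selects lines that contain no printable character at all (including empty lines), not lines that contain at least one non-printable character. A small comparison on hypothetical sample data, assuming a C locale, shows the difference with a negated bracket expression:

```shell
#!/bin/sh
# Hypothetical sample: a clean line, a line with an embedded Ctrl-A,
# and a line consisting only of a Ctrl-B.
printf 'hello\nhel\001lo\n\002\n' > myFile

# -v '[[:print:]]' keeps only lines with no printable character at all,
# so just the Ctrl-B line is selected.
LC_ALL=C grep -v '[[:print:]]' myFile > only_nonprint

# [^[:print:]] matches any line with at least one non-printable character,
# so both the Ctrl-A line and the Ctrl-B line are selected.
LC_ALL=C grep '[^[:print:]]' myFile > nonPrintFile
```

The negated form still treats a tab as non-printable, which is the issue raised later in the thread.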

# 3  
10-13-2015
Code:
tr -cd '[:print:]' < unclean_file > clean_file
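Note that [:print:] contains only the space and the visible characters, so this command deletes tabs and newlines as well. A quick check on a hypothetical two-line sample (C locale assumed):

```shell
#!/bin/sh
# Hypothetical sample: "a", tab, "b", Ctrl-A, LF, "c", LF
printf 'a\tb\001\nc\n' > unclean_file

# Everything outside [:print:] is deleted: the tab, the Ctrl-A and both
# newlines, so the two input lines are joined into "abc" with no final newline.
LC_ALL=C tr -cd '[:print:]' < unclean_file > clean_file
od -c clean_file
```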

# 4  
10-14-2015

Hi Vgersh99,

Will the command you suggested also redirect rows containing linefeeds, carriage returns and tabs?

Code:
grep -v '[[:print:]]' myFile >nonPrintFile

I do not want rows to be redirected just because they contain linefeeds, carriage returns or tabs.

Please advise.

Thanks
# 5  
10-14-2015
You are right, the [:print:] character class does not include tab or newline.
Improvements:
Code:
tr -cd '[:print:]\t\n' < unclean_file > clean_file
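On a similar hypothetical sample (C locale assumed), the improved tr keeps the tab and the line structure. A CR is still deleted, because it is not in the set:

```shell
#!/bin/sh
# Hypothetical sample: tab and Ctrl-A on line 1, a DOS-style CR on line 2
printf 'a\tb\001\nc\r\n' > unclean_file

# Keep printable characters plus tab and newline; delete everything else
# (here the Ctrl-A and the CR).
LC_ALL=C tr -cd '[:print:]\t\n' < unclean_file > clean_file
od -c clean_file
```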

Code:
awk '/[^[:print:]\t]/' unclean_file > nonPrint_lines
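A quick test of the awk filter on hypothetical data, assuming a C locale and an awk that understands POSIX bracket classes such as [:print:] (some older implementations, for example mawk 1.3.3, do not):

```shell
#!/bin/sh
# Hypothetical sample: a line with a tab, a line with Ctrl-A, a plain line
printf 'ok\tline\nbad\001line\nalso ok\n' > unclean_file

# With no action given, awk prints the lines that contain a character which
# is neither printable nor a tab; the tab-only line is not selected.
LC_ALL=C awk '/[^[:print:]\t]/' unclean_file > nonPrint_lines
od -c nonPrint_lines
```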

CR is really a special character in Unix text files. Nevertheless, you can add \r to the set if your files legitimately contain it.

Last edited by MadeInGermany; 10-14-2015 at 05:43 AM.
# 6  
10-29-2015
Thanks @MadeInGermany.

Shouldn't I also be including
Code:
\n

in the command? Otherwise, wouldn't every line in the file qualify as having a non-printable character, since newline is also a non-printable character?

Code:
awk '/[^[:print:]\t\r\n]/' unclean_file > nonPrint_lines

Please correct me if I am wrong.

Thanks again !
# 7  
10-29-2015
Quote:
Originally Posted by rishigc
Thanks @MadeInGermany.

Shouldn't I also be including
Code:
\n

in the command? Otherwise, wouldn't every line in the file qualify as having a non-printable character, since newline is also a non-printable character?

Code:
awk '/[^[:print:]\t\r\n]/' unclean_file > nonPrint_lines

Please correct me if I am wrong.

Thanks again !
With the default record separator, the <newline> that ends each line is stripped before the line is stored in $0, so the pattern never sees it; and the default print action (used when the condition evaluates to TRUE and no action section is specified) adds a <newline> back to the output. So, the two commands:
Code:
awk '/[^[:print:]\t\r\n]/' unclean_file > nonPrint_lines
awk '/[^[:print:]\t\r]/' unclean_file > nonPrint_lines

produce exactly the same output for any input file. (However, the results are unspecified if the last character in a non-empty input file is not a <newline> character.)
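This is easy to confirm on a hypothetical sample (C locale assumed); cmp reports no difference between the two outputs:

```shell
#!/bin/sh
# Hypothetical sample: a clean line, a line with Ctrl-A, a line ending in CR
printf 'fine\nbad\001\ncr\r\n' > unclean_file

# The newline is stripped before matching, so \n in the bracket changes nothing.
LC_ALL=C awk '/[^[:print:]\t\r\n]/' unclean_file > with_n
LC_ALL=C awk '/[^[:print:]\t\r]/'   unclean_file > without_n

cmp with_n without_n && echo "identical output"
```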

And, as MadeInGermany said, <carriage-return> is not a normal character in a UNIX/Linux text file. Unless you are processing DOS-format text files, you probably want lines containing <carriage-return> characters to be copied from unclean_file to nonPrint_lines, so leave \r out of the bracket expression.
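Since the original goal was both to clean the file and to capture the offending rows, one possible refinement for a 50-million-row file is to do both in a single awk pass instead of running tr and awk separately. This is a sketch, not something posted in the thread: it assumes a C locale (so the range " -~" is bytes 0x20-0x7E, the same as \40-\176 in the original tr command), allows tab, and treats CR as non-printable as suggested above. The sample data is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample: a clean line with a tab, a line with Ctrl-A,
# and a line ending in a DOS-style CR.
printf 'good\tline\nbad\001line\ndos line\r\n' > unclean_file

# In the C locale, [^\t -~] matches any byte other than tab and printable
# ASCII (space through ~). Add \r to both brackets to keep DOS line endings.
LC_ALL=C awk '
    /[^\t -~]/ { print > "nonPrint_lines" }          # copy the offending line unchanged
    { gsub(/[^\t -~]/, ""); print > "clean_file" }  # then write the cleaned line
' unclean_file
```

Each input line is read once: lines selected for nonPrint_lines are copied unchanged, while clean_file receives every line with the unwanted bytes removed.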

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

ksh check for non printable characters in a string

Hi All, I am trying to find non-printable characters in a string. The string could have alphanumerics, punctuation and characters like (*&%$#.') but not non-printable characters (or that is what I think they are called), which are introduced when you copy any text from DOS to a unix box. Input string1:... (10 Replies)
Discussion started by: dips_ag

2. Shell Programming and Scripting

problem in redirecting records using nawk

I am trying to redirect records to two files using nawk if-else. #Identify good and bad records and redirect records using if-then-else nawk -F"|" '{if(NF!=14){printf("%s\n",$0) >> "$fn"_bad_data}else{printf("%s\n",$0) >> $fn}}' "$fn".orig "$fn".orig is the source file name bad... (7 Replies)
Discussion started by: siteregsam

3. Shell Programming and Scripting

Unable to grep control/non printable characters

Unable to grep: Able to grep: (11 Replies)
Discussion started by: proactiveaditya

4. UNIX for Dummies Questions & Answers

removing non printable characters

Hi, in a file, I have records as below: 123|62|absnb|267629 123|267|28728|uiuip 123|567|26761|2676 I want to remove the non-printable characters after the end of each record. I guess there are certain characters that are not visible. I don't know exactly what character it is. I used... (2 Replies)
Discussion started by: pandeesh

5. Shell Programming and Scripting

Removing Non-printable characters in unix file

Hi, We have a non printable character "®" in our file , we want to remove this character, we tried tr -dc '' < oldfile> newfile but this command is removing all new line entries along with the non printable character and all the records are coming in one line(it is changing the format of the... (2 Replies)
Discussion started by: pyaranoid

6. HP-UX

Non-printable characters

I have been using OKI data Microline printers; models 590 and 591 to print a bar code using the following escape sequence: \E^PA^H^C00^D^C^A^A^A\E^PB^H The escape sequence is stored in a unix file which is edited using vi. Now, we are considering Microline printer model 395C and the bar code... (3 Replies)
Discussion started by: Joy Conner

7. Shell Programming and Scripting

count characters in specific records

I have a text file which represents a http flow like this: HTTP/1.1 200 OK Date: Fri, 23 Jan 2009 17:16:24 GMT Server: Apache Last-Modified: Fri, 23 Jan 2009 17:08:03 GMT Accept-Ranges: bytes Cache-Control: max-age=540 Expires: Fri, 23 Jan 2009 17:21:31 GMT Vary: Accept-Encoding ... (1 Reply)
Discussion started by: littleboyblu

8. UNIX for Dummies Questions & Answers

delete non printable characters from file

I have a file which contains non-printable characters like enter, escape, etc. I want to delete them from the file. (2 Replies)
Discussion started by: alokjyotibal

9. Shell Programming and Scripting

grep non printable characters

Sometimes obvious things... are not so obvious. I always thought that it was possible to grep non printable characters but not with my GNU grep (5.2.1) version. printf "Hello\tWorld" | grep -l '\t' printf "Hello\tWorld" | grep -l '\x09' printf "Hello\tWorld" | grep -l '\x{09}' None of them... (3 Replies)
Discussion started by: ripat

10. Shell Programming and Scripting

Best way to search files for non-printable characters?

I need to check ftp'd incoming files for characters that are not alphanumeric, <tab>, <cr>, or <lf> characters. Each file would have 10-20,000 lines with up to 3,000 characters per line. Should I use awk, sed, or grep and what would the command look like to do such a search? Thanks much to anyone... (2 Replies)
Discussion started by: jvander