Non-ascii character detection (perl or grep)


 
# 1  
Old 02-19-2007
Non-ascii character detection (perl or grep)

Hi,
How can I grep for lines that contain non-ASCII characters in a file?

If not with grep, can it at least be done with command-line perl or awk? I tried Perl's [:ascii:] character class, but still could not get the result. Any help?

PS: I was sure someone must have asked this question already, so I searched through the forums, but could not find a thread relating to what I am asking.

Thanks
Srini
# 2  
Old 02-19-2007
Not tested, though!

Try it and let us know if there are any problems!

Code:
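#! /opt/third-party/bin/perl

my($content, $length);

open(FILE, "< sample") || die "Unable to open file sample. <$!>\n";

# Test defined() rather than chomp()'s return value, so that a final
# line without a trailing newline is not skipped.
while( defined($content = <FILE>) ) {
    chomp($content);
    # As first posted this read "length = ...", which does not compile.
    $length = length($content);

    # Print the line as soon as one byte above 127 (non-ASCII) is found.
    for( my $i = 0; $i < $length; $i++ ) {

        if( ord(substr($content, $i, 1)) > 127 )
        {
            print "$content\n";
            last;
        }
    }
}
close(FILE);

exit 0;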

# 3  
Old 02-19-2007
Thanks Madhan! That works fine (except that it should be "$length" instead of "length").
Is there a single command, rather than a script, that does this job? An awk, perl, or grep command?

Thanks
Srini
# 4  
Old 02-19-2007
Right! That was a good catch.

I am not aware of any single command that suits your requirement directly!
# 5  
Old 02-19-2007
Try this...

tr -d "\000-\011\013-\177" < txtfile

...where txtfile is the file you want to scan for non-ascii chars.

This writes txtfile to stdout, deleting every ASCII char except newline along the way, so only the non-ASCII bytes and the line breaks remain.
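
The same tr idea can also answer a yes/no question about a file. This is only a sketch: has_nonascii and the two file names are hypothetical, and LC_ALL=C is set on the assumption that tr should treat its input as raw bytes.

```shell
# Hypothetical helper: exit status 0 if file "$1" contains any byte above 127.
has_nonascii() {
    # Delete all 7-bit ASCII bytes (newline included) and count what is left.
    [ $(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c) -gt 0 ]
}

# Illustrative inputs: one pure-ASCII file, one with the bytes 0xC3 0xA9.
printf 'abc\n' > ascii.txt
printf 'caf\303\251\n' > accented.txt

for f in ascii.txt accented.txt; do
    if has_nonascii "$f"; then
        echo "$f: non-ASCII bytes found"
    else
        echo "$f: ASCII only"
    fi
done
```

Unlike the original command, this variant deletes the newlines as well, because it only needs a byte count and not the line structure.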
# 6  
Old 02-19-2007
Doesn't work...

Thanks mschwage, but the command doesn't work for me. Can you please let me know whether it worked for you?

Thanks
Srini

Last edited by srinivasan_85; 02-19-2007 at 10:44 AM.. Reason: My carelessness to read the thread properly...
# 7  
Old 02-19-2007
Have you tried this -
grep -v [:alnum:] test
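
One caveat: grep only recognizes [:alnum:] as a character class inside a bracket expression, written [[:alnum:]], and -v prints the lines that do not match, so the command above is unlikely to single out non-ASCII lines. A grep-only sketch follows, assuming a grep that handles input byte by byte in the C locale and accepts the -a (treat as text) option, as GNU grep does; grep_nonascii and test.txt are hypothetical names.

```shell
# Hypothetical helper: print, with line numbers, the lines of file "$1"
# that contain a byte above 127.
grep_nonascii() {
    # In the C locale [:print:] covers bytes 32-126 and [:cntrl:] covers
    # 0-31 and 127, so the negated set leaves only bytes 128-255.
    # -a keeps grep from reporting the file as binary.
    LC_ALL=C grep -a -n '[^[:print:][:cntrl:]]' "$1"
}

# Illustrative input: only the second line has non-ASCII bytes (0xC3 0xBC).
printf 'Burki\nB\303\274rki\ntab\there\n' > test.txt
grep_nonascii test.txt
```

The third sample line contains a tab on purpose: control characters are still ASCII, so it should not be reported.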
 