Identify records having junk characters in unix


 
# 1  
Old 11-14-2007

Hi Friends,

I need a Unix command that outputs all the records having junk characters in a file.

I know that cat -tv <Filename> displays the file with non-printing characters made visible, so we can check it for any junk characters.

But my requirement is to fetch ONLY THOSE records having junk characters.
Please suggest a way.

Thanks in advance,
Suresh.
# 2  
Old 11-14-2007
What do you mean by junk characters?

Characters within a specific ASCII range?
# 3  
Old 11-14-2007
Code:
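#!/bin/sh

# hasjunk was left undefined in the original post; this definition is an
# assumption: "junk" means any byte outside printable ASCII, space or tab.
hasjunk() {
        printf '%s\n' "$1" | LC_ALL=C grep -q '[^[:print:][:blank:]]'
}

# read records from standard input and echo only the ones with junk
while IFS= read -r N
do
        if hasjunk "$N"
        then
             printf '%s\n' "$N"
        fi
done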

# 4  
Old 11-14-2007
Hi,

Junk characters means something like this, which is what I got when I did a cat on the Unix file:

|ש××××ª× ×¢×× ×ר××¦× ××שר×ת ×××× ××× ×××ר×× ×- ×©× × ×ש××××××× ××××¢× ××××¤× ×××§× ×× ×××ר ×××ש×××××××××× ××× × ×סר××× ××©× × ××××××× ×©×× ×× × ×ק××××. ×××§×©× ×ס×ר××××× ×©× × ×©×"×: 482304481-×ש××× ×©× 3 ×××××ת ×©× ××¡×¨× ×¨×§ ×©× ××××× ×©× ×סר ×××¢× ×ª× ×××ר ×××ש×××ק××× ×ª×× 6 ×××× ××× ×¢××¨× ×××ש××××ס××¨× ×ס×פ×ת. ××¢××¨×ª× ×ת××× ×ת ×ת ××¢×× ×××. ×××ת|

Thanks and Regards,
Suresh
# 5  
Old 11-14-2007
© - character code 169 (ISO-8859-1; ASCII itself stops at 127)

this link should be useful to you,

Unicode/UTF-8-character table - starting from code position 0080

something like this should do it,

Code:
#! /opt/third-party/bin/perl

use strict;
use warnings;

open(my $fh, "<", $ARGV[0]) || die("unable to open <$!>\n");
binmode($fh);    # read raw bytes, no encoding layer

# read the file one byte at a time
while ( read($fh, my $data, 1) == 1 ) {
  my $ordVal = ord($data);
  if ( $ordVal == 169 ) {
    # similarly for other characters as well,
    # better option would be to build a range for that
    # do the processing here
  }
}

close($fh);

exit 0;
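
The range idea from the comments above can also be tried without Perl. This is only a minimal sketch, assuming a grep that compares single bytes under LC_ALL=C and accepts -a (GNU and BSD grep do); the helper name junk_lines is made up for illustration. It prints, with line numbers, every record that contains a byte outside printable ASCII (space through tilde) or tab:

```shell
#!/bin/sh
# junk_lines FILE: hypothetical helper, not from the thread.
# Prints "lineno:record" for every record that contains a byte outside
# the printable ASCII range (0x20-0x7E); tab is allowed as well.
junk_lines() {
    tab=$(printf '\t')
    # LC_ALL=C makes the range " -~" a plain byte range;
    # -a tells grep to treat the input as text even if it looks binary.
    LC_ALL=C grep -a -n "[^ -~${tab}]" "$1"
}

# Usage (data.txt is a placeholder name):
#   junk_lines data.txt
```

Which bytes count as junk is a choice; if the file is legitimately UTF-8 or Latin-1, this check will flag every accented character too.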

# 6  
Old 11-14-2007
Maybe the file command can help you?
Otherwise you must be more specific; you may be using a character set that comes out strange in the terminal but is OK in any other application.

Example:
file *|grep text
in a random directory it would give me something like
ecl: ASCII text
gitt: Bourne-Again shell script text executable
HELP: ASCII English text
t2s: POSIX shell script text executable
time2Long.java: ASCII Java program text

(and the lines filtered out could be lines like
Firefox_wallpaper.png: PNG image data, 1914 x 818, 8-bit/color RGB, non-interlaced
FW6AK115310.pdf: PDF document, version 1.3
itinerary-hotel-3S69Q2.RTF: Rich Text Format data, version 1, ANSI
)
Please be more specific if You can.

/Lakris
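
On the point about character sets: before treating bytes as junk, it may be worth checking whether the file is simply valid text in another encoding. A rough sketch, assuming iconv is installed and that UTF-8 is the encoding you suspect; is_utf8 is a made-up helper name:

```shell
#!/bin/sh
# is_utf8 FILE: hypothetical helper, not from the thread.
# Succeeds when iconv can decode the whole file as UTF-8, which suggests
# the "junk" is ordinary non-ASCII text displayed with the wrong charset.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Usage (data.txt is a placeholder name):
#   if is_utf8 data.txt; then echo "valid UTF-8"; else echo "not UTF-8"; fi
```

A pure ASCII file also passes this check, since ASCII is a subset of UTF-8.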
# 7  
Old 11-14-2007
Use a language which has the iscntrl() function; I think Perl does.
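
The same kind of test is available from the shell through the POSIX [[:cntrl:]] character class. A small sketch, assuming a grep that accepts -a (GNU and BSD grep do); ctrl_lines is a made-up name:

```shell
#!/bin/sh
# ctrl_lines FILE: hypothetical helper, not from the thread.
# Prints "lineno:record" for records holding any control character
# (for example ^M or ^Z). Tab is a control character too, so it also matches.
ctrl_lines() {
    LC_ALL=C grep -a -n '[[:cntrl:]]' "$1"
}

# Usage (data.txt is a placeholder name):
#   ctrl_lines data.txt
```

For tab-delimited files every record would match, so this fits pipe-delimited or similar data better.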