The UNIX and Linux Forums  



Shell Programming and Scripting: Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts here.

#1, 11-14-2007
Registered User
Join Date: Nov 2006
Location: Czech Republic
Posts: 39
Identify records having junk characters in unix

Hi Friends,

I need a Unix command that outputs all the records in a file that contain junk characters.

I know the command cat -tv <Filename>, which displays the file with non-printing characters shown in a visible ^ and M- notation, so I can check it for junk characters.

But my requirement is to fetch ONLY THOSE records that have junk characters.
Please suggest a way to do this.

Thanks in advance,
Suresh.
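A grep-only sketch of what is being asked, assuming "junk" means any byte that is not printable ASCII (spaces and tabs allowed); show_junk_lines is a hypothetical helper name, not an existing command. Setting LC_ALL=C makes grep compare single bytes, so the bracket expression matches control characters and every byte above 127.

```shell
#!/bin/sh
# show_junk_lines (hypothetical helper): print, prefixed with the line
# number, every line of the given file(s) containing a byte that is not
# printable ASCII. Assumption: spaces and tabs are not junk.
# In the C locale [:print:] is exactly bytes 0x20-0x7E, so this matches
# control characters and all bytes 0x80-0xFF. A carriage return left by
# DOS line endings is a control character and WILL be reported.
show_junk_lines() {
    LC_ALL=C grep -n '[^[:print:][:blank:]]' -- "$@"
}

# Example: show_junk_lines datafile.txt
```

grep exits with status 1 when no line matches, so the same function can also serve as a test inside an if.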
#2, 11-14-2007
Technorati Master
Join Date: Mar 2005
Location: Large scale systems...
Posts: 2,610
What do you mean by junk characters?

Characters within a specific ASCII range?
#3, 11-14-2007
Registered User
Join Date: Jan 2007
Posts: 2,965
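Code:
#!/bin/sh
# Print only the lines read from stdin that contain "junk", taken here
# to mean any byte outside printable ASCII (spaces and tabs allowed).
hasjunk() {
        printf '%s\n' "$1" | LC_ALL=C grep -q '[^[:print:][:blank:]]'
}

# IFS= and -r keep leading blanks and backslashes intact; the -n test
# also handles a last line that has no trailing newline.
while IFS= read -r N || [ -n "$N" ]
do
        if hasjunk "$N"
        then
             printf '%s\n' "$N"
        fi
done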
#4, 11-14-2007
Registered User
Join Date: Nov 2006
Location: Czech Republic
Posts: 39
Hi,

By junk characters I mean something like this, which is what I get when I cat the Unix file:

|ש××××ª× ×¢×× ×ר××¦× ××שר×ת ×××× ××× ×××ר×× ×- ×©× × ×ש××××××× ××××¢× ××××¤× ×××§× ×× ×××ר ×××ש×××××××××× ××× × ×סר××× ××©× × ××××××× ×©×× ×× × ××§××××. ×××§×©× ×ס×ר××××× ×©× × ×©×"×: 482304481-×ש××× ×©× 3 ×××××ת ×©× ××¡×¨× ×¨×§ ×©× ××××× ×©× ×סר ×××¢× ×ª× ×××ר ×××ש××××§××× ×ª×× 6 ×××× ××× ×¢××¨× ×××ש××××ס××¨× ×ס×פ×ת. ××¢××¨×ª× ×ת××× ×ת ×ת ××¢×× ×××. ×××ת|

Thanks and Regards,
Suresh
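The sample looks like UTF-8 Hebrew viewed through a single-byte character set: every Hebrew letter (U+05D0 to U+05EA) is stored in UTF-8 as the byte 0xD7 followed by a byte from 0x90 to 0xAA, and 0xD7 displays as × in ISO-8859-1/Windows-1252 (ש, for example, is D7 A9, and 0xA9 displays as ©). That reading is an inference from the visible pattern, so it is worth confirming on the raw bytes. A minimal sketch with sed and od; line_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# line_bytes (hypothetical helper): print line N of FILE as hexadecimal
# byte values, so a "junk" character can be identified exactly.
# Usage: line_bytes FILE N
line_bytes() {
    sed -n "${2}p" "$1" | od -A n -t x1
}
```

For a line holding the letter ש the dump contains d7 a9, a pair that points to UTF-8 text rather than to corrupted data.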
#5, 11-14-2007
Technorati Master
Join Date: Mar 2005
Location: Large scale systems...
Posts: 2,610
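© has character code 169 (0xA9). Strictly speaking that is not ASCII, which stops at 127; 169 is © in ISO-8859-1 (Latin-1) and Windows-1252.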

This link should be useful to you:

Unicode/UTF-8-character table - starting from code position 0080

Something like this should do it:

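Code:
#! /opt/third-party/bin/perl
# Print every line (record) of the file named on the command line that
# contains a byte outside 7-bit ASCII, such as 169 (the (C) sign).
use strict;
use warnings;

@ARGV or die "usage: $0 FILE\n";
open(my $fh, "<:raw", $ARGV[0]) or die "unable to open $ARGV[0]: $!\n";

while ( my $line = <$fh> ) {
  # [^\x00-\x7F] is the range test: any byte with a value above 127.
  # Narrow it (for example /\xA9/ for byte 169 only) to look for one
  # specific character.
  if ( $line =~ /[^\x00-\x7F]/ ) {
    print "$.: $line";   # $. holds the current line number
  }
}

close($fh);

exit 0;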
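The ord($data) == 169 check can also be done from the shell. In the C locale grep treats each byte as one character, so a fixed-string pattern holding just byte 169 selects the lines that contain it. A sketch, with lines_with_byte as a hypothetical helper; the byte is passed in octal because that is what printf escapes accept (decimal 169 is octal 251).

```shell
#!/bin/sh
# lines_with_byte (hypothetical helper): print, with line numbers, the
# lines of FILE that contain the single byte given as a 3-digit octal
# code. Decimal 169 (the (C) sign in ISO-8859-1) is octal 251.
# Usage: lines_with_byte FILE 251
lines_with_byte() {
    pattern=$(printf "\\$2")          # build the raw byte from its octal code
    LC_ALL=C grep -n -F -- "$pattern" "$1"   # -F: no regex interpretation
}
```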
#6, 11-14-2007
Registered User
Join Date: Oct 2007
Posts: 171
Maybe the file command can help you?
Otherwise you must be more specific: you may be using a character set that looks strange in the terminal but displays fine in other applications.

Example:
file *|grep text
In a random directory, it gives me something like:
ecl: ASCII text
gitt: Bourne-Again shell script text executable
HELP: ASCII English text
t2s: POSIX shell script text executable
time2Long.java: ASCII Java program text

(lines filtered out by the grep would be ones like:
Firefox_wallpaper.png: PNG image data, 1914 x 818, 8-bit/color RGB, non-interlaced
FW6AK115310.pdf: PDF document, version 1.3
itinerary-hotel-3S69Q2.RTF: Rich Text Format data, version 1, ANSI
)
Please be more specific if you can.

/Lakris
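file classifies a whole file; grep can go one step further and count the affected records in each file, which helps decide where to look first. A sketch assuming "junk" means any byte that is not printable ASCII (spaces and tabs allowed); junk_counts is a hypothetical helper name.

```shell
#!/bin/sh
# junk_counts (hypothetical helper): for each file given, print
# "name: count", where count is the number of lines containing a byte
# outside printable ASCII (spaces and tabs allowed).
junk_counts() {
    for f in "$@"; do
        # grep -c prints 0 (and exits 1) when nothing matches
        n=$(LC_ALL=C grep -c '[^[:print:][:blank:]]' "$f")
        printf '%s: %s\n' "$f" "$n"
    done
}

# Example: junk_counts *
```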
#7, 11-14-2007
Read Only
Join Date: Nov 2007
Posts: 165
Use a language that has the iscntrl() function (C has it in <ctype.h>); I think Perl does too, and its [[:cntrl:]] regex class performs the same test.
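One caveat: in the C locale iscntrl() is true only for codes 0-31 and 127, so a control-character test alone would not flag the × and © bytes in Suresh's sample, which are all above 127. A shell sketch contrasting the two tests with grep's [:cntrl:] class; has_cntrl and has_non_ascii are hypothetical helper names.

```shell
#!/bin/sh
# has_cntrl (hypothetical helper): succeed if the string contains a
# control character, i.e. what iscntrl() tests: bytes 0-31 and 127 in
# the C locale (tab counts as one).
has_cntrl() {
    printf '%s\n' "$1" | LC_ALL=C grep -q '[[:cntrl:]]'
}

# has_non_ascii (hypothetical helper): succeed if the string contains
# any byte above 127. In the C locale [:print:] plus [:cntrl:] covers
# exactly 0-127, so the negated class matches only 128-255.
has_non_ascii() {
    printf '%s\n' "$1" | LC_ALL=C grep -q '[^[:print:][:cntrl:]]'
}
```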
The UNIX and Linux Forums Content Copyright ©1993-2008. All Rights Reserved.