The UNIX and Linux Forums  

  #1 (permalink)  
Old 01-01-2007
Registered User
 

Join Date: Jan 2007
Location: British Columbia
Posts: 5
grep and UNICODE (utf-16) file

I'm using shell scripting in AppleScript. When I search a file that uses the ANSEL character set (for GEDCOM files), grep '1 CHAR ANSEL' filepath gives the expected result. When I search a Unicode (UTF-16) file for text that is known to exist in it, grep '1 CHAR UNICODE' filepath does not find the string.

How can I grep this Unicode file? Thanks in advance.
  #2 (permalink)  
Old 01-02-2007
Technorati Master
 

Join Date: Mar 2005
Location: Large scale systems...
Posts: 2,572
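I think the problem is the encoding. UTF-16 stores each character in two bytes, so plain ASCII text ends up interleaved with NUL bytes, and the usual byte-oriented tools applied to other text files do not match it. (UTF-8 is not affected; ASCII text is byte-for-byte the same in UTF-8.)

Try running wc -l on the file and post the number of lines it reports. wc -l counts LF bytes, so if it returns zero the file has no LF line terminators at all (it may use CR-only line endings, for example), and it will need to be converted before the usual tools can work on it line by line.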
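
To see why a plain grep misses the string, it can help to look at the raw bytes. This is only a minimal sketch, assuming iconv and od are installed; sample.ged is a made-up file name, and the sample is written as UTF-16LE with LF line endings purely for illustration (the real file may differ).

```shell
# Build a small UTF-16LE sample (hypothetical file name, for illustration only).
printf '0 HEAD\n1 CHAR UNICODE\n' | iconv -f UTF-8 -t UTF-16LE > sample.ged

# Dump the bytes: every ASCII character is followed by a NUL (\0) byte.
od -c sample.ged | head -n 3

# grep compares bytes, so the plain ASCII pattern does not occur in the file.
if grep -q '1 CHAR UNICODE' sample.ged; then
    raw_result=found
else
    raw_result=missing
fi
echo "raw UTF-16 search: $raw_result"

# wc -l counts LF (0x0A) bytes; this sample has two LF-terminated lines.
lf_count=$(wc -l < sample.ged | tr -d ' ')
echo "LF bytes: $lf_count"
```

A file for which wc -l reports zero contains no LF bytes at all, which would be consistent with CR-only line endings. That is a guess, since the thread does not show the file's bytes.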

  #3 (permalink)  
Old 01-03-2007
Registered User
 

Join Date: Jan 2007
Location: British Columbia
Posts: 5
Thanks. It does turn out to be zero. I haven't a clue what to do here. All I want to do is search such a file for the string "1 CHAR UNICODE". Any help would be appreciated.
  #4 (permalink)  
Old 01-03-2007
...@...
 

Join Date: Feb 2004
Location: NM
Posts: 3,851
See if you have iconv on your box. It can convert UTF-16 to UTF-8, which grep can search. iconv should be there.
Code:
iconv -f UTF-16 -t UTF-8 myfile > temporary_file
grep '1 CHAR UNICODE' temporary_file
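
If the converted file still looks like one long line to grep, for example because the original uses CR-only line endings (an assumption here, not something confirmed in the thread), tr can translate the carriage returns before searching. A rough sketch, with a made-up sample file standing in for the real one:

```shell
# Build a UTF-16 sample with CR-only line endings (hypothetical data and file name).
printf '0 HEAD\r1 CHAR UNICODE\r0 TRLR\r' | iconv -f UTF-8 -t UTF-16 > cr_sample.ged

# Convert to UTF-8, turn each CR into LF, then search line by line.
match=$(iconv -f UTF-16 -t UTF-8 cr_sample.ged | tr '\r' '\n' | grep '1 CHAR UNICODE')
echo "$match"
```

Without the tr step grep would still find the string, but it would print the whole file as a single matching line.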
  #5 (permalink)  
Old 01-03-2007
Registered User
 

Join Date: Jan 2007
Location: British Columbia
Posts: 5
Thanks, Jim. My Mac installation does have iconv. I'll use a pipe instead, to avoid having to delete a temp file. All I need is a true or false:

iconv -f UTF-16 -t UTF-8 myfile | grep '1 CHAR UNICODE'

works for me. I am such a novice that I don't know how to find a list of the commands available to me.
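
For the true-or-false check, one possible refinement is grep's -q option, which prints nothing and reports the result through the exit status only. The sketch below assumes a sample file (myfile_sample.ged is a made-up name) and echoes a literal true or false, which may be easier to read back from a calling script than an exit status.

```shell
# Sample UTF-16 file standing in for the real GEDCOM file (hypothetical name).
printf '0 HEAD\n1 CHAR UNICODE\n' | iconv -f UTF-8 -t UTF-16 > myfile_sample.ged

# -q suppresses output; the pipeline's exit status is grep's (0 = found, 1 = not found).
if iconv -f UTF-16 -t UTF-8 myfile_sample.ged | grep -q '1 CHAR UNICODE'; then
    is_unicode=true
else
    is_unicode=false
fi
echo "$is_unicode"
```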

