Remove lines with non-Chinese characters from XML file


 
# 1  
Old 03-01-2011

Hi there, I'm looking for a way to remove all lines that don't contain Chinese characters from an XML file.
Example: http://pastebin.com/8KzSbCKe
The result should be like this: http://pastebin.com/ZywXsNhx
Only lines that don't contain any Chinese characters should be deleted. If there's a mix of Chinese and Latin characters, the line shouldn't be deleted.
I thought about using sed but I have no idea how...
Thanks!

Last edited by fpmurphy; 03-01-2011 at 10:48 PM.. Reason: httx ->http
# 2  
Old 03-01-2011
Try...
Code:
awk '{f=0;for(i=1;i<=length;i++)if(substr($0,i,1)>"\337")f=1}f' file

# 3  
Old 03-01-2011
This assumes the file is stored in UTF-8 (it won't work for other Unicode encodings such as UTF-16). Also, my awk needed a 0 in front of the 337:

Code:
awk '{f=0;for(i=1;i<=length;i++)if(substr($0,i,1)>"\0337")f=1}f' file

EDIT:
On further testing, the \0 form doesn't work.

The range of Chinese Unicode characters is 4E00 through 9FFF (octal bytes 344 270 200 through 351 277 277 in UTF-8), so the test should be >"\343" and <"\352" (to avoid picking up any 4-byte UTF-8 sequences):

Code:
awk '{f=0;for(i=1;i<=length;i++)if(substr($0,i,1)>"\343"&&substr($0,i,1)<"\352")f=1}f' file
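
For anyone reusing this, here is a rough sketch of the same test wrapped in a shell function. The function name keep_chinese_lines is made up for illustration, and LC_ALL=C is my assumption: it is there to force locale-aware awks (gawk, for example) to step through the line byte by byte, which is what the comparison relies on. The loop also stops at the first matching byte. Note that lead bytes 344 through 351 actually cover U+4000 through U+9FFF, so this is slightly wider than 4E00 through 9FFF.

```shell
# keep_chinese_lines FILE  (hypothetical helper name, not from the thread)
# Prints only the lines of FILE that contain at least one byte in the
# octal range 344..351, i.e. the lead byte of a 3-byte UTF-8 sequence
# for U+4000..U+9FFF. Assumes FILE is UTF-8 encoded.
keep_chinese_lines() {
    # LC_ALL=C is an assumption: it makes substr() return single bytes
    # in awks that would otherwise work on multibyte characters.
    LC_ALL=C awk '{
        f = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            # String comparison against the octal-escaped bounds
            if (c > "\343" && c < "\352") { f = 1; break }
        }
    } f' "$1"
}
```

It would be called as keep_chinese_lines input.xml > output.xml (both file names are placeholders).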


Last edited by Chubler_XL; 03-01-2011 at 11:48 PM..
# 4  
Old 03-02-2011
Quote:
Originally Posted by Chubler_XL
...
The range of Chinese Unicode characters is 4E00 through 9FFF (octal bytes 344 270 200 through 351 277 277 in UTF-8), so the test should be >"\343" and <"\352" (to avoid picking up any 4-byte UTF-8 sequences):

Code:
awk '{f=0;for(i=1;i<=length;i++)if(substr($0,i,1)>"\343"&&substr($0,i,1)<"\352")f=1}f' file

Thank you! Works perfectly!