Remove lines with non-chinese characters from xml file
Post 302500867 by Chubler_XL on Tuesday 1st of March 2011 10:12:50 PM
The assumption here is that the file is stored in UTF-8 (this won't work for other Unicode encodings, e.g. UTF-16); also, my awk needed a 0 in front of the 337:

Code:
awk '{f=0;for(i=1;i<=length;i++)if(substr($0,i,1)>"\0337")f=1}f' file

EDIT:
Further testing shows the \0 doesn't work.

The range of Chinese Unicode characters is 4E00 through 9FFF (UTF-8 bytes, in octal, 344 270 200 through 351 277 277), so the test on each byte should be >"\343" and <"\352" (the upper bound avoids picking up any 4-byte UTF-8 sequences):

Code:
awk '{f=0;for(i=1;i<=length;i++)if(substr($0,i,1)>"\343"&&substr($0,i,1)<"\352")f=1}f' file
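
The octal triples quoted above can be checked with a little shell arithmetic. This is only a sketch: `utf8_octal` is a made-up helper name, and it is valid only for code points U+0800 through U+FFFF (the 3-byte UTF-8 range).

```shell
# Print the three UTF-8 bytes of a code point, in octal.
# utf8_octal is a hypothetical helper, not part of the original post.
# Only valid for code points U+0800..U+FFFF (3-byte UTF-8 sequences).
utf8_octal() {
    cp=$(( $1 ))
    # Lead byte:      1110xxxx  (top 4 bits of the code point)
    # Continuation 1: 10xxxxxx  (middle 6 bits)
    # Continuation 2: 10xxxxxx  (low 6 bits)
    printf '%o %o %o\n' \
        $(( 0xE0 | (cp >> 12) )) \
        $(( 0x80 | ((cp >> 6) & 0x3F) )) \
        $(( 0x80 | (cp & 0x3F) ))
}

utf8_octal 0x4E00   # prints: 344 270 200
utf8_octal 0x9FFF   # prints: 351 277 277
```

Note that a lead byte of 344 already starts at U+4000, so testing the lead byte alone also matches U+4000 through U+4DFF, slightly more than the 4E00 to 9FFF block.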


Last edited by Chubler_XL; 03-01-2011 at 11:48 PM.
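
The byte comparison in the post relies on awk handing back one byte per `substr()` call. Some awks (gawk in a UTF-8 locale, for example) return whole characters instead, so a cautious variant forces the C locale. The following is a sketch under that assumption, not the poster's exact command; `keep_cjk_lines` is a hypothetical wrapper name.

```shell
# Keep only lines that contain at least one byte in \344..\351, i.e. the
# lead byte of a 3-byte UTF-8 sequence for U+4000..U+9FFF.
# keep_cjk_lines is a hypothetical name used for illustration only.
keep_cjk_lines() {
    # LC_ALL=C is assumed to make awk treat the input as plain bytes.
    LC_ALL=C awk '{
        f = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            # Same test as in the post; stop at the first match.
            if (c > "\343" && c < "\352") { f = 1; break }
        }
    } f' "$@"
}
```

Usage would be along the lines of `keep_cjk_lines file > filtered`; with no file argument the function reads standard input. Lines holding only ASCII, 2-byte sequences (such as accented Latin letters), or 4-byte sequences are dropped, since none of their bytes fall in the tested range.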
 
