Remove lines with non-chinese characters from xml file
Post 302501004 by g4rb4g3 on Wednesday 2nd of March 2011 08:31:20 AM
Quote:
Originally Posted by Chubler_XL
...
The range of Chinese Unicode characters is 4E00 through 9FFF (UTF-8 bytes 344 270 200 through 351 277 277, in octal), so the test should be >"\343" and <"\352" (to avoid picking up any 4-byte UTF-8 sequences):

Code:
awk '{f=0;for(i=1;i<=length;i++)if(substr($0,i,1)>"\343"&&substr($0,i,1)<"\352")f=1}f' file

Thank you! Works perfectly!
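
For anyone adapting the one-liner, here is a more spelled-out sketch of the same byte-range test. It assumes the input is UTF-8 and forces `LC_ALL=C` so that awk compares single bytes rather than multibyte characters; the function name `keep_cjk_lines` is made up for this example. Note that lead bytes 344 through 351 cover U+4000 through U+9FFF, so the test is slightly wider than 4E00 through 9FFF.

```shell
# Print only the lines that contain at least one byte in the octal range
# 344..351, i.e. the lead byte of a 3-byte UTF-8 sequence for U+4000..U+9FFF.
# keep_cjk_lines is a hypothetical helper name, not part of the original post.
keep_cjk_lines() {
    # LC_ALL=C makes length() and substr() work on bytes (assumption: UTF-8 input).
    LC_ALL=C awk '
    {
        found = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            # Same comparison as the original one-liner.
            if (c > "\343" && c < "\352") { found = 1; break }
        }
    }
    found   # print the line only when a matching byte was seen
    ' "$@"
}
```

Usage would be along the lines of `keep_cjk_lines file > filtered`; with no file argument the function reads standard input.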
 

iconv_1251(5)						Standards, Environments, and Macros					     iconv_1251(5)

NAME
iconv_1251 - code set conversion tables for MS 1251 (Windows Cyrillic)

DESCRIPTION
The following code set conversions are supported:

+-------------------------------------------------------------------------+
|                     Code Set Conversions Supported                      |
+--------------+--------+--------------+--------+-------------------------+
| Code         | Symbol | Target Code  | Symbol | Target Output           |
+--------------+--------+--------------+--------+-------------------------+
| MS 1251      | win5   | ISO 8859-5   | iso5   | ISO 8859-5 Cyrillic     |
+--------------+--------+--------------+--------+-------------------------+
| MS 1251      | win5   | KOI8-R       | koi8   | KOI8-R                  |
+--------------+--------+--------------+--------+-------------------------+
| MS 1251      | win5   | PC Cyrillic  | alt    | Alternative PC Cyrillic |
+--------------+--------+--------------+--------+-------------------------+
| MS 1251      | win5   | Mac Cyrillic | mac    | Macintosh Cyrillic      |
+--------------+--------+--------------+--------+-------------------------+

CONVERSIONS
The conversions are performed according to the following tables. All values in the tables are given in octal.

MS 1251 to ISO 8859-5
For the conversion of MS 1251 to ISO 8859-5, all characters not in the following table are mapped unchanged.

+-----------------------------------------------------------------+
|                      Conversions Performed                      |
+---------------+----------------+----------------+---------------+
| MS 1251       | ISO 8859-5     | MS 1251        | ISO 8859-5    |
+---------------+----------------+----------------+---------------+
| 24            | 4              | 310            | 270           |
| 200           | 242            | 311            | 271           |
| 201           | 243            | 312            | 272           |
| 202           | 40             | 313            | 273           |
| 203           | 363            | 314            | 274           |
| 204-207       | 40             | 315            | 275           |
| 210           | 255            | 316            | 276           |
| 211           | 40             | 317            | 277           |
| 212           | 251            | 320            | 300           |
| 213           | 40             | 321            | 301           |
| 214           | 252            | 322            | 302           |
| 215           | 254            | 323            | 303           |
| 216           | 253            | 324            | 304           |
| 217           | 257            | 325            | 305           |
| 220           | 362            | 326            | 306           |
| 221-227       | 40             | 327            | 307           |
| 230           | 255            | 330            | 310           |
| 231           | 40             | 331            | 311           |
| 232           | 371            | 332            | 312           |
| 233           | 40             | 333            | 313           |
| 234           | 372            | 334            | 314           |
| 235           | 374            | 335            | 315           |
| 236           | 373            | 336            | 316           |
| 237           | 377            | 337            | 317           |
| 241           | 256            | 340            | 320           |
| 242           | 376            | 341            | 321           |
| 243           | 250            | 342            | 322           |
| 244-247       | 40             | 343            | 323           |
| 250           | 241            | 344            | 324           |
| 251           | 40             | 345            | 325           |
| 252           | 244            | 346            | 326           |
| 253-254       | 40             | 347            | 327           |
| 255           | 55             | 350            | 330           |
| 256           | 40             | 351            | 331           |
| 257           | 247            | 352            | 332           |
| 260-261       | 40             | 353            | 333           |
| 262           | 246            | 354            | 334           |
| 263           | 366            | 355            | 335           |
| 264-267       | 40             | 356            | 336           |
| 270           | 361            | 357            | 337           |
| 271           | 360            | 360            | 340           |
| 272           | 364            | 361            | 341           |
| 273           | 40             | 362            | 342           |
| 274           | 370            | 363            | 343           |
| 275           | 245            | 364            | 344           |
| 276           | 365            | 365            | 345           |
| 277           | 367            | 366            | 346           |
| 300           | 260            | 367            | 347           |
| 301           | 261            | 370            | 350           |
| 302           | 262            | 371            | 351           |
| 303           | 263            | 372            | 352           |
| 304           | 264            | 373            | 353           |
| 305           | 265            | 374            | 354           |
| 306           | 266            | 375            | 355           |
| 307           | 267            | 376            | 356           |
+---------------+----------------+----------------+---------------+

MS 1251 to KOI8-R
For the conversion of MS 1251 to KOI8-R, all characters not in the following table are mapped unchanged.

+-----------------------------------------------------------------+
|                      Conversions Performed                      |
+---------------+----------------+----------------+---------------+
| MS 1251       | KOI8-R         | MS 1251        | KOI8-R        |
+---------------+----------------+----------------+---------------+
| 24            | 4              | 310            | 351           |
| 200           | 261            | 311            | 352           |
| 201           | 262            | 312            | 353           |
| 202           | 40             | 313            | 354           |
| 203           | 242            | 314            | 355           |
| 204-207       | 40             | 315            | 356           |
| 210           | 255            | 316            | 357           |
| 211           | 40             | 317            | 360           |
| 212           | 271            | 320            | 362           |
| 213           | 40             | 321            | 363           |
| 214           | 272            | 322            | 364           |
| 215           | 274            | 323            | 365           |
| 216           | 273            | 324            | 346           |
| 217           | 277            | 325            | 350           |
| 220           | 241            | 326            | 343           |
| 221-227       | 40             | 327            | 376           |
| 230           | 255            | 330            | 373           |
| 231           | 40             | 331            | 375           |
| 232           | 251            | 332            | 377           |
| 233           | 40             | 333            | 371           |
| 234           | 252            | 334            | 370           |
| 235           | 254            | 335            | 374           |
| 236           | 253            | 336            | 340           |
| 237           | 257            | 337            | 361           |
| 241           | 276            | 340            | 301           |
| 242           | 256            | 341            | 302           |
| 243           | 270            | 342            | 327           |
| 244-247       | 40             | 343            | 307           |
| 250           | 263            | 344            | 304           |
| 251           | 40             | 345            | 305           |
| 252           | 264            | 346            | 326           |
| 253-254       | 40             | 347            | 332           |
| 255           | 55             | 350            | 311           |
| 256           | 40             | 351            | 312           |
| 257           | 267            | 352            | 313           |
| 260-261       | 40             | 353            | 314           |
| 262           | 266            | 354            | 315           |
| 263           | 246            | 355            | 316           |
| 264-267       | 40             | 356            | 317           |
| 270           | 243            | 357            | 320           |
| 271           | 260            | 360            | 322           |
| 272           | 244            | 361            | 323           |
| 273           | 40             | 362            | 324           |
| 274           | 250            | 363            | 325           |
| 275           | 265            | 364            | 306           |
| 276           | 245            | 365            | 310           |
| 277           | 247            | 366            | 303           |
| 300           | 341            | 367            | 336           |
| 301           | 342            | 370            | 333           |
| 302           | 367            | 371            | 335           |
| 303           | 347            | 372            | 337           |
| 304           | 344            | 373            | 331           |
| 305           | 345            | 374            | 330           |
| 306           | 366            | 375            | 334           |
| 307           | 372            | 376            | 300           |
+---------------+----------------+----------------+---------------+

MS 1251 to PC Cyrillic
For the conversion of MS 1251 to PC Cyrillic, all characters not in the following table are mapped unchanged.

+-----------------------------------------------------------------+
|                      Conversions Performed                      |
+---------------+----------------+----------------+---------------+
| MS 1251       | PC Cyrillic    | MS 1251        | PC Cyrillic   |
+---------------+----------------+----------------+---------------+
| 24            | 4              | 332            | 232           |
| 200-207       | 40             | 333            | 233           |
| 210           | 260            | 334            | 234           |
| 211-227       | 40             | 335            | 235           |
| 230           | 260            | 336            | 236           |
| 231-247       | 40             | 337            | 237           |
| 250           | 360            | 340            | 240           |
| 251-254       | 40             | 341            | 241           |
| 255           | 55             | 342            | 242           |
| 256-267       | 40             | 343            | 243           |
| 270           | 361            | 344            | 244           |
| 271-277       | 40             | 345            | 245           |
| 300           | 200            | 346            | 246           |
| 301           | 201            | 347            | 247           |
| 302           | 202            | 350            | 250           |
| 303           | 203            | 351            | 251           |
| 304           | 204            | 352            | 252           |
| 305           | 205            | 353            | 253           |
| 306           | 206            | 354            | 254           |
| 307           | 207            | 355            | 255           |
| 310           | 210            | 356            | 256           |
| 311           | 211            | 357            | 257           |
| 312           | 212            | 360            | 340           |
| 313           | 213            | 361            | 341           |
| 314           | 214            | 362            | 342           |
| 315           | 215            | 363            | 343           |
| 316           | 216            | 364            | 344           |
| 317           | 217            | 365            | 345           |
| 320           | 220            | 366            | 346           |
| 321           | 221            | 367            | 347           |
| 322           | 222            | 370            | 350           |
| 323           | 223            | 371            | 351           |
| 324           | 224            | 372            | 352           |
| 325           | 225            | 373            | 353           |
| 326           | 226            | 374            | 354           |
| 327           | 227            | 375            | 355           |
| 330           | 230            | 376            | 356           |
| 331           | 231            |                |               |
+---------------+----------------+----------------+---------------+

MS 1251 to Mac Cyrillic
For the conversion of MS 1251 to Mac Cyrillic, all characters not in the following table are mapped unchanged.

+-----------------------------------------------------------------+
|                      Conversions Performed                      |
+---------------+----------------+----------------+---------------+
| MS 1251       | Mac Cyrillic   | MS 1251        | Mac Cyrillic  |
+---------------+----------------+----------------+---------------+
| 24            | 4              | 260            | 241           |
| 200           | 253            | 262            | 247           |
| 201           | 256            | 263            | 264           |
| 202           | 40             | 264            | 266           |
| 203           | 257            | 266            | 246           |
| 204           | 327            | 267            | 245           |
| 205           | 311            | 270            | 336           |
| 206           | 240            | 271            | 334           |
| 207-211       | 40             | 272            | 271           |
| 212           | 274            | 273            | 310           |
| 213           | 40             | 274            | 300           |
| 214           | 276            | 275            | 301           |
| 215           | 315            | 276            | 317           |
| 216           | 40             | 277            | 273           |
| 217           | 332            | 300            | 200           |
| 220           | 254            | 301            | 201           |
| 221           | 324            | 302            | 202           |
| 222           | 325            | 303            | 203           |
| 223           | 322            | 304            | 204           |
| 224           | 323            | 305            | 205           |
| 225           | 40             | 306            | 206           |
| 226           | 320            | 307            | 207           |
| 227           | 321            | 310            | 210           |
| 230           | 40             | 311            | 211           |
| 231           | 252            | 312            | 212           |
| 232           | 275            | 313            | 213           |
| 233           | 40             | 314            | 214           |
| 234           | 277            | 315            | 215           |
| 235           | 316            | 316            | 216           |
| 236           | 40             | 317            | 217           |
| 237           | 333            | 320            | 220           |
| 240           | 312            | 321            | 221           |
| 241           | 330            | 322            | 222           |
| 242           | 331            | 323            | 223           |
| 243           | 267            | 324            | 224           |
| 244           | 377            | 325            | 225           |
| 245           | 242            | 326            | 226           |
| 246           | 40             | 327            | 227           |
| 247           | 244            | 330            | 230           |
| 250           | 335            | 331            | 231           |
| 252           | 270            | 332            | 232           |
| 253           | 307            | 333            | 233           |
| 254           | 302            | 334            | 234           |
| 255           | 55             | 335            | 235           |
| 256           | 250            | 336            | 236           |
| 257           | 272            | 337            | 237           |
| 355           | 316            |                |               |
+---------------+----------------+----------------+---------------+

FILES
/usr/lib/iconv/*.so           conversion modules
/usr/lib/iconv/*.t            conversion tables
/usr/lib/iconv/iconv_data     list of conversions supported by conversion tables

SEE ALSO
iconv(1), iconv(3C), iconv(5)

SunOS 5.10                        18 Apr 1997                        iconv_1251(5)
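
As a rough way to check a single row of the tables above, the sketch below converts one MS 1251 byte and prints the result in octal. It is only an illustration under stated assumptions: the helper name `cp1251_octal_to` is made up, and the code set names `CP1251`, `KOI8-R` and `ISO-8859-5` are the ones GNU iconv accepts; on Solaris the symbols from the DESCRIPTION table (for example `win5` and `koi8`) would be used instead. Results may differ from these tables for bytes that have no counterpart in the target code set.

```shell
# Convert one MS 1251 byte, given in octal as in the tables, and print the
# converted byte in octal. cp1251_octal_to is a hypothetical helper name.
#   $1 = target code set name (GNU iconv naming is assumed here)
#   $2 = octal value of the MS 1251 byte, e.g. 300
cp1251_octal_to() {
    # printf turns the octal escape into a raw byte; od prints the converted
    # byte back in octal; tr strips the padding that od adds.
    printf "\\$2" | iconv -f CP1251 -t "$1" | od -An -to1 | tr -d ' \n'
    echo
}
```

With these assumptions, `cp1251_octal_to KOI8-R 300` should print 341, which agrees with the row 300 / 341 in the MS 1251 to KOI8-R table.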