Detect lines beginning with double-byte characters (Japanese) and delete
Post 302371664 by durden_tyler on Monday 16th of November 2009 03:22:08 AM
Code:
perl -lne 'print if ord $_ <= 127' file

tyler_durden
 

eucJP(5)							File Formats Manual							  eucJP(5)

NAME
       eucJP - A character encoding system (codeset) for Japanese

DESCRIPTION
       The Japanese EUC (Extended UNIX Code), or eucJP, codeset consists of
       the following character sets:

       CS0 (ASCII or JIS Roman)
       CS1 (JIS X0208)
       CS2 (JIS Katakana)
       CS3 (JIS X0212)

       CS0 is a primary character set. CS1, CS2, and CS3 are supplementary
       character sets. The MSB (Most Significant Bit) of the byte that
       represents a character in CS0 is set off, whereas the MSB of the bytes
       that represent characters in CS1, CS2, and CS3 is set on.

   Japanese EUC Encoding
       The representation of ASCII/JIS Roman and JIS X0208 characters in the
       Japanese EUC codeset is similar to how those characters are
       represented in the DEC Kanji codeset (refer to deckanji(5)). The two
       additional character sets, JIS Katakana and JIS X0212, are encoded in
       the Japanese EUC codeset by making use of the SS2 (Single Shift 2) and
       SS3 (Single Shift 3) control characters.

       The Japanese EUC codeset provides the following two areas for
       representation of user-defined characters (UDC):

       -----------------------------------------------------------------
       Area Usage    Row Range    Number of Characters    Code Range
       -----------------------------------------------------------------
       JIS X0208     85-94        940                     F5A1-FEFE
       JIS X0212     78-94        1598                    SS3 [EEA1-FEFE]
       -----------------------------------------------------------------

       The representation of UDCs on these two code planes is identical to
       that for standard characters that occupy the same planes. Code ranges
       distinguish between UDCs and standard JIS X0208 and JIS X0212
       characters that occupy the same plane.

       Currently, the operating system does not support JIS X0212 (JIS
       Supplementary) characters.

   Codeset Conversion
       The following codeset converter pairs are available for converting
       Japanese characters between eucJP and other encoding formats. Refer to
       iconv_intro(5) for an introduction to codeset conversion. For more
       information about the other codeset for which eucJP is the input or
       output, see the reference page specified in the list item.

       deckanji_eucJP, eucJP_deckanji
              Converting from and to the DEC Kanji codeset: deckanji(5).

       ISO-2022-JP_eucJP, eucJP_ISO-2022-JP
              Converting from and to the ISO 2022 Japanese codeset:
              iso2022jp(5).

       ISO-2022-JPext_eucJP, eucJP_ISO-2022-JPext
              Converting from and to the ISO 2022 Japanese Extended codeset:
              iso2022jp(5).

       JIS7_eucJP, eucJP_JIS7
              Converting from and to the JIS7 codeset: jiskanji(5).

       SJIS_eucJP, eucJP_SJIS
              Converting from and to the Shift JIS codeset: SJIS(5).

              Shift JIS encoding is identical to the encoding used in the
              Microsoft PC code page for Japanese. You can therefore use
              these converters to convert Japanese text from and to Japanese
              code-page format. See code_page(5) for more information about
              how the operating system supports PC code pages.

       sdeckanji_eucJP, eucJP_sdeckanji
              Converting from and to the Super DEC Kanji codeset:
              sdeckanji(5).

       UCS-2_eucJP, eucJP_UCS-2
              Converting from and to UCS-2 format: Unicode(5).

       UCS-4_eucJP, eucJP_UCS-4
              Converting from and to UCS-4 format: Unicode(5).

       UTF-8_eucJP, eucJP_UTF-8
              Converting from and to UTF-8 format: Unicode(5).

   Japanese EUC Fonts
       For display devices, the operating system supports Japanese EUC
       characters by converting Japanese EUC code to DEC Kanji code and then
       using the fonts for DEC Kanji. Because the CS3 character set is not
       supported by the DEC Kanji codeset, CS3 characters cannot be
       displayed.

       The operating system does not provide PostScript fonts for Japanese
       EUC. Some printers support Japanese with printer-resident fonts and
       print filters perform codeset conversion, if required, for the
       encoding used in the file input to the print job. For some other
       printers, you can set up a print filter to convert Japanese bitmap
       fonts to PostScript. Refer to i18n_printing(5) for introductory
       information about your printing options.

SEE ALSO
       Commands: locale(1)

       Others: ascii(5), code_page(5), i18n_intro(5), i18n_printing(5),
       iconv_intro(5), deckanji(5), iso2022jp(5), Japanese(5), jiskanji(5),
       l10n_intro(5), sdeckanji(5), shiftjis(5), Unicode(5)
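
On a present-day system, conversions like the eucJP/UTF-8 pair listed under Codeset Conversion are usually reached through the iconv command. The sketch below assumes an iconv that accepts the codeset names EUC-JP and UTF-8 (glibc and GNU libiconv do; the names are implementation-defined, so other systems may spell them differently). The function name eucjp_to_utf8 is hypothetical.

```shell
# Hypothetical helper: convert EUC-JP input (named files, or stdin when no
# file is given) to UTF-8 on stdout.
# Assumes the local iconv recognizes the names "EUC-JP" and "UTF-8".
eucjp_to_utf8() {
    iconv -f EUC-JP -t UTF-8 "$@"
}
```

Usage would be along the lines of: eucjp_to_utf8 input.euc > output.utf8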