Detect lines beginning with double-byte characters (Japanese) and delete


 
# 1  
11-15-2009

Greetings,

I want a script (preferably awk) that checks whether the first character of a line is double-byte (as in Japanese or Chinese) and, if so, deletes that line.

For example:
Quote:
person one
御み
person two
慎しん
When it gets out.
(In the quote above, two lines show as Japanese on my screen, with 2 characters in the first and 3 in the second; you may see random symbols.)

becomes:
Quote:
person one
person two
When it gets out.
# 2  
11-15-2009
If you want the output file to contain only lines with English characters, you can use this:

Code:
 awk '$0 ~ /[A-Za-z]/ {print $0}' abc.txt

Note: this keeps only lines that contain at least one ASCII letter, so lines written entirely in any other script are dropped as well.

HTH,
PL
# 3  
11-15-2009
Thanks daptal, but that's not what I need. I need exactly what I described: removing only the lines that have a double-byte character in the first position.
# 4  
11-15-2009
Do you by chance know the character set the files are written in?
# 5  
11-15-2009
I am the one who wrote the file, so I know where the character sets came from.

Not sure I understand your question, though. The non-Japanese characters are all single-byte characters (I am using vim). The Japanese characters use the "Double Byte Character Set" (DBCS).

I want to keep it general so that Chinese and Korean characters are also recognized - which should work by detecting DBCS characters. There must be a straightforward way ... ?
# 6  
11-16-2009
Try...
Code:
awk 'substr($0,1,1) < "\200"' file1
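
A note on locale, offered as a sketch rather than a definitive fix: the comparison above depends on awk treating the first character as a single byte. In a UTF-8 locale, some awk implementations may work on whole multibyte characters instead, so forcing the C locale makes the byte comparison explicit. The function name `drop_multibyte_lead` is hypothetical, and the assumption is that the file uses an encoding whose multibyte characters start with a byte of 0x80 or higher (UTF-8, Shift_JIS, and EUC-JP do).

```shell
# Hypothetical helper: keep only lines whose first byte is below 0x80 (plain ASCII).
# LC_ALL=C asks awk to compare bytes rather than locale-dependent characters,
# so the lead byte of a multibyte character always compares as "high".
# Empty lines are kept, because the empty string sorts before "\200".
drop_multibyte_lead() {
    LC_ALL=C awk 'substr($0, 1, 1) < "\200"' "$@"
}
```

It could be used as `drop_multibyte_lead file1 > file1.clean`. Note that this also drops lines starting with any other non-ASCII character (accented Latin letters, for instance), not only CJK ones.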

# 7  
11-16-2009
Code:
perl -lne 'print if ord $_ <= 127' file

tyler_durden