script to separate bilingual text file


 
# 8  
Old 04-21-2010
Depending on the original text, you may want to create the two output files in separate processes.
In one process drop only the obvious English characters, and in the second drop only the obvious Assamese.
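A minimal sketch of that two-pass idea. It assumes the input is UTF-8, that "obvious English" means the ASCII letters A-Z and a-z, and that every byte with the high bit set belongs to Assamese text. The function names are made up for illustration and do not come from this thread.

```shell
# Pass 1: delete the ASCII letters. Assamese text stays, and so do
# digits, spaces and punctuation.
drop_english() {
    LC_ALL=C tr -d 'A-Za-z'
}

# Pass 2: delete every byte with the high bit set, which removes all
# bytes of each UTF-8 multibyte character. Only ASCII stays.
drop_non_ascii() {
    LC_ALL=C tr -d '\200-\377'
}
```

Possible usage, with data as the input file: drop_english < data > assamese and drop_non_ascii < data > english. Spaces and punctuation end up in both output files, so some cleanup may still be needed.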
# 9  
Old 04-21-2010
Quote:
Originally Posted by alister
Code:
awk '/\xe0/ {print > "assamese"; next} {print > "english"}' data

Regards,
Alister
Thank you, alister :) It is working :)

But there are some blank lines. For a small text file that is OK, but in a large file where the text of one language (say English) is very dense, the newly created file for the other language (say Assamese) contains as many blank lines as there were dense English lines.
# 10  
Old 04-21-2010
Hmmm. In case I misunderstood the problem: I assumed that each newline-terminated string was in one language or the other. If sub-line-level granularity is desired, my awk solution is unsuitable.

Regards,
Alister

---------- Post updated at 09:45 AM ---------- Previous update was at 09:42 AM ----------

Quote:
Originally Posted by wildhorse
Thank you, alister :) It is working :)

But there are some blank lines.
You're very welcome. Yeah, that code will always interpret blank lines as English. If you want to discard them, perhaps you can use:
Code:
awk '!NF {next} /\xe0/ {print > "assamese"; next} {print > "english"}' data


# 11  
Old 04-21-2010
alister's code, as I read it, will write a full line of mixed text to a single file, never splitting words by language.

The regex tests for the existence of \xe0, which, judging by the example, will always be true. For awk to work correctly, you have to be in a locale that supports both languages and test each character for Assamese-ness or ASCII-ness, because we are splitting on a per-character/word basis.
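A rough sketch of a word-level split, not a tested solution for the original data. Instead of relying on a multilingual locale, it forces the C locale so awk compares raw bytes, and it treats any whitespace-separated word containing the byte \340 (octal for 0xe0, the UTF-8 lead byte of the Bengali-Assamese block U+0980 to U+09FF) as Assamese. The function name split_words and its three arguments are hypothetical.

```shell
# split_words INPUT ASSAMESE_OUT ENGLISH_OUT
# Route each whitespace-separated word of INPUT to one of two files.
split_words() {
    LC_ALL=C awk -v a_out="$2" -v e_out="$3" '
    {
        asm = eng = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /\340/)    # word contains the 0xe0 lead byte
                asm = (asm == "" ? "" : asm " ") $i
            else
                eng = (eng == "" ? "" : eng " ") $i
        }
        # Print only non-empty results, so neither file gets blank lines.
        if (asm != "") print asm > a_out
        if (eng != "") print eng > e_out
    }' "$1"
}
```

Possible usage: split_words data assamese english. A word that mixes both scripts goes to the Assamese file as a whole, and other scripts whose UTF-8 encoding also starts with 0xe0 would be misclassified, so a true per-character test would need more work.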
# 12  
Old 04-21-2010
Correct. It is a line-based solution. For some reason, I jumped to the conclusion that that's what was needed. Perhaps I flashed back to https://www.unix.com/shell-programmin...xcel-file.html :P