Help in separating a multilingual file


 
# 1  
Old 01-15-2013

Hello,
I have a text file of more than 100,000 lines in which every line has the following rigid structure:
Identity number (always a numeric field)
English (a whole set of names)
Data in another language, in UTF-8 format
Each field is separated by a comma.
Some examples are given below:
Code:
23,Chinttaman Pagare,चिंतमण पगारे
24, Chinttaman Pateel,चिंतामण पाटल
25, Chinttaman Rout,चिंतामण राऊत
26, Chinttaman Yashawante,चिंतामण यशवंत

I would like to extract the data so that all the English names are stored in one file and the names in the other language in another file. The numbers should be ignored.
I work under Windows.
A script in AWK or Perl would be of great help.
Many thanks in advance.

Last edited by Scrutinizer; 01-23-2013 at 01:17 AM.. Reason: quote tags -> code tags
# 2  
Old 01-15-2013
With the output files hardcoded as filea and fileb:

Code:
awk -F, '{ print $2 > "filea"; print $3 > "fileb"}' yourfile
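For readers who want the one-liner above spelled out, here is a commented sketch of the same idea, not the poster's exact command. The file names sample.txt, english.txt and other.txt are placeholders chosen for illustration, and the blank-trimming step is an addition, prompted by the space after the first comma in some of the sample lines.

```shell
# Build a small sample in the thread's format: ID,English,other language.
cat > sample.txt <<'EOF'
23,Chinttaman Pagare,चिंतमण पगारे
24, Chinttaman Pateel,चिंतामण पाटल
EOF

# -F,   : split every input line into fields at commas,
#         so $1 is the number, $2 the English name, $3 the other language.
# gsub  : strip leading/trailing blanks from the two fields we keep.
# print ... > "name" : write to that file; awk empties it on the first
#         write and then keeps appending for the rest of the run.
awk -F, '{
    gsub(/^[ \t]+|[ \t]+$/, "", $2)
    gsub(/^[ \t]+|[ \t]+$/, "", $3)
    print $2 > "english.txt"
    print $3 > "other.txt"
}' sample.txt
```

Since $1 is never printed, the numbers are simply dropped. On Windows, cmd.exe does not treat single quotes as quoting characters, so one option is to save the part between the quotes in a file (say split.awk, a hypothetical name) and run it with awk -F, -f split.awk yourfile.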

This User Gave Thanks to Chubler_XL For This Post:
# 3  
Old 01-15-2013
Many thanks. I was really dumb. Using the comma as a delimiter I could have got the files out. Guess the enormity of the task put me in panic mode.
Many thanks for bringing me down to earth with a simple solution.
# 4  
Old 01-15-2013
Can you please explain this line?