parsing data from a big file using keys from another smaller file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting parsing data from a big file using keys from another smaller file
# 8  
Old 04-06-2011
[edit] Same order as the key file? Hm.... Will there be more than one matching line per key?
# 9  
Old 04-06-2011
Code:
awk -F'\t' 'FNR==NR{k[$1]=NR} NR>FNR{ if($1 in k) {$0=k[$1]"|"$0;print $0;}}' file1.txt file2.txt |sort -t"|" -k1 |cut -d"|" -f2> file3.txt

will do your job.
# 10  
Old 04-06-2011
Use nawk or gawk.
Code:
< data nawk '
BEGIN { FS="\t"
        N=0;
        while(getline key < "keyfile")
        {
                keys[key]=1;
                order[N++]=key;
        }
 }

{       if(keys[$1]) out[$1]=out[$1] $0 "\n";   }

END {   for(M=0; M<N; M++)
        if(out[order[M]])       printf("%s", out[order[M]]);
}'

# 11  
Old 04-06-2011
Code:
awk 'NR==FNR{a[$1]=$0;next}NF==1{print a[$1]}' file2 file1

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract data according to keys from filename mentioned in file

Hello experts, I want to join a file with files whosE names are mentioned in one of the columns of the same file. File 1 t1,a,b,file number 1 t1,a,c,file number 1 t2,c,d,file number 2 t2,c,e,file number 2 t2,c,f,file number 2 t2,c,g,file number 2 t3,e,f,file number 3 file number 1... (3 Replies)
Discussion started by: ritakadm
3 Replies

2. Shell Programming and Scripting

Parsing data using keys from one file

I have 2 text files where I need to parse data from file 2 using the data from file 1. Below are my sample files File 1 (tab delimited) 257 350 670 845 725 1025 767 820 ... .... .... file 2 (tab delimited) 220..450 TA AB650 ABCED 520..850 GA AB720 ABCDE 700..1100 TC AB820 ABCDE... (2 Replies)
Discussion started by: Lucky Ali
2 Replies

3. Shell Programming and Scripting

parsing characters and number from a big file with brackets

I have a big file with many brackets () in it from which I need to parse number characters and numbers. Below is an example of my file 14 (((A__0:0.02,B__1:0.3)0:0.04,C__0:0.025)2:0.01),(D__0:0.00978,E__2:0.01031)1:0.00362; 15... (1 Reply)
Discussion started by: Lucky Ali
1 Replies

4. Shell Programming and Scripting

Segment a big file into smaller ones

Greeting to all. I have big text file that I would like to segment into many smaller files. Each file should be maximum 10 000 lines. The file is called time.txt. after the execution of the file I would like to have. time_01.txt, time_02, txt, ...,time_n.txt Can anybody help. Br. (2 Replies)
Discussion started by: flash80
2 Replies

5. Shell Programming and Scripting

Sort a big data file

Hello, I have a big data file (160 MB) full of records with pipe(|) delimited those fields. I`m sorting the file on the first field. I'm trying to sort with "sort" command and it brings me 6 minutes. I have tried with some transformation methods in perl but it results "Out of memory". I was... (2 Replies)
Discussion started by: rubber08
2 Replies

6. Shell Programming and Scripting

Helping in parsing subset of text from a big results file

Hi All, I need some help to effectively parse out a subset of results from a big results file. Below is an example of the text file. Each block that I need to parse starts with "reading sequence file 10.codon" (next block starts with another number) and ends with **p-Value(s)**. I have given... (1 Reply)
Discussion started by: Lucky Ali
1 Replies

7. Shell Programming and Scripting

How to cut some data from big file

How to cut data from big file my file around 30 gb I tried "head -50022172 filename > newfile.txt ,and tail -5454283 newfile.txt. It's slowy. afer that I tried sed -n '46467831,50022172p' filename > newfile.txt ,also slow Please recommend me , faster command to cut some data from... (4 Replies)
Discussion started by: almanto
4 Replies

8. Shell Programming and Scripting

perl help to split big verilog file into smaller ones for each module

Hi I have a big verilog file with multiple modules. Each module begin with the code word 'module <module-name>(ports,...)' and end with the 'endmodule' keyword. Could you please suggest the best way to split each of these modules into multiple files? Thank you for the help. Example of... (7 Replies)
Discussion started by: return_user
7 Replies

9. Shell Programming and Scripting

Big data file - sed/grep/awk?

Morning guys. Another day another question. :rolleyes: I am knocking up a script to pull some data from a file. The problem is the file is very big (up to 1 gig in size), so this solution: for results in `grep "^\ ... works, but takes ages (we're talking minutes) to run. The data is held... (8 Replies)
Discussion started by: dlam
8 Replies

10. Shell Programming and Scripting

Parsing the data in a file

Hi, I have file (FILE.tmp) having contents, FILE.tmp ======== filename=menudata records=0000000000037 ldbname=pinsys timestamp=2005/05/14-18:32:33 I want to parse it bring a new file which will look like, filename records ldbname timestamp... (2 Replies)
Discussion started by: Omkumar
2 Replies
Login or Register to Ask a Question