parsing data from a big file using keys from another smaller file


 
# 1  
Old 04-06-2011

Hi,
I have two files.
The format of file 1 is:
Code:
a1
b2
a2
c2
d1
f3

The format of file 2 is (tab-delimited):
Code:
a1 1.2 0.5 0.06 0.7 0.9 1 0.023
a3 0.91 0.007 0.12 0.34 0.45 1 0.7
a2 1.05 2.3 0.25 1 0.9 0.3 0.091
b1 1 5.4 0.3 9.2 0.3 0.2 0.1
b2 3 5 7 0.9 1 9 0 1
b3 0.001 1 2.3 4.6 8.9 10 0 1 0
c1 0.9 1 2.3 5.7 8.9 9 0 1
c2 1 2.4 5.7 0.13 1.9 2 5 8
c3 5.7 9 10 11 0.2 0.7 0.9
d1 9.0 5 8 4.5 9 0.99 1.3 1 0
d2 2 4.6 7 9 9 10 11 0 1 2.4 0.44
f1 7 8 4.5 6.8 9.21 0 1 8 4 9 10
f3 0 1 2.3 4.0 3.14 0 1 0.005

I want to use the data in file 1 as keys and extract the corresponding rows from file 2 into a third file,

such that file 3 is:
Code:
a1 1.2 0.5 0.06 0.7 0.9 1 0.023
b2 3 5 7 0.9 1 9 0 1
a2 1.05 2.3 0.25 1 0.9 0.3 0.091
c2 1 2.4 5.7 0.13 1.9 2 5 8
d1 9.0 5 8 4.5 9 0.99 1.3 1 0
f3 0 1 2.3 4.0 3.14 0 1 0.005

File 3 must keep the keys in the same order as file 1.
Please let me know the best way to generate the third file, using either awk or sed.
LA
# 2  
Old 04-06-2011
Code:
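< datafile awk '
# Assumes both keyfile and datafile are sorted in the same order.
BEGIN { FS="\t"
        # get the very first key; leave key empty if there is none.
        if ((getline key < "keyfile") <= 0) key = "" }
{
        # If the data is ahead in order, read keys until you catch up.
        # Clear key at EOF so the loop stops instead of spinning forever.
        while(key && (key < $1))
                if ((getline key < "keyfile") <= 0) key = ""

        if(key && (key == $1))
                print;
}'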

# 3  
Old 04-06-2011
My real data contains 3000 keys. When I ran the code you sent, I was able to extract values for only 20 of them.
LA
# 4  
Old 04-06-2011
It can't be running out of room: it's not storing anything, so the problem isn't related to the quantity of data. I think either the keys or the data aren't in lexicographical order, or the key file contains blank lines, which would make it give up instantly.

Could you post the smallest possible sample of data that shows the problem?
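Both suspicions can be checked quickly from the shell. This is only a rough sketch: the temporary directory and the tiny sample key file are made up for illustration, and it assumes that plain byte order (LC_ALL=C) is the order the awk comparison uses, which may not hold in every locale or awk implementation.

```shell
# Hypothetical sample: a key file that is unsorted and contains one blank line.
tmp=$(mktemp -d)
printf 'a1\nb2\na2\n\nc2\n' > "$tmp/keyfile"

# sort -c only checks the order; it exits non-zero if the file is not sorted.
if LC_ALL=C sort -c "$tmp/keyfile" 2>/dev/null; then
    sorted=yes
else
    sorted=no
fi

# Count empty lines; grep -c exits 1 when the count is 0, hence the || true.
blank=$(grep -c '^$' "$tmp/keyfile" || true)

echo "sorted: $sorted, blank lines: $blank"
```

Running the same two checks against the real keyfile and datafile should show which of the two causes applies.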
# 5  
Old 04-06-2011
What do you mean by lexicographical order? I don't see any gaps in either file.
# 6  
Old 04-06-2011
Sorted.
# 7  
Old 04-06-2011
I can't sort the key file, because file 3 has to come out in the same order as the keys. I could sort the data file, though. The main requirement is that the output file keeps the same order as the key file.
LA
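One way to keep the key order without sorting anything is to read the small key file first, remember the order of the keys, and then pick only the matching rows out of the big file. The sketch below is a suggestion, not something posted in the thread. The file names keyfile and datafile follow post #2, file3 follows the first post, and the temporary directory, the cut-down sample data and the array names (order, want, row) are made up for illustration. It assumes tab-delimited data with the key in the first field and no stray whitespace or carriage returns around the keys.

```shell
# Hypothetical cut-down sample data, tab-delimited as described in the first post.
tmp=$(mktemp -d)
printf 'a1\nb2\na2\n' > "$tmp/keyfile"
printf 'a1\t1.2\t0.5\na3\t0.91\t0.007\na2\t1.05\t2.3\nb1\t1\t5.4\nb2\t3\t5\n' > "$tmp/datafile"

awk -F'\t' '
    # First file (keyfile): remember each key and the order it appeared in.
    NR == FNR { order[++n] = $1; want[$1] = 1; next }
    # Second file (datafile): keep only the rows whose first field is a wanted key.
    ($1 in want) { row[$1] = $0 }
    # Print the kept rows in key-file order; keys with no data row are skipped.
    END { for (i = 1; i <= n; i++) if (order[i] in row) print row[order[i]] }
' "$tmp/keyfile" "$tmp/datafile" > "$tmp/file3"

cat "$tmp/file3"
```

With this sample the rows come out in the order a1, b2, a2, matching the key file rather than the data file. Only the rows that match a key are held in memory, so the size of the big file should matter much less than the number of keys. If a key occurs more than once in the data file, the last row wins.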