Lookup two values per line (from a second file) and write the smaller value to another file


 
# 1  
Old 04-28-2010

Hello Unix Gurus,
Please let me know if this is hard to understand and I apologize for my inability to explain better.

I have a file "Foo" with the following structure
Code:
CHR_A         BP_A                         SNP_A  CHR_B         BP_B                         SNP_B           R2
     1      2025239                    rs10910029      1      2025544                    rs10910030     0.997504
     1      2025239                    rs10910029      1      2025659                    rs10752741     0.997504
     1      2025239                    rs10910029      1      2026123                    rs10797413     0.997504
     1      2025239                    rs10910029      1      2030758                     rs4648807     0.905058
     1      2025239                    rs10910029      1      2032403                    rs11584491     0.883846
     1      2025239                    rs10910029      1      2035742                     rs3128337     0.860246
     1      2025239                    rs10910029      1      2037407                     rs3107145     0.844859

I have another file "Bar" with two columns
Code:
SNP       p_value
rs1629826       0.0074785032112008
rs10910030     0.00747871295310032
rs5952098       0.00747983098835803
rs10797413      0.00747988817134826
rs12885929      0.00748196549532576
rs11584491      0.00748416021775555
rs10910029       0.00748489518549456
rs6739969       0.00748668296365185

What I want to accomplish is:
(1) Look up the p-values of SNP_A and SNP_B in file "Bar" and append them to each line of "Foo" as two new columns (p-SNP_A and p-SNP_B).

(2) For each line, write the name of the SNP with the lower p-value to another file.

Output would be for (1)
Code:
CHR_A  BP_A     SNP_A       CHR_B  BP_B     SNP_B       R2        p-SNP_A              p-SNP_B
1      2025239  rs10910029  1      2025544  rs10910030  0.997504  0.00748489518549456  0.00747871295310032
1      2025239  rs10910029  1      2025659  rs10752741  0.997504  0.00748489518549456
1      2025239  rs10910029  1      2026123  rs10797413  0.997504  0.00748489518549456  0.00747988817134826
1      2025239  rs10910029  1      2030758  rs4648807   0.905058  0.00748489518549456
1      2025239  rs10910029  1      2032403  rs11584491  0.883846  0.00748489518549456  0.00748416021775555
1      2025239  rs10910029  1      2035742  rs3128337   0.860246  0.00748489518549456
1      2025239  rs10910029  1      2037407  rs3107145   0.844859  0.00748489518549456

Output for (2)
Code:
Results.txt
rs10910030
rs10910029
rs10797413
rs10910029
rs11584491
rs10910029
rs10910029

Any help is appreciated. I have learned a little awk from this site, so it would be great if the solution were in awk.

Thank you for your time & effort
~GH

Last edited by genehunter; 04-28-2010 at 03:35 AM..
# 2  
Old 04-28-2010
Can you show your desired output too?
# 3  
Old 04-28-2010
Try...
Code:
$ cat Foo
CHR_A         BP_A                         SNP_A  CHR_B         BP_B                         SNP_B           R2
     1      2025239                    rs10910029      1      2025544                    rs10910030     0.997504
     1      2025239                    rs10910029      1      2025659                    rs10752741     0.997504
     1      2025239                    rs10910029      1      2026123                    rs10797413     0.997504
     1      2025239                    rs10910029      1      2030758                     rs4648807     0.905058
     1      2025239                    rs10910029      1      2032403                    rs11584491     0.883846
     1      2025239                    rs10910029      1      2035742                     rs3128337     0.860246
     1      2025239                    rs10910029      1      2037407                     rs3107145     0.844859
$ cat Bar
SNP       p_value
rs1629826       0.0074785032112008
rs10910030     0.00747871295310032
rs5952098       0.00747983098835803
rs10797413      0.00747988817134826
rs12885929      0.00748196549532576
rs11584491      0.00748416021775555
rs10910029       0.00748489518549456
rs6739969       0.00748668296365185
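$ awk 'FNR==NR{                    # first file (Bar): build the SNP -> p-value lookup
          p[$1] = $2
          next
     }
     {                             # second file (Foo)
          if (FNR==1) {            # header line: column titles
               a = "p-SNP_A"
               b = "p-SNP_B"
               c = "Results.txt"
          } else {
               a = p[$3]           # empty when SNP_A is not in Bar
               b = p[$6]           # empty when SNP_B is not in Bar
               if (b == "" || (a != "" && a <= b))
                    c = $3         # SNP_B has no p-value, or SNP_A is not larger
               else
                    c = $6
          }
          printf("%6s%8s%12s%7s%13s%14s%12s%27s%27s\n",$1,$2,$3,$4,$5,$6,$7,a,b) > "outfile1"
          print c > "outfile2"
     }' Bar Foo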
$ head outfile[12]
==> outfile1 <==
 CHR_A    BP_A       SNP_A  CHR_B         BP_B         SNP_B          R2                    p-SNP_A                    p-SNP_B
     1 2025239  rs10910029      1      2025544    rs10910030    0.997504        0.00748489518549456        0.00747871295310032
     1 2025239  rs10910029      1      2025659    rs10752741    0.997504        0.00748489518549456
     1 2025239  rs10910029      1      2026123    rs10797413    0.997504        0.00748489518549456        0.00747988817134826
     1 2025239  rs10910029      1      2030758     rs4648807    0.905058        0.00748489518549456
     1 2025239  rs10910029      1      2032403    rs11584491    0.883846        0.00748489518549456        0.00748416021775555
     1 2025239  rs10910029      1      2035742     rs3128337    0.860246        0.00748489518549456
     1 2025239  rs10910029      1      2037407     rs3107145    0.844859        0.00748489518549456

==> outfile2 <==
Results.txt
rs10910030
rs10910029
rs10797413
rs10910029
rs11584491
rs10910029
rs10910029

# 4  
Old 04-28-2010
Dear Ygor,
That worked like a charm.
Thank you so much
~GH

Hi, sorry, there is a small problem with the formatting of the output. Some values in adjacent columns are joined together in outfile1:
Code:
     1222326556  rs12402838      1    222339565     rs4653930    0.743775                         NA           0.93950500338575
     1222326556AFFX-SNP_11693801__rs12402838      1    222326695    rs12403374    0.974822                         NA          0.999974829064874
     1222326556AFFX-SNP_11693801__rs12402838      1    222327447     rs6604887     0.97931                         NA                          1

Can you please explain the following line in your code, especially the numerical values?
Code:
printf("%6s%8s%12s%7s%13s%14s%12s%27s%27s\n",$1,$2,$3,$4,$5,$6,$7,a,b)
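
For reference, the numbers in the format string are minimum field widths: `%6s` prints a string right-aligned in a field at least 6 characters wide. A longer string is not truncated, it simply overflows the field, which is why wide values run into their neighbours. The small demo below is only a sketch; `demo_widths` is a made-up function name and the values are copied from the sample lines above.

```shell
# Hypothetical demo of printf minimum field widths in awk.
demo_widths() {
    awk 'BEGIN {
        printf("[%6s][%12s]\n", "1", "rs12402838")                     # short values: padded on the left
        printf("[%6s][%12s]\n", "1", "AFFX-SNP_11693801__rs12402838")  # 29 characters > 12: printed in full
        printf("%6s%8s|\n", "1", "222326556")                          # 9 digits > 8: no space left between columns
    }'
}
demo_widths
# prints:
# [     1][  rs12402838]
# [     1][AFFX-SNP_11693801__rs12402838]
#      1222326556|
```

Widening the fields (for example `%12s` for BP_A) would fix this particular data set, but any fixed width can be exceeded by a longer value.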

I figured out that the field widths were too small for some of the strings (I guess), so I used tabs instead. Please see below. I am sure there is a more elegant way of tab-delimiting it, though.
Code:
 printf("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n",$1,$2,$3,$4,$5,$6,$7,a,b) > "outfile1"
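
One possibly tidier way to tab-delimit is to set awk's output field separator `OFS` to a tab and let `print` join the fields, instead of spelling out every `%s\t`. The sketch below assumes both files are whitespace-delimited as in the samples; `tab_join` and its three arguments are hypothetical names used only for illustration.

```shell
# Hypothetical helper: tab_join BAR FOO OUT
# Appends the p-values of SNP_A ($3) and SNP_B ($6) to each line of FOO, tab-delimited.
tab_join() {
    awk -v out="$3" '
        BEGIN { OFS = "\t" }
        FNR == NR { p[$1] = $2; next }      # first file: SNP -> p-value
        {
            if (FNR == 1) { a = "p-SNP_A"; b = "p-SNP_B" }
            else          { a = p[$3];     b = p[$6] }
            $1 = $1                         # rebuild $0 so the fields are joined by OFS
            print $0, a, b > out
        }' "$1" "$2"
}
```

It would be called as `tab_join Bar Foo outfile1`. Assigning `$1 = $1` makes awk re-join the existing fields with `OFS`, so the leading spaces and variable-width padding of the input disappear and IDs of any length stay in their own column.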


Last edited by genehunter; 04-28-2010 at 03:55 PM..