Merge with common column


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Merge with common column
# 1  
Old 11-08-2012
Merge with common column

hi i have two files and i wanted to join them using common column. try to do this using "join" command but that did not help.

File 1:

Code:
123             9a.vcf               hy92.vcf          hy90.vcf
Index    Ref     Alt     Ref     Alt     Ref     Alt
315       14       0       7       4      17       2
321        5       9       3       8       7      11
327       13       0       9       3      16       1
336       13       1       8       4      18       1
445       14       0       9       3      18       1

file 2:
Code:
Gene     Position         Reference Base         Alternate Base          Alternate count         Alternate count         Snp format      Sequence
lcl|utg7180000001423     315      A      G      3                [A/G]   CTAGAATCAAAGCAAGAATACACTCTTTTTTTTGGAAAAGAATATCTCATGTTTGCTCTTAAACTCCAAAACATATATCTACTTCAAAGTAGTGTTTTCCTGAGAGAAAGCACTATCGATAAGTTATGTCCCAACTCCAAATTATTACATCAAAGTAGAAACAAGATGACATATATATATCAACCGTATATTCTGCACATGATGTG
lcl|utg7180000001423     321      G      A/A,C  41       39,2    [-/-]   CTAGAATCAAAGCAAGAATACACTCTTTTTTTTGGAAAAGAATATCTCATGTTTGCTCTTAAACTCCAAAACATATATCTACTTCAAAGTAGTGTTTTCCTGAGAGAAAGCACTATCGATAAGTTATGTCCCAACTCCAAATTATTACATCAAAGTAGAAACAAGATGACATATATATATCAACCGTATATTCTGCACATGATGTG
lcl|utg7180000001423     327      A      C      4                [A/C]   CTAGAATCACTAGAATCAAAGCAAGAATACACTCTTTTTTTTGGAAAAGAATATCTCATGTTTGCTCTTAAACTCCAAAACATATATCTACTTCAAAGTAGTGTTTTCCTGAGAGAAAGCACTATCGATAAGTTATGTCCCAACTCCAAATTATTACATCAAAGTAGAAACAAGATGACATATATATATCAACCGTATATTCTGCACATGATGTG

i wanted to match first column in 1st file with 2nd column in second file and get union of both files i.e i need all the fields from file 1 as well as file 2.

---------- Post updated at 12:44 AM ---------- Previous update was at 12:21 AM ----------

an update is that the following code is able to merge the columns but its taking off the headings. i need to retain them in the final file..

Code:
#!/bin/awk -f
  # store the first file, indexed by col2
  NR==FNR {f1[$1] = $0; next}
  # output only if file1 contains file2's col2
  ($2 in f1) {print $0,"\t",f1[$2]}


Last edited by Scrutinizer; 11-08-2012 at 01:44 AM.. Reason: Changed icode to code
# 2  
Old 11-08-2012
If you are using join then the files need to be in sorted order and the headers would need to be removed... What should the output look like (sample)?


--edit--
Try:
Code:
#!/bin/awk -f
  # store the first file, indexed by col2
  NR==FNR {f1[$1] = $0; next}
  # output only if file1 contains file2's col2  
  $2 in f1{$0=$0 "\t" f1[$2]}
  {print}


Last edited by Scrutinizer; 11-08-2012 at 01:57 AM..
# 3  
Old 11-08-2012
Thank you but this is giving out the headings for second file only. i want those for first file also. First file contains headings in two rows. Also this script is printing out common column twice.. it need to be printed only once.
# 4  
Old 11-08-2012
Try

Code:
awk 'NR<3{print}
  NR==FNR {f1[$1]=$0;next}
  $2 in f1{$0=$0 "\t" f1[$2]}
  {print}' file1 file2

# 5  
Old 11-08-2012
Thankyou , pamu

but its printing the headers in three lines on one file and no headers on other file. see the output below.

Code:
123             9a.vcf               hy92.vcf          hy90.vcf
Index    Ref     Alt     Ref     Alt     Ref     Alt
Gene     Position         Reference Base         Alternate Base          Alternate count         Alternate count         Snp format      Sequence
lcl|utg7180000001423     315      A      G      3                [A/G]   CTAGAATCAAAGCAAGAATACACTCTTTTTTTTGGAAAAGAATATCTCATGTTTGCTCTTAAACTCCAAAACATATATCTACTTCAAAGTAGTGTTTTCCTGAGAGAAAGCACTATCGATAAGTTATGTCCCAACTCCAAATTATTACATCAAAGTAGAAACAAGATGACATATATATATCAACCGTATATTCTGCACATGATGTG      315       14       0       7       4      17       2
lcl|utg7180000001423     321      G      A/A,C  41       39,2    [-/-]   CTAGAATCAAAGCAAGAATACACTCTTTTTTTTGGAAAAGAATATCTCATGTTTGCTCTTAAACTCCAAAACATATATCTACTTCAAAGTAGTGTTTTCCTGAGAGAAAGCACTATCGATAAGTTATGTCCCAACTCCAAATTATTACATCAAAGTAGAAACAAGATGACATATATATATCAACCGTATATTCTGCACATGATGTG      321        5       9       3       8       7      11
lcl|utg7180000001423     327      A      C      4                [A/C]   CTAGAATCACTAGAATCAAAGCAAGAATACACTCTTTTTTTTGGAAAAGAATATCTCATGTTTGCTCTTAAACTCCAAAACATATATCTACTTCAAAGTAGTGTTTTCCTGAGAGAAAGCACTATCGATAAGTTATGTCCCAACTCCAAATTATTACATCAAAGTAGAAACAAGATGACATATATATATCAACCGTATATTCTGCACATGATGTG     327       13       0       9       3      16       1

# 6  
Old 11-08-2012
Quote:
Originally Posted by empyrean
Thankyou , pamu

but its printing the headers in three lines on one file and no headers on other file. see the output below.
try

Code:
awk 'NR<3{s=s?s" "$0:$0}
  NR==FNR {f1[$1]=$0;next}
  FNR==1 && s{print $0,s;s=""}
  $2 in f1{print $0"\t"f1[$2]}' file1 file2

# 7  
Old 11-08-2012
Thanks again. But the delimiters are not aligning i mean the heading is not in tab delimited format as the results are.. another thing is that the common column is repeating twice. i want it to be in second column but shoudnt repeat after sequence info.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Merge multiple files with common header

Hi all, Say i have multiple files x1 x2 x3 x4, all with common header (date, time, year, age),, How can I merge them to one singe file "X" in shell scripting Thanks for your suggestions. (2 Replies)
Discussion started by: msarguru
2 Replies

2. Shell Programming and Scripting

Seperated by columns, merge in a file, sort them on common column

Hi All, I have 4 files in below format. I took them as an example. File 1: Cut from position 1-4 then 6-7 then 8-14 then rest left and make them as columns in one new file. Inserting character H to the initial of all line like HCTOT. CTOT 456787897 Low fever CTOR 556712345 High fever... (2 Replies)
Discussion started by: Mannu2525
2 Replies

3. Shell Programming and Scripting

Merge files based on both common and uncommon rows

Hi, I have two files A (2190 rows) and file B (1100 rows). I want to merge the contents of two files based on common field, also I need the unmatched rows from file A file A: ABC XYZ PQR file B: >LMN|chr1:11000-12456: >ABC|chr15:176578-187678: >PQR|chr3:14567-15866: output... (3 Replies)
Discussion started by: Diya123
3 Replies

4. Shell Programming and Scripting

Count and merge using common column

I have the following records from multiple files. 415 A G 415 A G 415 A T 415 A . 415 A . 421 G A 421 G A,C 421 G A 421 G A 421 G A,C 421 G . 427 A C 427 A ... (3 Replies)
Discussion started by: empyrean
3 Replies

5. Shell Programming and Scripting

file merge based on common columns

I have two files 1.txt 34, ABC, 7, 8, 0.9 35, CDE, 6.5, -2, 0.01 2.txt 34, ABC, 9, 6, -1.9 35, CDE, 8.5, -2.3, 5.01 So in both files common columns are 1 and 2 so final o/p should look like 34, ABC, 7, 8, 0.9, 9, 6, -1.9 35, CDE, 6.5, -2, 0.01, 8.5, -2.3, 5.01 I tried using... (3 Replies)
Discussion started by: manas_ranjan
3 Replies

6. UNIX for Dummies Questions & Answers

Merge rows with common column

Dear all I have big file with two columns A_AA960715 GO:0006952 A_AA960715 GO:0008152 A_AA960715 GO:0016491 A_AA960715 GO:0007165 A_AA960715 GO:0005618 A_AA960716 GO:0006952 A_AA960716 GO:0005618 A_AA960716... (15 Replies)
Discussion started by: AAWT
15 Replies

7. UNIX for Dummies Questions & Answers

Writing a loop to merge multiple files by common column

I have 100 data files labelled 250.1.txt through 250.100.txt. The second column of the data files partially match (there is about %90 overlap). Each data file has 4 columns. I want the merge all these text files by the matching values in the second column. In the output, the first column should... (1 Reply)
Discussion started by: evelibertine
1 Replies

8. Shell Programming and Scripting

"Join" or "Merge" more than 2 files into single output based on common key (column)

Hi All, I have working (Perl) code to combine 2 input files into a single output file using the join function that works to a point, but has the following limitations: 1. I am restrained to 2 input files only. 2. Only the "matched" fields are written out to the "matched" output file and... (1 Reply)
Discussion started by: Katabatic
1 Replies

9. Shell Programming and Scripting

merge rows based on a common column

Hi guys, Please guide me if you have a solution to this problem. I have tried paste -s but it's not giving the desired output. I have a file with the following content- A123 box1 B345 bat2 C431 my_id A123 service C431 box1 A123 my_id I need two different outputs- OUTPUT1 A123... (6 Replies)
Discussion started by: smriti_shridhar
6 Replies

10. Shell Programming and Scripting

merge based on common, awk help

All, $ cat x.txt z 11 az x 12 ax y 13 ay $ cat y.txt ay TT ax NN Output required: y 13 ay TT x 12 ax NN (3 Replies)
Discussion started by: jkl_jkl
3 Replies
Login or Register to Ask a Question