Sponsored Content
Top Forums Shell Programming and Scripting Paste columns based on common column: multiple files Post 303009459 by genome on Thursday 14th of December 2017 04:16:14 PM
Old 12-14-2017
Paste columns based on common column: multiple files

Hi all,

I've multiple files. In this case 5. Space separated columns. Each file has 12 columns. Each file has 300-400K lines.
I want to get the output such that if a value in column 2 is present in all the files then get all the columns of that value and print it side by side.

Desired output must have 60 columns.

File 1
head HGWAS1/merged_info_CHR1.info

Code:
snp_id rs_id position a0 a1 exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0
--- 1:10235:T:TA 10235 T TA 0.001 0.157 0.998 0 -1 -1 -1
--- rs145072688:10352:T:TA 10352 T TA 0.436 0.435 0.646 0 -1 -1 -1
--- 1:10505:A:T 10505 A T 0.000 0.095 1.000 0 -1 -1 -1
--- 1:10506:C:G 10506 C G 0.000 0.095 1.000 0 -1 -1 -1
--- 1:10511:G:A 10511 G A 0.000 0.001 1.000 0 -1 -1 -1
--- 1:10539:C:A 10539 C A 0.000 0.017 1.000 0 -1 -1 -1
--- 1:10542:C:T 10542 C T 0.000 0.211 1.000 0 -1 -1 -1
--- 1:10579:C:A 10579 C A 0.000 0.155 1.000 0 -1 -1 -1

File 2
head HGWAS2/merged_info_CHR1.info
Code:
snp_id rs_id position a0 a1 exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0
--- 1:10177:A:AC 10177 A AC 0.414 0.473 0.670 0 -1 -1 -1
--- 1:10235:T:TA 10235 T TA 0.001 0.141 0.999 0 -1 -1 -1
--- rs145072688:10352:T:TA 10352 T TA 0.427 0.488 0.673 0 -1 -1 -1
--- 1:10505:A:T 10505 A T 0.000 0.045 1.000 0 -1 -1 -1
--- 1:10506:C:G 10506 C G 0.000 0.045 1.000 0 -1 -1 -1
--- 1:10511:G:A 10511 G A 0.000 0.020 1.000 0 -1 -1 -1
--- 1:10539:C:A 10539 C A 0.001 0.426 0.999 0 -1 -1 -1
--- 1:10579:C:A 10579 C A 0.000 0.003 1.000 0 -1 -1 -1

File 3
head HGWAS3/merged_info_CHR1.info
Code:
snp_id rs_id position a0 a1 exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0
--- 1:10177:A:AC 10177 A AC 0.434 0.522 0.691 0 -1 -1 -1
--- 1:10235:T:TA 10235 T TA 0.000 0.122 0.999 0 -1 -1 -1
--- rs145072688:10352:T:TA 10352 T TA 0.421 0.526 0.693 0 -1 -1 -1
--- 1:10505:A:T 10505 A T 0.000 0.132 0.999 0 -1 -1 -1
--- 1:10506:C:G 10506 C G 0.000 0.132 0.999 0 -1 -1 -1
--- 1:10539:C:A 10539 C A 0.001 0.294 0.999 0 -1 -1 -1
--- 1:10542:C:T 10542 C T 0.000 0.001 1.000 0 -1 -1 -1
--- 1:10579:C:A 10579 C A 0.000 0.081 1.000 0 -1 -1 -1

File 4
head HGWAS4/merged_info_CHR1.info
Code:
snp_id rs_id position a0 a1 exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0
--- 1:10177:A:AC 10177 A AC 0.418 0.539 0.700 0 -1 -1 -1
--- 1:10235:T:TA 10235 T TA 0.001 0.180 0.998 0 -1 -1 -1
--- rs145072688:10352:T:TA 10352 T TA 0.406 0.528 0.695 0 -1 -1 -1
--- 1:10505:A:T 10505 A T 0.000 0.063 1.000 0 -1 -1 -1
--- 1:10506:C:G 10506 C G 0.000 0.063 1.000 0 -1 -1 -1
--- 1:10511:G:A 10511 G A 0.000 0.015 1.000 0 -1 -1 -1
--- 1:10539:C:A 10539 C A 0.001 0.079 0.999 0 -1 -1 -1
--- 1:10542:C:T 10542 C T 0.000 0.022 1.000 0 -1 -1 -1
--- 1:10579:C:A 10579 C A 0.000 0.007 1.000 0 -1 -1 -1

File 5
head HGWAS6/merged_info_CHR1.info
Code:
snp_id rs_id position a0 a1 exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0
--- 1:10177:A:AC 10177 A AC 0.406 0.512 0.695 0 -1 -1 -1
--- 1:10235:T:TA 10235 T TA 0.000 0.115 0.999 0 -1 -1 -1
--- rs145072688:10352:T:TA 10352 T TA 0.407 0.522 0.695 0 -1 -1 -1
--- 1:10505:A:T 10505 A T 0.000 0.029 1.000 0 -1 -1 -1
--- 1:10506:C:G 10506 C G 0.000 0.029 1.000 0 -1 -1 -1
--- 1:10511:G:A 10511 G A 0.000 0.759 1.000 0 -1 -1 -1
--- 1:10539:C:A 10539 C A 0.001 0.205 0.999 0 -1 -1 -1
--- 1:10542:C:T 10542 C T 0.000 0.012 1.000 0 -1 -1 -1

Desired output: Order of the column is important from each file.
Code:
snp_id rs_id position a0 a1 exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0 snp_id rs_id position a0 a1 exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0 snp_id rs_id position a0 a1 exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0 snp_id rs_id position a0 a1 exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0 snp_id rs_id position a0 a1 exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0 
--- 1:10235:T:TA 10235 T TA 0.001 0.157 0.998 0 -1 -1 -1 --- 1:10235:T:TA 10235 T TA 0.001 0.141 0.999 0 -1 -1 -1 --- 1:10235:T:TA 10235 T TA 0.000 0.122 0.999 0 -1 -1 -1 --- 1:10235:T:TA 10235 T TA 0.001 0.180 0.998 0 -1 -1 -1 --- 1:10235:T:TA 10235 T TA 0.000 0.115 0.999 0 -1 -1 -1
--- rs145072688:10352:T:TA 10352 T TA 0.436 0.435 0.646 0 -1 -1 -1 --- rs145072688:10352:T:TA 10352 T TA 0.427 0.488 0.673 0 -1 -1 -1 --- rs145072688:10352:T:TA 10352 T TA 0.421 0.526 0.693 0 -1 -1 -1 --- rs145072688:10352:T:TA 10352 T TA 0.406 0.528 0.695 0 -1 -1 -1 --- rs145072688:10352:T:TA 10352 T TA 0.407 0.522 0.695 0 -1 -1 -1
--- 1:10505:A:T 10505 A T ---similarly
--- 1:10506:C:G 10506 C G  --similary
--- 1:10539:C:A 10539 C A ---similarly

What I've done:
I used join, then I end up with 24 columns. But Column two has values present in all the files. Next, I can grep it in all files and am losing out there. I am failing to put the grepped output in columnar format. Apparently not a good way to aim for this problem Smilie


Moderator's Comments:
Mod Comment Please use CODE tags correctly as required by forum rules!

Last edited by RudiC; 12-15-2017 at 04:07 AM.. Reason: Changed QUOTE to CODE tags.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to convert 2 column data into multiple columns based on a keyword in a row??

Hi Friends I have the following input data in 2 columns. SNo 1 I1 Value I2 Value I3 Value SNo 2 I4 Value I5 Value I6 Value I7 Value SNo 3 I8 Value I9 Value ............... ................ SNo N (1 Reply)
Discussion started by: ks_reddy
1 Replies

2. Shell Programming and Scripting

sum multiple columns based on column value

i have a file - it will be in sorted order on column 1 abc 0 1 abc 2 3 abc 3 5 def 1 7 def 0 1 -------- i'd like (awk maybe?) to get the results (any ideas)??? abc 5 9 def 1 8 (2 Replies)
Discussion started by: jjoe
2 Replies

3. Shell Programming and Scripting

Merging 2 files based on a common column

Hi All, I do have 2 files file 1 has 4 tab delimited columns 234 a c dfgyu 294 b g fih 302 c h jzh 328 z c san 597 f g son File 2 has 2 tab delimted columns 234 23 302 24 597 24 I want to merge file 2 with file 1 based on the data common in both files which is the first column so... (6 Replies)
Discussion started by: Lucky Ali
6 Replies

4. Shell Programming and Scripting

Join multiple files based on 1 common column

I have n files (for ex:64 files) with one similar column. Is it possible to combine them all based on that column ? file1 ax100 20 30 40 ax200 22 33 44 file2 ax100 10 20 40 ax200 12 13 44 file2 ax100 0 0 4 ax200 2 3 4 (9 Replies)
Discussion started by: quincyjones
9 Replies

5. UNIX for Dummies Questions & Answers

Writing a loop to merge multiple files by common column

I have 100 data files labelled 250.1.txt through 250.100.txt. The second column of the data files partially match (there is about %90 overlap). Each data file has 4 columns. I want the merge all these text files by the matching values in the second column. In the output, the first column should... (1 Reply)
Discussion started by: evelibertine
1 Replies

6. Shell Programming and Scripting

common entries between files based on 1st column

Hi, I am trying to get the common entries from 2 files based on 1st field.. However when I try to do in perl I am getting blank output.. How can I do this in awk? open(BUFF1, "my_genes"); open(BUFF3, "rawcounts"); #open(WRBUFF,">result_rawcounts"); while($line =<BUFF1>) { ... (3 Replies)
Discussion started by: Diya123
3 Replies

7. UNIX for Dummies Questions & Answers

How to join 2 .txt files based on a common column?

Hi all, I'm trying to join two .txt file tab delimitated based on a common column. File 1 transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct comp1000201_c0_seq1 comp1000201_c0 337 183.51 0.00 0.00 0.00 0.00 comp1000297_c0_seq1 ... (1 Reply)
Discussion started by: alisrpp
1 Replies

8. UNIX for Dummies Questions & Answers

Merge selective columns from files based on common key

Hi, I am trying to selectively merge two files based on keys reported in the 1st column. File1: #file1-header1 file1-header2 111 qwe rtz uio 198 asd fgh jkl 165 yxc 789 poi uzt rew 89 lkj File2: #file2-header2 file2-header2 165 ghz nko2 ... (2 Replies)
Discussion started by: dovah
2 Replies

9. Shell Programming and Scripting

Join columns across multiple lines in a Text based on common column using BASH

Hello, I have a file with 2 columns ( tableName , ColumnName) delimited by a Pipe like below . File is sorted by ColumnName. Table1|Column1 Table2|Column1 Table5|Column1 Table3|Column2 Table2|Column2 Table4|Column3 Table2|Column3 Table2|Column4 Table5|Column4 Table2|Column5 From... (6 Replies)
Discussion started by: nv186000
6 Replies

10. UNIX for Beginners Questions & Answers

How to copy a column of multiple files and paste into new excel file (next to column)?

I have data of an excel files as given below, file1 org1_1 1 1 2.5 100 org1_2 1 2 5.5 98 org1_3 1 3 7.2 88 file2 org2_1 1 1 2.5 100 org2_2 1 2 5.5 56 org2_3 1 3 7.2 70 I have multiple excel files as above shown. I have to copy column 1, column 4 and paste into a new excel file as... (26 Replies)
Discussion started by: dineshkumarsrk
26 Replies
All times are GMT -4. The time now is 11:02 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy