Compare and replace two columns from two files


 
# 1  
Old 07-03-2018

Hello,

I have two tab-delimited text files.

File 1 has 30 columns; I am pasting only the first 10:

Code:
Chr    Position    Ref    Alt    Score    Gene    HGVS_C    HGVS_P    Coding_Consequence    dbSNP
1    17312743    C    T    1    Gene1    -    ATP13A2:NM_001141974.2:exon27:c.3214G>A:p.A1072T;ATP13A2:NM_001141973.2:exon29:c.3501G>A:p.P1167P,ATP13A2:NM_022089.3:exon29:c.3516G>A:p.P1172P    SNV    1
1    17313343    G    A    1    Gene2    -    ATP13A2:NM_001141973.2:exon27:c.3177C>T:p.A1059A,ATP13A2:NM_001141974.2:exon26:c.3060C>T:p.A1020A,ATP13A2:NM_022089.3:exon27:c.3192C>T:p.A1064A    SNV    2
1    17313654    C    T    1    Gene3    -    ATP13A2:NM_001141973.2:exon26:c.2955G>A:p.V985V,ATP13A2:NM_001141974.2:exon25:c.2838G>A:p.V946V,ATP13A2:NM_022089.3:exon26:c.2970G>A:p.V990Vsynonymous    SNV    3
1    17314942    G    A    1    Gene4    -    ATP13A2:NM_001141973.2:exon24:c.2622C>T:p.G874G,ATP13A2:NM_001141974.2:exon23:c.2505C>T:p.G835G,ATP13A2:NM_022089.3:exon24:c.2637C>T:p.G879Gsynonymous    SNV    4
1    17319011    G    A    1    Gene5    -    ATP13A2:NM_001141973.2:exon17:c.1800C>T:p.P600P,ATP13A2:NM_001141974.2:exon17:c.1800C>T:p.P600P,ATP13A2:NM_022089.3:exon17:c.1815C>T:p.P605Psynonymous    SNV    5
1    20960230    C    T    1    Gene6    -    PINK1:NM_032409.2:exon1:c.189C>T:p.L63L    SNV    6
1    20964328    A    G    1    Gene7    NM_032409.2:exon2:c.388-7A>G    -    -    7
1    20972048    G    A    1    Gene8    NM_032409.2:exon5:c.960-5G>A;NR_046507.1:exon2:c.3981+30C>T    -    -    8
1    43395635    C    T    1    Gene9    -    SLC2A1:NM_006516.2:exon5:c.588G>A:p.P196P    synonymous    9

File 2 has 7 columns:


Code:
CHROM    POS    ID    REF    ALT    ANN[*].FEATUREID:ANN[*].HGVS_C    ANN[*].HGVS_P
1    17312743    rs3170740    C    T    NM_001141974.2:c.3214G>A,NM_022089.3:c.3516G>A,NM_001141973.2:c.3501G>A,NM_001135247.1:c.-7975G>A,NM_017459.2:c.-7975G>A    p.Ala1072Thr,p.Pro1172Pro,p.Pro1167Pro,.,.
1    17313343    rs9435659    G    A    NM_022089.3:c.3192C>T,NM_001141973.2:c.3177C>T,NM_001141974.2:c.3060C>T    p.Ala1064Ala,p.Ala1059Ala,p.Ala1020Ala
1    17313654    rs761421    C    T    NM_022089.3:c.2970G>A,NM_001141973.2:c.2955G>A,NM_001141974.2:c.2838G>A    p.Val990Val,p.Val985Val,p.Val946Val
1    17314942    rs9435662    G    A    NM_022089.3:c.2637C>T,NM_001141973.2:c.2622C>T,NM_001141974.2:c.2505C>T    p.Gly879Gly,p.Gly874Gly,p.Gly835Gly
1    17319011    rs2076603    G    A    NM_022089.3:c.1815C>T,NM_001141973.2:c.1800C>T,NM_001141974.2:c.1800C>T    p.Pro605Pro,p.Pro600Pro,p.Pro600Pro
1    20960230    rs45530340    C    T    NM_032409.2:c.189C>T,NR_106732.1:n.59C>T    p.Leu63Leu,.
1    20964328    rs2298298    A    G    NM_032409.2:c.388-7A>G,NR_106732.1:n.*4047A>G,NR_046507.1:n.*4822T>C    .,.,.
1    20972048    rs3131713    G    A    NM_032409.2:c.960-5G>A,NR_046507.1:n.3981+30C>T    .,.
1    43395635    rs2229682    C    T    NM_006516.2:c.588G>A    p.Pro196Pro

I would like to:

1) Compare Chr:Position from file 1 with CHROM:POS from file 2.

2) Where the values match, replace columns 7 (HGVS_C) and 8 (HGVS_P) of file 1 with columns 6 and 7 of file 2, respectively.

3) Keep the original header from file 1.

4) Column 7 of file 2 contains many runs of ".," of varying length; these need to be removed and changed to "-" in the final file.

Desired output

Code:
Chr    Position    Ref    Alt    Score    Gene    HGVS_C    HGVS_P    Coding_Consequence    dbSNP
1    17312743    C    T    1    Gene1    NM_001141974.2:c.3214G>A,NM_022089.3:c.3516G>A,NM_001141973.2:c.3501G>A,NM_001135247.1:c.-7975G>A,NM_017459.2:c.-7975G>A    p.Ala1072Thr,p.Pro1172Pro,p.Pro1167Pro,.,.    SNV    1
1    17313343    G    A    1    Gene2    NM_022089.3:c.3192C>T,NM_001141973.2:c.3177C>T,NM_001141974.2:c.3060C>T    p.Ala1064Ala,p.Ala1059Ala,p.Ala1020Ala    SNV    2
1    17313654    C    T    1    Gene3    NM_022089.3:c.2970G>A,NM_001141973.2:c.2955G>A,NM_001141974.2:c.2838G>A    p.Val990Val,p.Val985Val,p.Val946Val    SNV    3
1    17314942    G    A    1    Gene4    NM_022089.3:c.2637C>T,NM_001141973.2:c.2622C>T,NM_001141974.2:c.2505C>T    p.Gly879Gly,p.Gly874Gly,p.Gly835Gly    SNV    4
1    17319011    G    A    1    Gene5    NM_022089.3:c.1815C>T,NM_001141973.2:c.1800C>T,NM_001141974.2:c.1800C>T    p.Pro605Pro,p.Pro600Pro,p.Pro600Pro    SNV    5
1    20960230    C    T    1    Gene6    NM_032409.2:c.189C>T,NR_106732.1:n.59C>T    p.Leu63Leu    SNV    6
1    20964328    A    G    1    Gene7    NM_032409.2:c.388-7A>G,NR_106732.1:n.*4047A>G,NR_046507.1:n.*4822T>C    -    -    7
1    20972048    G    A    1    Gene8    NM_032409.2:c.960-5G>A,NR_046507.1:n.3981+30C>T    -    -    8
1    43395635    C    T    1    Gene9    NM_006516.2:c.588G>A    p.Pro196Pro    synonymous    9

I am not able to write a single command that fulfils all these conditions; I can only replace the columns using separate commands:

Code:
# replace column 7 of file 1 with column 6 of file 2 (rows are paired by line number)
awk 'FNR==NR{a[NR]=$6; next}{$7=a[FNR]}1' FS='\t' OFS='\t' file2 file1 > file3

# replace column 8 of file 1 (now file3) with column 7 of file 2
awk 'FNR==NR{a[NR]=$7; next}{$8=a[FNR]}1' FS='\t' OFS='\t' file2 file3 > file4

# remove the ".," patterns (intended for column 8 of file4)
awk '{gsub(/\.,.*/,"-");}1' file4 > final.txt   # patterns not changed with this command
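A note on the last command: the regex `\.,.*` is not tied to any one field, so wherever it matches, `.*` runs on to the end of the line and takes the following columns with it. Below is a minimal sketch of a field-scoped alternative. It assumes the placeholder-only values sit in column 8 and consist solely of "." and "," characters; the sample line and the function name `scoped_cleanup` are made up for illustration.

```shell
# Hypothetical sample line: ten tab-separated columns, column 8 holds only placeholders.
line=$(printf '1\t20964328\tA\tG\t1\tGene7\tNM_032409.2:c.388-7A>G\t.,.,.\t-\t7')

# Unanchored pattern: the match starts at the first ".," and .* extends to end of line.
printf '%s\n' "$line" | awk '{ gsub(/\.,.*/, "-") } 1'

# Field-scoped test: only column 8 is rewritten, and only if it is all dots and commas.
scoped_cleanup() {
  awk 'BEGIN { FS = OFS = "\t" } $8 ~ /^[.,]+$/ { $8 = "-" } { print }'
}
printf '%s\n' "$line" | scoped_cleanup
```

On the sample line, the first command should print only eight columns (columns 9 and 10 are swallowed by the match), while the second keeps all ten and changes just column 8. Note that `^[.,]+$` also matches a single "."; whether that is wanted depends on the data.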

Suggestions are appreciated. Many thanks.

Last edited by nans; 07-03-2018 at 09:44 AM..
# 2  
Old 07-03-2018
Here is my attempt. I've only replaced fields that start with two or more "." or "," characters, since your desired output keeps fields that merely end with these characters:

Code:
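awk '
# Header lines: skip the file2 header, print the file1 header unchanged
FNR==1 {
   if (NR>1) print
   next
}
# Lookup key is Chr/Position (CHROM/POS in file2)
{ key = $1 FS $2 }
# First file (file2): remember HGVS_C and HGVS_P for each key
FNR==NR{
    hgvs_c[key]=$6
    hgvs_p[key]=$7
    next
}
# Second file (file1): overwrite columns 7 and 8 when the key is known
key in hgvs_c {
   $7 = hgvs_c[key]
   $8 = hgvs_p[key]
}
# Any field starting with two or more "." or "," characters becomes "-"
# ([.,][.,]+ is equivalent to [.,]{2,} but needs no interval-expression support)
{ gsub(/\t[.,][.,]+[^\t]*/, "\t-") }
1' FS='\t' OFS='\t' file2 file1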
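To try the keyed lookup on a small scale, the sketch below wraps the same logic in a function and runs it on two tiny made-up files. The function name `merge_hgvs`, the temporary file names and every row are hypothetical; the rows are deliberately in a different order in the two files, and one file1 row has no partner in file2. The pattern is written as `[.,][.,]+` on the assumption that some awk builds do not support `{2,}`.

```shell
# Hypothetical helper: merge_hgvs ANNOTATION_FILE TABLE_FILE
merge_hgvs() {
  awk '
    BEGIN { FS = OFS = "\t" }
    FNR == 1 { if (NR > 1) print; next }          # print only the second header
    { key = $1 FS $2 }                            # chromosome + position
    FNR == NR { c[key] = $6; p[key] = $7; next }  # first file: store by key
    key in c { $7 = c[key]; $8 = p[key] }         # second file: overwrite on match
    { gsub(/\t[.,][.,]+[^\t]*/, "\t-"); print }   # placeholder-only fields to "-"
  ' "$1" "$2"
}

tmp=$(mktemp -d)
{
  printf 'CHROM\tPOS\tID\tREF\tALT\tHGVS_C\tHGVS_P\n'
  printf '1\t200\trs2\tA\tG\tNM_2:c.2A>G\t.,.\n'
  printf '1\t100\trs1\tC\tT\tNM_1:c.1C>T\tp.Ala1Ala\n'
} > "$tmp/file2"
{
  printf 'Chr\tPosition\tRef\tAlt\tScore\tGene\tHGVS_C\tHGVS_P\tConsequence\tdbSNP\n'
  printf '1\t100\tC\tT\t1\tGeneA\t-\told\tSNV\t1\n'
  printf '1\t200\tA\tG\t1\tGeneB\told\t-\t-\t2\n'
  printf '1\t300\tG\tA\t1\tGeneC\t-\tkeep\tSNV\t3\n'
} > "$tmp/file1"

merge_hgvs "$tmp/file2" "$tmp/file1"
rm -r "$tmp"
```

Because rows are matched on the Chr/Position key rather than on line number, the GeneA and GeneB rows should pick up their values even though file2 lists them in reverse order, and the GeneC row, which has no entry in file2, should pass through unchanged.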

This User Gave Thanks to Chubler_XL For This Post:
# 3  
Old 07-04-2018
Thank you very much, this works well.