Chr Position Ref Alt Score Gene HGVS_C HGVS_P Coding_Consequence dbSNP
1 17312743 C T 1 Gene1 - ATP13A2:NM_001141974.2:exon27:c.3214G>A:p.A1072T;ATP13A2:NM_001141973.2:exon29:c.3501G>A:p.P1167P,ATP13A2:NM_022089.3:exon29:c.3516G>A:p.P1172P SNV 1
1 17313343 G A 1 Gene2 - ATP13A2:NM_001141973.2:exon27:c.3177C>T:p.A1059A,ATP13A2:NM_001141974.2:exon26:c.3060C>T:p.A1020A,ATP13A2:NM_022089.3:exon27:c.3192C>T:p.A1064A SNV 2
1 17313654 C T 1 Gene3 - ATP13A2:NM_001141973.2:exon26:c.2955G>A:p.V985V,ATP13A2:NM_001141974.2:exon25:c.2838G>A:p.V946V,ATP13A2:NM_022089.3:exon26:c.2970G>A:p.V990V synonymous SNV 3
1 17314942 G A 1 Gene4 - ATP13A2:NM_001141973.2:exon24:c.2622C>T:p.G874G,ATP13A2:NM_001141974.2:exon23:c.2505C>T:p.G835G,ATP13A2:NM_022089.3:exon24:c.2637C>T:p.G879G synonymous SNV 4
1 17319011 G A 1 Gene5 - ATP13A2:NM_001141973.2:exon17:c.1800C>T:p.P600P,ATP13A2:NM_001141974.2:exon17:c.1800C>T:p.P600P,ATP13A2:NM_022089.3:exon17:c.1815C>T:p.P605P synonymous SNV 5
1 20960230 C T 1 Gene6 - PINK1:NM_032409.2:exon1:c.189C>T:p.L63L SNV 6
1 20964328 A G 1 Gene7 NM_032409.2:exon2:c.388-7A>G - - 7
1 20972048 G A 1 Gene8 NM_032409.2:exon5:c.960-5G>A;NR_046507.1:exon2:c.3981+30C>T - - 8
1 43395635 C T 1 Gene9 - SLC2A1:NM_006516.2:exon5:c.588G>A:p.P196P synonymous 9
File 2 has 7 columns:
Code:
CHROM POS ID REF ALT ANN[*].FEATUREID:ANN[*].HGVS_C ANN[*].HGVS_P
1 17312743 rs3170740 C T NM_001141974.2:c.3214G>A,NM_022089.3:c.3516G>A,NM_001141973.2:c.3501G>A,NM_001135247.1:c.-7975G>A,NM_017459.2:c.-7975G>A p.Ala1072Thr,p.Pro1172Pro,p.Pro1167Pro,.,.
1 17313343 rs9435659 G A NM_022089.3:c.3192C>T,NM_001141973.2:c.3177C>T,NM_001141974.2:c.3060C>T p.Ala1064Ala,p.Ala1059Ala,p.Ala1020Ala
1 17313654 rs761421 C T NM_022089.3:c.2970G>A,NM_001141973.2:c.2955G>A,NM_001141974.2:c.2838G>A p.Val990Val,p.Val985Val,p.Val946Val
1 17314942 rs9435662 G A NM_022089.3:c.2637C>T,NM_001141973.2:c.2622C>T,NM_001141974.2:c.2505C>T p.Gly879Gly,p.Gly874Gly,p.Gly835Gly
1 17319011 rs2076603 G A NM_022089.3:c.1815C>T,NM_001141973.2:c.1800C>T,NM_001141974.2:c.1800C>T p.Pro605Pro,p.Pro600Pro,p.Pro600Pro
1 20960230 rs45530340 C T NM_032409.2:c.189C>T,NR_106732.1:n.59C>T p.Leu63Leu,.
1 20964328 rs2298298 A G NM_032409.2:c.388-7A>G,NR_106732.1:n.*4047A>G,NR_046507.1:n.*4822T>C .,.,.
1 20972048 rs3131713 G A NM_032409.2:c.960-5G>A,NR_046507.1:n.3981+30C>T .,.
1 43395635 rs2229682 C T NM_006516.2:c.588G>A p.Pro196Pro
I would like to:
1) compare Chr:Position from file 1 with CHROM:POS from file 2, and then
2) if the values match, replace columns 7 (HGVS_C) and 8 (HGVS_P) of file 1 with columns 6 and 7 of file 2, respectively;
3) keep the original header from file 1;
4) clean up column 7 of file 2 (HGVS_P), which contains runs of ".," of various lengths: these need to be removed, and a field left with nothing but these placeholders should become "-" in the final file.
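The four requirements can be handled in a single awk pass that reads file 2 first and keys it on Chr:Position instead of on line number. This is a minimal sketch, assuming both files are tab-delimited (as in the FS='\t' commands further down) and that file 2 fits in memory; the function name merge_hgvs is hypothetical. Rows of file 1 with no match in file 2 are printed unchanged, since the post does not say what should happen to them.

```shell
# merge_hgvs FILE1 FILE2  (hypothetical helper name, not from the post)
# Assumes both files are tab-delimited and both start with a header line.
merge_hgvs() {
    awk '
    BEGIN { FS = OFS = "\t" }

    # Drop "." placeholders from a comma-separated list; an empty result becomes "-".
    function clean(s,    n, i, parts, out) {
        n = split(s, parts, ",")
        out = ""
        for (i = 1; i <= n; i++)
            if (parts[i] != "." && parts[i] != "")
                out = (out == "" ? parts[i] : out "," parts[i])
        return (out == "" ? "-" : out)
    }

    # First file read (file 2): store columns 6 and 7 keyed on CHROM:POS, skipping its header.
    FNR == NR {
        if (FNR > 1) { hgvs_c[$1 ":" $2] = $6; hgvs_p[$1 ":" $2] = $7 }
        next
    }

    # Second file read (file 1): print the original header unchanged.
    FNR == 1 { print; next }

    # Replace columns 7 and 8 only when Chr:Position was seen in file 2.
    {
        key = $1 ":" $2
        if (key in hgvs_c) {
            $7 = hgvs_c[key]
            $8 = clean(hgvs_p[key])
        }
        print
    }
    ' "$2" "$1"
}
```

Usage: `merge_hgvs file1 file2 > final.txt`. Note that clean() removes every "." entry, so row 17312743 would come out without the trailing ",.,." that the desired output keeps, while row 20960230 loses its ",." exactly as shown; the desired output treats those two rows differently, so adjust clean() if the placeholders after real entries must be kept.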
Desired output
Code:
Chr Position Ref Alt Score Gene HGVS_C HGVS_P Coding_Consequence dbSNP
1 17312743 C T 1 Gene1 NM_001141974.2:c.3214G>A,NM_022089.3:c.3516G>A,NM_001141973.2:c.3501G>A,NM_001135247.1:c.-7975G>A,NM_017459.2:c.-7975G>A p.Ala1072Thr,p.Pro1172Pro,p.Pro1167Pro,.,. SNV 1
1 17313343 G A 1 Gene2 NM_022089.3:c.3192C>T,NM_001141973.2:c.3177C>T,NM_001141974.2:c.3060C>T p.Ala1064Ala,p.Ala1059Ala,p.Ala1020Ala SNV 2
1 17313654 C T 1 Gene3 NM_022089.3:c.2970G>A,NM_001141973.2:c.2955G>A,NM_001141974.2:c.2838G>A p.Val990Val,p.Val985Val,p.Val946Val SNV 3
1 17314942 G A 1 Gene4 NM_022089.3:c.2637C>T,NM_001141973.2:c.2622C>T,NM_001141974.2:c.2505C>T p.Gly879Gly,p.Gly874Gly,p.Gly835Gly SNV 4
1 17319011 G A 1 Gene5 NM_022089.3:c.1815C>T,NM_001141973.2:c.1800C>T,NM_001141974.2:c.1800C>T p.Pro605Pro,p.Pro600Pro,p.Pro600Pro SNV 5
1 20960230 C T 1 Gene6 NM_032409.2:c.189C>T,NR_106732.1:n.59C>T p.Leu63Leu SNV 6
1 20964328 A G 1 Gene7 NM_032409.2:c.388-7A>G,NR_106732.1:n.*4047A>G,NR_046507.1:n.*4822T>C - - 7
1 20972048 G A 1 Gene8 NM_032409.2:c.960-5G>A,NR_046507.1:n.3981+30C>T - - 8
1 43395635 C T 1 Gene9 NM_006516.2:c.588G>A p.Pro196Pro synonymous 9
I am not able to write a single command that fulfils all the conditions; I can only replace the columns using two separate commands:
Code:
#replace column 7 of file 1 with column 6 of file 2 (rows are paired by line number, not by Chr:Position)
awk 'FNR==NR{a[NR]=$6; next}{$7=a[FNR]}1' FS='\t' OFS='\t' file2 file1 > file3
#replace column 8 of file 3 with column 7 of file 2
awk 'FNR==NR{a[NR]=$7; next}{$8=a[FNR]}1' FS='\t' OFS='\t' file2 file3 > file4
#replace the ".," runs with "-" (the regex is applied to the whole line, not to one column)
awk '{gsub(/\.,.*/,"-");}1' file4 > final.txt #the patterns are not changed by this command
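The last command has two separate problems. It runs gsub on the whole record ($0), and the greedy ".*" then matches from the first ".," to the end of the line, so the Coding_Consequence and dbSNP columns are swallowed too. It also never touches a field such as "p.Leu63Leu,.", because that string contains ",." but no ".,". The sketch below shows both behaviours on one tab-delimited record shaped like row 20964328 of file4; the scoped variant assumes a "." placeholder never precedes a real entry, which holds for the sample rows but is not guaranteed in general.

```shell
# One tab-delimited record in the shape of file4 (built from the sample data).
rec=$(printf '1\t20964328\tA\tG\t1\tGene7\tNM_032409.2:c.388-7A>G\t.,.,.\t-\t7')

# Original command: gsub on the whole record; ".*" runs to the end of the line,
# so columns 9 and 10 are replaced along with the placeholders.
before=$(printf '%s\n' "$rec" | awk '{gsub(/\.,.*/,"-")}1')

# Scoped to column 8: delete each ",." pair, then turn a lone "." into "-".
after=$(printf '%s\n' "$rec" | awk 'BEGIN{FS=OFS="\t"} {gsub(/,\./, "", $8); if ($8 == ".") $8 = "-"} 1')

printf '%s\n%s\n' "$before" "$after"
```

The first output line ends right after the "-" that replaced column 8, while the second keeps all 10 columns.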