Compare & subtract lines in files by column using awk.


 
# 1  
01-12-2013

I have two files with a similar column layout, as shown below.

Two sample lines from file1:

Code:
18    12630    .    G    T    49.97    .    AC=2;AF=1.00;AN=2;DP=3;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=0.0000;MQ=60.00;MQ0=0;NDA=1;QD=16.66;SB=-0.01    GT:AD:DP:GQ:PL    1/1:0,3:3:9.01:82,9,0
18    12842    .    G    A    82.02    .    AC=1;AF=0.50;AN=2;BaseQRankSum=-1.898;DP=16;Dels=0.00;FS=6.560;HRun=3;HaplotypeScore=3.9547;MQ=53.50;MQ0=0;MQRankSum=1.247;NDA=1;QD=5.13;ReadPosRankSum=-0.705;SB=-0.01    GT:AD:DP:GQ:PL    0/1:10,6:16:99:112,0,224

Two sample lines from file2:

Code:
18	12630	.	G	T	35.04	.	AC=1;AF=0.50;AN=2;BaseQRankSum=-0.727;DP=4;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=0.8667;MQ=52.65;MQ0=0;MQRankSum=-0.727;NDA=1;QD=8.76;ReadPosRankSum=-0.727;SB=-0.01	GT:AD:DP:GQ:PL	0/1:1,3:4:17.63:65,0,18
18	12768	.	G	C	41.03	.	AC=1;AF=0.50;AN=2;BaseQRankSum=-3.635;DP=31;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=0.0000;MQ=51.53;MQ0=0;MQRankSum=-4.010;NDA=1;QD=1.32;ReadPosRankSum=-2.087;SB=-0.01	GT:AD:DP:GQ:PL	0/1:23,8:31:71.02:71,0,524

I want to compare columns 1, 2, 4 and 5 between the two files.

The condition: if those columns in a line of file1 match the same columns in any line of file2, that entire line should be deleted from file1, i.e. if columns(file1) == columns(file2), delete the line from file1.

How can this be done with an awk script?

Last edited by Scott; 01-12-2013 at 09:11 AM. Reason: code tags
# 2  
01-12-2013
The specification is a bit vague; please be more specific AND use code tags as advised! From what I can infer from your samples, try:
Code:
$ awk 'NR==FNR{Ar[$1$2$4$5]++;next} !(($1$2$4$5) in Ar)' file2 file1
18    12842    .    G    A    82.02    .    AC=1;AF=0. . . .
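One caveat with the command above: `$1$2$4$5` glues the fields together with no separator, so different keys can collide (chromosome 1 at position 812630 and chromosome 18 at position 12630 both start with `1812630`). Below is a minimal sketch of the same approach with a separator-safe key, assuming a POSIX `sh` and awk. `drop_known` is a hypothetical helper name, and the demo data is just the first six columns of the sample lines above.

```shell
#!/bin/sh
# Hypothetical helper: drop_known FILE2 FILE1
# Prints the lines of FILE1 whose columns 1, 2, 4 and 5 do NOT occur
# together in any line of FILE2.
drop_known() {
    # NR == FNR holds only while awk reads the first file (FILE2):
    # remember its key and skip to the next line.
    # seen[$1, $2, $4, $5] joins the fields with SUBSEP, so "1" + "812630"
    # and "18" + "12630" give different keys, unlike plain $1$2$4$5.
    awk 'NR == FNR { seen[$1, $2, $4, $5]; next }
         !(($1, $2, $4, $5) in seen)' "$1" "$2"
}

# Demo in a scratch directory so no real file1/file2 is touched.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '18    12630    .    G    T    49.97\n18    12842    .    G    A    82.02\n' > "$tmp/file1"
printf '18\t12630\t.\tG\tT\t35.04\n18\t12768\t.\tG\tC\t41.03\n' > "$tmp/file2"

drop_known "$tmp/file2" "$tmp/file1"
# prints: 18    12842    .    G    A    82.02
```

awk's default field splitting treats any run of spaces or tabs as one separator, which is why the space-separated file1 and the tab-separated file2 compare cleanly. One limitation of the `NR == FNR` idiom: if file2 is empty, awk treats file1 as the "first" file, stores its keys and prints nothing.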

# 3  
01-14-2013
Thanks, RudiC.

Quote:
$ awk 'NR==FNR{Ar[$1$2$4$5]++;next} !(($1$2$4$5) in Ar)' file2 file1
It worked as expected. Thanks, RudiC!
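The command above prints the surviving lines to standard output and leaves file1 unchanged. To actually delete the matching lines from file1, as the question asks, one option is to write the result to a temporary file and rename it over the original. This is a sketch assuming a POSIX `sh` and a `mktemp` that accepts a template argument (GNU and BSD versions do); `remove_known_inplace` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: remove_known_inplace FILE2 FILE1
# Rewrites FILE1 without the lines whose columns 1, 2, 4 and 5 also
# occur together in some line of FILE2. FILE2 is only read.
remove_known_inplace() {
    keys=$1
    target=$2
    # Temporary file next to FILE1 so the final mv is a plain rename.
    tmpout=$(mktemp "$target.XXXXXX") || return 1
    if awk 'NR == FNR { seen[$1, $2, $4, $5]; next }
            !(($1, $2, $4, $5) in seen)' "$keys" "$target" > "$tmpout"
    then
        mv "$tmpout" "$target"
    else
        # awk failed (e.g. unreadable input): leave FILE1 as it was.
        rm -f "$tmpout"
        return 1
    fi
}

# Demo in a scratch directory, using trimmed sample lines from the thread.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '18    12630    .    G    T    49.97\n18    12842    .    G    A    82.02\n' > "$tmp/file1"
printf '18\t12630\t.\tG\tT\t35.04\n18\t12768\t.\tG\tC\t41.03\n' > "$tmp/file2"

remove_known_inplace "$tmp/file2" "$tmp/file1" && cat "$tmp/file1"
```

Because awk writes to a separate file first, file1 is never left half-written if awk fails. The renamed file gets the permissions `mktemp` created it with (usually 0600), so restore the original mode afterwards if that matters.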


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Need awk or Shell script to compare Column-1 of two different CSV files and print if column-1 matche

Example: I have files in below format file 1: zxc,133,joe@example.com cst,222,xyz@example1.com File 2 Contains: hxd hcd jws zxc cst File 1 has 50000 lines and file 2 has around 30000 lines : Expected Output has to be : hxd hcd jws (5 Replies)
Discussion started by: TestPractice
5 Replies

2. Shell Programming and Scripting

Compare two files column values using awk

Judi # cat File1 judi /export/home 76 judi /usr 83 judi # judi # cat File2 judi /export/home 79 judi /usr 82 judi # if COLUMN3 of File2 is greater than COLUMN3 of File1, then print File2's lines juid /export/home 79 Code tags please (2 Replies)
Discussion started by: judi
2 Replies

3. UNIX for Dummies Questions & Answers

awk command to compare files by column

So I have this issue. I have 4 files. the first one is the master file who has all possible combinations: file 1 - a - b - c - d - e the other three have some of the letters and a number instead of - for example file 2 34 a 5 c file 3 10 b 12 ... (3 Replies)
Discussion started by: Quijotes
3 Replies

4. Shell Programming and Scripting

Compare files & extract column awk

I have two tab delimited files as given below: File_1: PV16 E1 865 2814 1950 PV16 E2 2756 3853 1098 PV16 E4 3333 3620 288 PV16 E5 3850 4101 252 PV16 E6 83 559 477 PV16 E7 562 858 297 PV16 L2 4237 5658 ... (10 Replies)
Discussion started by: vaibhavvsk
10 Replies

5. Shell Programming and Scripting

How to compare the values of a column in awk in a same file and consecutive lines..

I would like to compare the values of 2nd column of consecutive lines of same file in such a way so that if the difference between first value and second value is more than 100 it should print complete line else ignore line. Input File ========== PDB 2500 RTDB 123 RTDB-EAGLE 122 VSCCP 2565... (4 Replies)
Discussion started by: manuswami
4 Replies

6. Shell Programming and Scripting

How to subtract the adjacent lines from a single column?

Hi All, I have a file with only one column and i need to subtract the adjacent lines of the same column and print it in the same column. For Example: (Input) Col1 5 10 12 6 9 12 5 . . . .output should be like this: (12 Replies)
Discussion started by: Fredrick
12 Replies

7. UNIX for Dummies Questions & Answers

Compare two files using awk or sed, add values in a column if their previous fields are same

Hi All, I have two files file1: abc,def,ghi,5,jkl,mno pqr,stu,ghi,10,vwx,xyz cba,ust,ihg,4,cdu,oqw file2: ravi,def,kishore ramu,ust,krishna joseph,stu,mike I need two output files as follows In my above example, each row in file1 has 6 fields and each row in file2 has 3... (1 Reply)
Discussion started by: yerruhari
1 Reply

8. UNIX for Advanced & Expert Users

Compare two files using awk or sed, add values in a column if their previous fields are same

Hi All, I have two files file1: abc,def,ghi,5,jkl,mno pqr,stu,ghi,10,vwx,xyz cba,ust,ihg,4,cdu,oqw file2: ravi,def,kishore ramu,ust,krishna joseph,stu,mike I need two output files as follows In my above example, each row in file1 has 6 fields and each row in file2 has 3... (1 Reply)
Discussion started by: yerruhari
1 Reply

9. Shell Programming and Scripting

awk to compare lines of two files and print output on screen

hey guys, I have two files both with two columns, I have already created an awk code to ignore certain lines (e.g. lines that start with 963) as they would begin with a certain string, however, the rest I have added together and calculated the average. At the moment the code also displays... (3 Replies)
Discussion started by: chlfc
3 Replies

10. Shell Programming and Scripting

awk compare column between 2 files

Hi, I would like to compare file1 and file2 file1 1 2 3 file2 1 a 2 b 3 c 4 d The result should only print out "d" in file 2. Thanks (3 Replies)
Discussion started by: phamp008
3 Replies