Compare files & extract column awk


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Compare files & extract column awk
# 8  
Old 03-18-2013
Quote:
Originally Posted by panyam
Still it's not clear.

You want to do a line by comparison from file1 to file2?

something like the below sudo code?

Code:
for line1 in file1
do

file line2 in file2
do

if file1.fields <your comparison stuff> file2.fields --success
then
print file2 fields , file1 2nd column
else
print just file2 fields
fi

done

done

?
yes similar kind
# 9  
Old 03-18-2013
Something like :

Code:
# awk '{$1=$1}NR==FNR{s=NR;L[NR]=$0;next}{for(i=0;++i<=s;)
{n=split(L[i],F," ");
if ((F[1]==$2)&&($3>=F[3])&&($4<=F[4])) {print $0 OFS F[2]}
}}' OFS="\t" File_1 File_2
1099    PV16    766     837     E7
1099    PV16    904     975     E1
1099    PV16    1022    1118    E1
1099    PV16    1438    1531    E1
1099    PV16    1572    1643    E1
1099    PV16    1952    2023    E1
1099    PV16    2070    2211    E1
1099    PV16    2322    2393    E1
1099    PV16    2432    2503    E1
1099    PV16    2545    2616    E1
1099    PV16    2808    2884    E2
1099    PV16    3050    3121    E2
1099    PV16    3507    3693    E2
1099    PV16    3696    3767    E2
1099    PV16    4811    4882    L2

Or are you also willing to display the lines for which the conditions are never met ?

Code:
# awk '{$1=$1}NR==FNR{s=NR;L[NR]=$0;next}{for(i=0;++i<=s;)
{e="Null";n=split(L[i],F," ");
if ((F[1]==$2)&&($3>=F[3])&&($4<=F[4])) {e=F[2]}
print $0 OFS e}
}' OFS="\t" File_1 File_2

Code:
1099    PV16    766     837     Null
1099    PV16    766     837     Null
1099    PV16    766     837     Null
1099    PV16    766     837     Null
1099    PV16    766     837     Null
1099    PV16    766     837     E7
1099    PV16    766     837     Null
1099    PV16    766     837     Null
1099    PV16    904     975     E1
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    1022    1118    E1
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV16    1438    1531    E1
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1572    1643    E1
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1952    2023    E1
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    2070    2211    E1
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2322    2393    E1
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2432    2503    E1
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2545    2616    E1
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    E2
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    Null
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    E2
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    Null
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    E2
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    Null
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    E2
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    L2
1099    PV16    4811    4882    Null

Just another piece of code you may wanna try in case you want all lines display at least once (those who never match will be display once an those who match several line from file1 will be displayed everytime it matches the condition.

So if we add for example a line "G2" like :
Code:
# head -2 File_1
PV16    E1    865    2814    1950
PV16    G2    865    2814    1950

The code

Code:
awk '{$1=$1}NR==FNR{s=NR;L[NR]=$0;next}{f=0;for(i=0;++i<=s;)
{e="Null";n=split(L[i],F," ");
if ((F[1]==$2)&&($3>=F[3])&&($4<=F[4])) {e=F[2];print $0 OFS e;f=1}
};if(!f) print $0 OFS e
}' OFS="\t" File_1 File_2

will give

Code:
1099    PV16    766     837     E7
1099    PV16    904     975     E1
1099    PV16    904     975     G2
1099    PV16    1022    1118    E1
1099    PV16    1022    1118    G2
1099    PV17    1198    1269    Null
1099    PV16    1438    1531    E1
1099    PV16    1438    1531    G2
1099    PV16    1572    1643    E1
1099    PV16    1572    1643    G2
1099    PV16    1952    2023    E1
1099    PV16    1952    2023    G2
1099    PV16    2070    2211    E1
1099    PV16    2070    2211    G2
1099    PV16    2322    2393    E1
1099    PV16    2322    2393    G2
1099    PV16    2432    2503    E1
1099    PV16    2432    2503    G2
1099    PV16    2545    2616    E1
1099    PV16    2545    2616    G2
1099    PV16    2808    2884    E2
1099    PV16    3050    3121    E2
1099    PV16    3507    3693    E2
1099    PV16    3696    3767    E2
1099    PV16    4036    4107    Null
1099    PV19    4246    4317    Null
1099    PV20    4383    4454    Null
1099    PV18    4673    4797    Null
1099    PV16    4811    4882    L2


Last edited by ctsgnb; 03-18-2013 at 02:10 PM..
This User Gave Thanks to ctsgnb For This Post:
# 10  
Old 03-19-2013
Smilie It worked as expected. Hey ctsgnb Thanks a lot Smilie.
I'm very much interested to learn such multicolumn comparison tasks especially if there are multiple file and multiple columns. Please recommend some online tutorials links to understand in depth about awk implementation.
# 11  
Old 03-19-2013
If you are interested in learning awk :

You can read "Awk - A Tutorial and Introduction - by Bruce Barnett" (the PDF can be found for free quite easy over the web)

... and then... practice and practice !
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Need awk or Shell script to compare Column-1 of two different CSV files and print if column-1 matche

Example: I have files in below format file 1: zxc,133,joe@example.com cst,222,xyz@example1.com File 2 Contains: hxd hcd jws zxc cst File 1 has 50000 lines and file 2 has around 30000 lines : Expected Output has to be : hxd hcd jws (5 Replies)
Discussion started by: TestPractice
5 Replies

2. Shell Programming and Scripting

Compare 2 files and extract the data which is present in other file - awk is not working

file2 content f1file2 content f1,1,2,3,4,5 f1,2,4,6,8,10 f10,1,2,3,4,5 f10,2,4,6,8,10 f5,1,2,3,4,5 f5,2,4,6,8,10awk 'FNR==NR{a;next}; !($1 in a)' file2 file1output f10,1,2,3,4,5 f10,2,4,6,8,10 f5,1,2,3,4,5 f5,2,4,6,8,10awk 'FNR==NR{a;next}; ($1 in a)' file2 file1output nothing... (4 Replies)
Discussion started by: gksenthilkumar
4 Replies

3. Shell Programming and Scripting

Compare two files column values using awk

Judi # cat File1 judi /export/home 76 judi /usr 83 judi # judi # cat File2 judi /export/home 79 judi /usr 82 judi # if COLUMN3 of File2 is greater that COLUMN3 of File1, then print File2's lines juid /export/home 79 Code tags please (2 Replies)
Discussion started by: judi
2 Replies

4. UNIX for Dummies Questions & Answers

awk command to compare files by column

So I have this issue. I have 4 files. the first one is the master file who has all possible combinations: file 1 - a - b - c - d - e the other three have some of the letters and a number instead of - for example file 2 34 a 5 c file 3 10 b 12 ... (3 Replies)
Discussion started by: Quijotes
3 Replies

5. Shell Programming and Scripting

Compare & subtract lines in files by column using awk.

I have two files with similar column pattern as given below : 2 sample lines from file1 are given below. 18 12630 . G T 49.97 . AC=2;AF=1.00;AN=2;DP=3;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=0.0000;MQ=60.00;MQ0=0;NDA=1;QD=16.66;SB=-0.01 GT:AD:DP:GQ:PL ... (2 Replies)
Discussion started by: vaibhavvsk
2 Replies

6. Shell Programming and Scripting

Compare two files using awk or sed, add values in a column if their previous fields are same

Hi All, I have two files file1: abc,def,ghi,5,jkl,mno pqr,stu,ghi,10,vwx,xyz cba,ust,ihg,4,cdu,oqw file2: ravi,def,kishore ramu,ust,krishna joseph,stu,mike I need two output files as follows In my above example, each row in file1 has 6 fields and each row in file2 has 3... (3 Replies)
Discussion started by: yerruhari
3 Replies

7. UNIX for Dummies Questions & Answers

Compare two files using awk or sed, add values in a column if their previous fields are same

Hi All, I have two files file1: abc,def,ghi,5,jkl,mno pqr,stu,ghi,10,vwx,xyz cba,ust,ihg,4,cdu,oqw file2: ravi,def,kishore ramu,ust,krishna joseph,stu,mike I need two output files as follows In my above example, each row in file1 has 6 fields and each row in file2 has 3... (1 Reply)
Discussion started by: yerruhari
1 Replies

8. UNIX for Advanced & Expert Users

Compare two files using awk or sed, add values in a column if their previous fields are same

Hi All, I have two files file1: abc,def,ghi,5,jkl,mno pqr,stu,ghi,10,vwx,xyz cba,ust,ihg,4,cdu,oqw file2: ravi,def,kishore ramu,ust,krishna joseph,stu,mike I need two output files as follows In my above example, each row in file1 has 6 fields and each row in file2 has 3... (1 Reply)
Discussion started by: yerruhari
1 Replies

9. Shell Programming and Scripting

How to extract a column from two different files in AWK?

Hi guys, I need help in extracting one column of numbers from two different files and display it in a output file. In specific, I want to extrac the column no.2 ($2) from each file, file1.txt, file2.txt. Then place both extracted columns in a one file, out.txt. the line command I use to... (7 Replies)
Discussion started by: solracq
7 Replies

10. Shell Programming and Scripting

awk compare column between 2 files

Hi, I would like to compare file1 and file2 file1 1 2 3 file2 1 a 2 b 3 c 4 d The result should only print out "d" in file 2. Thanks (3 Replies)
Discussion started by: phamp008
3 Replies
Login or Register to Ask a Question