Comapring columns in 2 files and printing the values that differ.


 
Thread Tools Search this Thread
Operating Systems Linux Comapring columns in 2 files and printing the values that differ.
# 1  
Old 11-02-2015
Comapring columns in 2 files and printing the values that differ.

I have a file (file-1) that looks like this,
Code:
DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10168N|refseq:NP_418766|uniprotkb:P15005
DIP-10199N|refseq:NP_415632|uniprotkb:P30958
DIP-10358N|refseq:NP_418659|uniprotkb:P28903
DIP-10440N|refseq:NP_289596|uniprotkb:P20082
DIP-10441N|refseq:NP_417502|uniprotkb:P20083
DIP-10441N|refseq:NP_417502|uniprotkb:P20083
DIP-10467N|refseq:NP_415423|uniprotkb:P09373
DIP-10469N|refseq:NP_418386|uniprotkb:P32674
DIP-10562N|refseq:NP_418370|uniprotkb:P17888
DIP-10582N|refseq:NP_414864|uniprotkb:P77743
DIP-10592N|refseq:NP_415819|uniprotkb:P37344

and another (file-2) which looks like this,
Code:
DIP-10331N|refseq:NP_311078|uniprotkb:P12638    DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10331N|refseq:NP_311078|uniprotkb:P12638    DIP-10840N|refseq:NP_414640|uniprotkb:P10408
DIP-1025N|refseq:NP_414574|uniprotkb:P00968     DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10467N|refseq:NP_415423|uniprotkb:P09373    DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10117N|refseq:NP_414973|uniprotkb:P08177    DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177    DIP-10750N|refseq:NP_289799|uniprotkb:P02410
DIP-10117N|refseq:NP_414973|uniprotkb:P08177    DIP-10757N|refseq:NP_288150|uniprotkb:P02421

In output I want to print the contents of file-1 plus the value in either column of file-2 that has the same value as that of file-1 in the other column. Like this,
Code:
DIP-10097N|refseq:NP_416170|uniprotkb:P30015 DIP-1025N|refseq:NP_414574|uniprotkb:P00968
DIP-10097N|refseq:NP_416170|uniprotkb:P30015 DIP-10467N|refseq:NP_415423|uniprotkb:P09373
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10750N|refseq:NP_289799|uniprotkb:P02410
DIP-10117N|refseq:NP_414973|uniprotkb:P08177    DIP-10757N|refseq:NP_288150|uniprotkb:P02421
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10331N|refseq:NP_311078|uniprotkb:P12638
DIP-10467N|refseq:NP_415423|uniprotkb:P09373 DIP-10097N|refseq:NP_416170|uniprotkb:P30015

Any help would be highly appreciated.

Last edited by Syeda Sumayya; 11-02-2015 at 02:06 AM..
# 2  
Old 11-02-2015
Hello Syeda,

Not clear to me like do you want to compare both the fields of the files, could you please try following and let me know if this helps you.
Code:
awk 'FNR==NR{A[$1];next} ($1 in A){print $0} ($2 in A){print $0}' file1 file2

Output will be as follows.
Code:
DIP-10331N|refseq:NP_311078|uniprotkb:P12638    DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-1025N|refseq:NP_414574|uniprotkb:P00968     DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10467N|refseq:NP_415423|uniprotkb:P09373    DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10467N|refseq:NP_415423|uniprotkb:P09373    DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10117N|refseq:NP_414973|uniprotkb:P08177    DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177    DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177    DIP-10750N|refseq:NP_289799|uniprotkb:P02410
DIP-10117N|refseq:NP_414973|uniprotkb:P08177    DIP-10757N|refseq:NP_288150|uniprotkb:P02421

Here I am comparing $1 of file1 to both the fields of file2.
Let us know how It goes for you.


Thanks,
R. Singh
# 3  
Old 11-02-2015
No, not really, this isn't exactly what I wanted.
There was a little mistake in the output I had shown, sorry! I've corrected it now.

I do want to compare $1 of file-1 with both columns of file-2, but in the output I want to print $1 first and then the value (same or different) that is next to it in file-2.

Last edited by Syeda Sumayya; 11-02-2015 at 02:25 AM..
# 4  
Old 11-02-2015
Try
Code:
awk '
FNR==NR         {C[$1]++
                 R[$1,C[$1]]=$2
                 C[$2]++
                 R[$2,C[$2]]=$1
                 next
                }
$1 in C         {for (i=1; i<=C[$1]; i++) print $1, R[$1,i]
                }
' file2 file1      
DIP-10097N|refseq:NP_416170|uniprotkb:P30015 DIP-1025N|refseq:NP_414574|uniprotkb:P00968
DIP-10097N|refseq:NP_416170|uniprotkb:P30015 DIP-10467N|refseq:NP_415423|uniprotkb:P09373
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10331N|refseq:NP_311078|uniprotkb:P12638
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10750N|refseq:NP_289799|uniprotkb:P02410
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10757N|refseq:NP_288150|uniprotkb:P02421
DIP-10467N|refseq:NP_415423|uniprotkb:P09373 DIP-10097N|refseq:NP_416170|uniprotkb:P30015

This User Gave Thanks to RudiC For This Post:
# 5  
Old 11-02-2015
Or Ravinder's solution slightly changed
Code:
awk 'FNR==NR{A[$1];next} ($1 in A){print $1,$2} ($2 in A){print $2,$1}' file1 file2

The order of the output is the one of file2.

Last edited by MadeInGermany; 11-02-2015 at 04:52 PM..
This User Gave Thanks to MadeInGermany For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Match value in two files and replace values in selected columns

The purpose is to check if values for column 3 and 4 in file1 match with column 1 in file2. If any value match do: 1) Replace values in file2 for column 2 and 3 using the information of file1 columns 5 and 6 2) Replace string ($1,1,5) and string ($1,6,5) in file2 with values of columns 7... (8 Replies)
Discussion started by: jiam912
8 Replies

2. Shell Programming and Scripting

Comparing two columns in two files and printing a third based on a match

Hello all, First post here. I did not notice a previous post to help me down the right path. I am looking to compare a column in a CSV file against another file (which is not a column match one for one) but more or less when a match is made, I would like to append a third column that contains a... (17 Replies)
Discussion started by: dis0wned
17 Replies

3. Shell Programming and Scripting

Comparing same column from two files, printing whole row with matching values

First I'd like to apologize if I opened a thread which is already open somewhere. I did a bit of searching but could quite find what I was looking for, so I will try to explaing what I need. I'm writing a script on our server, got to a point where I have two files with results. Example: File1... (6 Replies)
Discussion started by: mitabrev83
6 Replies

4. Shell Programming and Scripting

Common values in 2 columns in 2 files

Hello, Suppose I have these 2 tab delimited files, where the second column in first file contains matching values from first column of the second file, I would like to get an output like this: File A 1 A 2 B 3 C File B A Apple C Cinnabon B Banana I would like... (1 Reply)
Discussion started by: Mohamed EL Hadi
1 Replies

5. UNIX for Dummies Questions & Answers

Printing all the values in the middle of two columns

Hi, I have a tab delimited text file with three columns: Input: 1 25734 25737 1 32719 32724 1 59339 59342 1 59512 59513 1 621740 621745 For each row of the text file I want to print out all the values between the second and third columns, including them. The... (3 Replies)
Discussion started by: evelibertine
3 Replies

6. UNIX for Dummies Questions & Answers

Comparing two test files and printing out the values that do not match

Hi, I have two text files with matching first columns. Some of the values in the second column do not match. I want to write a script to print out the rows (only the first column) where the values in the second column do not match. Example: Input 1 A 1 B 2 C 3 D 4 Input 2 A 2 B 2... (6 Replies)
Discussion started by: evelibertine
6 Replies

7. Shell Programming and Scripting

Sum up values of columns in 4 files using shell script

I am new to shell script.I have records like below in 4 different files which have about 10000 records each, all records unique and sorted based on column 2. 1 2 3 4 5 6 --------------------------- SR|1010478|000044590|1|0|0| SR|1014759|000105790|1|0|0| SR|1016609|000108901|1|0|0|... (2 Replies)
Discussion started by: reach.sree@gmai
2 Replies

8. UNIX for Dummies Questions & Answers

Comparing two text files by a column and printing values that do not match

I have two text files where the first three columns are exactly the same. I want to compare the fourth column of the text files and if the values are different, print that row into a new output file. How do I go about doing that? File 1: 100 rs3794811 0.01 0.3434 100 rs8066551 0.01... (8 Replies)
Discussion started by: evelibertine
8 Replies

9. UNIX for Dummies Questions & Answers

Comparing the 2nd column in two different files and printing corresponding 9th columns in new file

Dear Gurus, I am very new to UNIX. I appreciate your help to manage my files. I have 16 files with equal number of columns in it. Each file has 9 columns separated by space. I need to compare the values in the second column of first file and obtain the corresponding value in the 9th column... (12 Replies)
Discussion started by: Unilearn
12 Replies

10. Shell Programming and Scripting

Columns comparision of two large size files and printing the difference

Hi Experts, My requirement is to compare the second field/column in two files, if the second column is same in both the files then compare the first field. If the first is not matching then print the first and second fields of both the files. first file (a .txt) < 1210018971FF0000,... (6 Replies)
Discussion started by: krao
6 Replies
Login or Register to Ask a Question