add a column and match two files


 
# 1  
Old 08-14-2009

I have two files:
File #1:
Code:
...... 
ATOM     91 H2'' G   A   3      17.357   8.753 -30.401  1.00   0.00           A 
ATOM     92  O2' G   A   3      16.590   9.059 -28.495  1.00   0.00           A 
ATOM     93  H2' G   A   3      16.670   9.792 -27.880  1.00   0.00           A 
ATOM     94  O3' G   A   3      16.875  11.895 -29.146  1.00   0.00           A 
ATOM     95    P U   A   4      17.646  13.251 -28.975  1.00   0.00           A 
ATOM     96  O1P U   A   4      18.619  13.509 -30.118  1.00   0.00           A 
ATOM     97  O2P U   A   4      18.188  13.245 -27.547  1.00   0.00           A 
.......


File #2:
Code:
...... 
H3'   T     1.32 
C2'   T     2.01 
H2''  T     1.34 
H2'   T     1.34 
O3'   T     1.77 
P     G     2.15 
O1P   G     1.70 
O2P   G     1.70 
O5'   G     1.77 
H2''  G     1.34 
......

For File#1, I want to add a column. The content of this column is from File#2.
The procedure is, for each line in File#1,
first, find the line in File#2 that satisfies $1 (in File#2) == $3 (in File#1) and $2 (in File#2) == $4 (in File#1),
e.g., for line#1 in File#1, find the line in File#2 that satisfies $1==H2'' and $2==G;
then, append $3 of that File#2 line to the end of the line in File#1, e.g., append 1.34 to the end of line#1 in File#1.

Thank you!

Last edited by rockytodd; 08-14-2009 at 02:38 PM..
# 2  
Old 08-14-2009

Code:
awk 'NR==FNR{a[$1FS$2]=$3;next}{print $0,a[$3FS$4]}' file2 file1


Last edited by danmero; 08-14-2009 at 02:29 PM..
# 3  
Old 08-14-2009
Based on your example, this script should do what you want:
Code:
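#!/bin/sh

: > file3    # start with an empty output file
while read -r line1; do
  # ${line1} is unquoted on purpose: word splitting collapses runs of
  # spaces, so cut sees fields separated by exactly one space
  c1=$(echo ${line1} | cut -d' ' -f3-4)
  while read -r line2; do
    c2=$(echo ${line2} | cut -d' ' -f1-2)
    if [ "${c1}" = "${c2}" ]; then
      echo "${line1} $(echo ${line2} | cut -d' ' -f3)" >> file3
    fi
  done < file2
done < file1

exit 0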

The first command creates an empty file3.
The outer while loop reads each line of file1 into ${line1} and extracts its third and fourth fields.
The inner while loop reads each line of file2 into ${line2}, extracts its first two fields, and compares the two strings. If they match, ${line1} followed by the third field of the file2 line is appended to file3.
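
The same nested-loop idea can also be sketched without cut, letting read split the fields itself. This is only an illustration under some assumptions: the two sample lines are cut down from the data in the first post, the variable names (f1, f2, atom, res, rest, name2, res2, value) are made up here, and everything runs in a scratch directory.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Two lines of file1 and two lines of file2, taken from the thread's samples.
cat > file1 <<'EOF'
ATOM     91 H2'' G   A   3      17.357   8.753 -30.401  1.00   0.00           A
ATOM     95    P U   A   4      17.646  13.251 -28.975  1.00   0.00           A
EOF
cat > file2 <<'EOF'
H2''  G     1.34
P     G     2.15
EOF

: > file3
while IFS= read -r line1; do
  # Split a copy of the line; fields 3 and 4 of file1 form the key.
  read -r f1 f2 atom res rest <<EOF
$line1
EOF
  # Scan file2 for a line whose first two fields equal the key.
  while read -r name2 res2 value; do
    if [ "$atom" = "$name2" ] && [ "$res" = "$res2" ]; then
      printf '%s %s\n' "$line1" "$value" >> file3
    fi
  done < file2
done < file1

cat file3
```

With this sample only the H2'' G line reaches file3, because P U has no partner in file2. Like the script above, the loop silently drops unmatched lines, whereas the awk one-liner prints them with an empty last column.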

EDIT: WOW Pwned with a single awk line!

Last edited by tukuyomi; 08-15-2009 at 03:07 PM..
# 4  
Old 08-14-2009
Thanks a lot, it works well. Could you please explain the code a little bit?
Quote:
Originally Posted by danmero

Code:
awk 'NR==FNR{a[$1FS$2]=$3;next}{print $0,a[$3FS$4]}' file2 file1

# 5  
Old 08-14-2009
Awesome, but I can't understand it either ...
From what I can read, a[$1FS$2] seems to be related to file2 (FS is the field separator).
NR and FNR are built-in awk variables.
a[] is an array, but why it refers to file2 is a mystery ^^ Or maybe [$1FS$2] is a character class? I don't understand it yet; still reading the man page for more info.
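
One way to check these guesses is a small experiment. The sketch below uses two throwaway files (first.txt and second.txt are made-up names) to show how NR and FNR behave across two input files, and then prints what $1 FS $2 evaluates to for one line of the file2 sample.

```shell
#!/bin/sh
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'a\nb\n' > first.txt
printf 'c\nd\ne\n' > second.txt

# NR counts records across all input files; FNR restarts at 1 for each file.
result=$(awk '{ print FILENAME, NR, FNR }' first.txt second.txt)
printf '%s\n' "$result"
# first.txt 1 1
# first.txt 2 2
# second.txt 3 1
# second.txt 4 2
# second.txt 5 3

# $1 FS $2 is plain string concatenation: field 1, the separator, field 2.
key=$(printf "H2''  G     1.34\n" | awk '{ print "[" $1 FS $2 "]" }')
printf '%s\n' "$key"
# [H2'' G]
```

So NR==FNR holds only while awk is reading the first file named on the command line, which is why that block touches file2 only. The brackets in a[$1FS$2] are an array subscript, not a character class; the subscript is the string built from the two fields joined by FS (a single space by default).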

Last edited by tukuyomi; 08-14-2009 at 03:05 PM..
# 6  
Old 08-14-2009
I'll do my best
Code:
awk '
	NR==FNR {			# true only while reading the first file (file2, the reference file)
		a[$1 FS $2] = $3	# store field $3 of file2 in array a under the key $1 FS $2 (FS is the field separator, a space by default)
		next			# skip the second block and read the next line
	}
	{				# reached only for the second file (file1)
		print $0, a[$3 FS $4]	# print the line followed by the element of a whose key is $3 FS $4 of file1
	}
' file2 file1
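
One behaviour to keep in mind: when a line of file1 has no matching key in file2, a[$3 FS $4] is an empty string, so the line is printed with nothing after the trailing space. If a visible marker is preferred, a variant along these lines might help. It is a sketch, not part of the one-liner above: the placeholder NA and the array name val are assumptions, and the sample files are cut down from the first post.

```shell
#!/bin/sh
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > file1 <<'EOF'
ATOM     91 H2'' G   A   3      17.357   8.753 -30.401  1.00   0.00           A
ATOM     92  O2' G   A   3      16.590   9.059 -28.495  1.00   0.00           A
EOF
cat > file2 <<'EOF'
H2''  G     1.34
O1P   G     1.70
EOF

awk '
  # First file (file2): remember column 3 under the key "column1 column2".
  NR == FNR { val[$1 FS $2] = $3; next }
  # Second file (file1): append the stored value, or NA (an assumed placeholder) if the key is unknown.
  {
    key = $3 FS $4
    print $0, ((key in val) ? val[key] : "NA")
  }
' file2 file1 > file3

cat file3
```

Here the H2'' G line gets 1.34 appended and the O2' G line gets NA. The test (key in val) checks for the key without creating an empty array element, which a plain val[key] reference would do. Note also that the NR==FNR idiom assumes the first file is not empty; with an empty file2, the first block would run on file1 instead.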

# 7  
Old 08-14-2009
You did great! That way of programming and using arrays is awesome! Thank you for the explanation!