File comparing and appending based on fields


 
# 1  
Old 05-11-2014

I want to compare two files: locus_file.txt, which is very large, and attr.txt, which is small. I want to look up each of the first two columns of locus_file.txt in the second column of attr.txt and append the matching attributes (the first column of attr.txt) to the line.

locus_file.txt: large file
Code:
LOC_Os02g47020, LOC_Os03g57840,0.88725114
LOC_Os02g47020, LOC_Os07g36080,0.94455624
LOC_Os02g47020, LOC_Os03g02590,0.81881344

attr.txt: attribute file
Code:
blue LOC_Os02g47020
red  LOC_Os02g40830
blue LOC_Os07g36080
yellow LOC_Os03g57840
red LOC_Os03g02590

Desired output:

Code:
LOC_Os02g47020, LOC_Os03g57840,0.88725114,blue, yellow
LOC_Os02g47020, LOC_Os07g36080,0.94455624,blue, blue
LOC_Os02g47020, LOC_Os03g02590,0.81881344,blue, red

Note: in the first line of the desired output, for example, the 4th column is the color of LOC_Os02g47020 from attr.txt and the 5th column is the color of LOC_Os03g57840 from attr.txt.

Last edited by Sanchari; 05-11-2014 at 01:13 PM..
# 2  
Old 05-11-2014
Although I strongly encourage you to stop putting spaces in filenames (and although in one place you said your filenames start with a lowercase f and in another with an uppercase F), the following seems to do what you want (assuming the filenames start with an uppercase F):
Code:
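awk '
# First file: whitespace-separated "color locus" pairs; remember each color by locus.
FNR == NR {
	c[$2] = $1
	next
}
# Second file: comma-separated; append the colors of the first two fields.
{	printf("%s,%s, %s\n", $0, c[$1], c[$2])
}' "File 2" FS=', *' "File 1"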

If you want to use this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
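The FNR == NR test in the script above is what separates the lookup file from the data file. A minimal sketch of that idiom follows; the file names first.txt and second.txt and their contents are made up for illustration and are not part of the original question.

```shell
#!/bin/sh
# Two tiny throwaway input files (names and contents are made up).
dir=$(mktemp -d) || exit 1
printf '%s\n' a1 a2 > "$dir/first.txt"
printf '%s\n' b1 b2 b3 > "$dir/second.txt"

# NR counts every line read so far; FNR restarts at 1 for each file,
# so the two are equal only while the first file is being read.
trace=$(awk '{ print $0, NR, FNR, (FNR == NR ? "first file" : "later file") }' \
	"$dir/first.txt" "$dir/second.txt")
printf '%s\n' "$trace"
rm -rf "$dir"
```

The two lines from first.txt are tagged "first file" and the three lines from second.txt are tagged "later file", which is why the lookup table can be filled in the FNR == NR block before any data line is printed.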
# 3  
Old 05-11-2014
Actually, File2 is getting printed and not File1 with this code. It's my mistake that I did not give the file names properly; I have corrected that now. I want the final result to be appended to locus_file.txt. Thanks.
# 4  
Old 05-11-2014
I don't know what you mean. Adjusting for your new filenames, the code I suggested becomes:
Code:
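awk '
# attr.txt: whitespace-separated "color locus" pairs; remember each color by locus.
FNR == NR {
	c[$2] = $1
	next
}
# locus_file.txt: comma-separated; append the colors of the first two fields.
{	printf("%s,%s, %s\n", $0, c[$1], c[$2])
}' attr.txt FS=', *' locus_file.txt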

and produces the output:
Code:
LOC_Os02g47020, LOC_Os03g57840,0.88725114,blue, yellow
LOC_Os02g47020, LOC_Os07g36080,0.94455624,blue, blue
LOC_Os02g47020, LOC_Os03g02590,0.81881344,blue, red

which is exactly what you said you wanted.
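For anyone who wants to try this on the sample data from post #1, here is a self-contained sketch. It assumes attr.txt is whitespace-separated and locus_file.txt is comma-separated, exactly as posted. The temporary directory, the color_of helper, and the NA placeholder for loci missing from attr.txt are illustrative additions, not something the original request asked for.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Sample data copied from the first post.
cat > locus_file.txt <<'EOF'
LOC_Os02g47020, LOC_Os03g57840,0.88725114
LOC_Os02g47020, LOC_Os07g36080,0.94455624
LOC_Os02g47020, LOC_Os03g02590,0.81881344
EOF

cat > attr.txt <<'EOF'
blue LOC_Os02g47020
red  LOC_Os02g40830
blue LOC_Os07g36080
yellow LOC_Os03g57840
red LOC_Os03g02590
EOF

# color_of() is a hypothetical helper: it returns the stored color,
# or "NA" when the locus does not appear in attr.txt.
result=$(awk '
function color_of(id) { return (id in c) ? c[id] : "NA" }
FNR == NR { c[$2] = $1; next }   # attr.txt: default whitespace splitting
{ printf("%s,%s, %s\n", $0, color_of($1), color_of($2)) }
' attr.txt FS=', *' locus_file.txt)

printf '%s\n' "$result"
```

The FS=', *' operand between the two file names switches the field separator only after attr.txt has been read, so each file is split according to its own layout.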

If you want this output appended to locus_file.txt (which seems very weird), then change the script to:
Code:
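awk '
# attr.txt: whitespace-separated "color locus" pairs; remember each color by locus.
FNR == NR {
	c[$2] = $1
	next
}
# locus_file.txt: comma-separated; append the colors of the first two fields.
{	printf("%s,%s, %s\n", $0, c[$1], c[$2])
}' attr.txt FS=', *' locus_file.txt > locus_file$$.txt && \
cat locus_file$$.txt >> locus_file.txt && rm locus_file$$.txt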
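The append step above can also be wrapped in a small function that takes its temporary file name from mktemp rather than building one from $$. This is only a sketch: append_colors is a hypothetical name, mktemp is widely available but not specified by POSIX, and the attribute file is assumed to be whitespace-separated as in post #1.

```shell
#!/bin/sh
# append_colors is a hypothetical helper name.
# Usage: append_colors ATTR_FILE LOCUS_FILE
append_colors() {
	tmp=$(mktemp) || return 1
	# Write the annotated lines to the temporary file first, so awk never
	# reads from and writes to the locus file at the same time.
	awk '
	FNR == NR { c[$2] = $1; next }
	{ printf("%s,%s, %s\n", $0, c[$1], c[$2]) }
	' "$1" FS=', *' "$2" > "$tmp" &&
	cat "$tmp" >> "$2"
	status=$?
	rm -f "$tmp"
	return "$status"
}
```

Calling append_colors attr.txt locus_file.txt leaves the original lines in place and adds the annotated copies after them. The temporary file is removed whether or not the awk step succeeds.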
