awk match to update contents of file


 
# 15  
Old 09-07-2016
Basically, I am just trying to use $1, $2, and $3 of file1 as the unique key and look it up in $2, $3, and $4 of file2, which is also tab-delimited. If a match is found, the selected fields are printed; if there is no match, that line can simply be skipped. I apologize for the confusion and appreciate the help. Since my files are rather large I was trying to be brief, but I can see that was no help.

file1
Code:
Match:
68521889    C    T
167099158    A    G
18122506    G    A

file2
Code:
....
....
.... 68521889    C    T     1   2
.... 167099158    A    G  1    2
.... 18122506    G    A    1    2

awk
Code:
awk '
BEGIN {    FS = OFS = "\t"
}
NR == 1 {
outfile = FILENAME
}
FNR == NR {
i[++ic] = $1","$2","$3 
}
{    if($2 in o)
o[$2] =  $2 OFS $3 OFS $4
}
END {    for(j = 1; j <= ic; j++)
print o[i[j]] > outfile
}' file1 file2

desired result (updated file1)
Code:
68521889    C    T     1
167099158    A    G  1
18122506    G    A    1

# 16  
Old 09-07-2016
Hello cmccabe,

Let's say we have following Input_files:
Code:
cat Input_file1
Match:
68521889    C    T
167099158    A    G
18122506    G    A
212121313  Q    W
234324234  t    w

cat Input_file2
.... 68521889    C    T     1   2
.... 167099158    A    G  1    2
.... 18122506    G    A    1    2

Then following is the code on same.
Code:
awk 'NR==1{out_file=FILENAME;next} FNR==NR{A[$1,$2,$3]=$0;next} (($2,$3,$4) in A){Q=Q?Q ORS A[$2,$3,$4] OFS $5:A[$2,$3,$4] OFS $5} END{print Q > out_file}' OFS="\t" Input_file1  Input_file2

The output will be stored in Input_file1 as follows (you could set the field separator to tab if your Input_files are tab-delimited).
Code:
68521889    C    T	1
167099158    A    G	1
18122506    G    A	1
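
For tab-delimited input, the same idea could be sketched as below. This is only an illustration under assumptions: the demo_* file names and sample rows are made up, the "Match:" header is skipped, and the result goes to a separate file instead of overwriting the first input file.

```shell
# Hypothetical tab-delimited sample data (not the real files).
printf 'Match:\n68521889\tC\tT\n167099158\tA\tG\n212121313\tQ\tW\n' > demo_file1
printf 'x\t68521889\tC\tT\t1\t2\nx\t167099158\tA\tG\t1\t2\n' > demo_file2

awk -F'\t' -v OFS='\t' '
# 1st file: skip the "Match:" header, store each line under its 3-field key.
FNR == NR { if (FNR > 1) A[$1, $2, $3] = $0; next }
# 2nd file: on a key match, print the stored file1 line followed by $5.
(($2, $3, $4) in A) { print A[$2, $3, $4], $5 }
' demo_file1 demo_file2 > demo_result
cat demo_result
```

Lines are printed in the order they appear in the second file, and file1 lines without a match (212121313 here) are dropped.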

Thanks,
R. Singh
# 17  
Old 09-07-2016
I have been trying to understand how the awk that @Don Cragun posted works. I have modified it slightly and it is very close to working, but if a match is not found then nothing should happen: that line should be skipped and left unchanged. Currently the lines that do not match are removed and replaced with a null value. I think the line in bold needs to have something added to it, but I am not sure what.

awk
Code:
awk '
BEGIN {FS = OFS = "\t"
}
NR == 1 {
outfile = FILENAME
}
FNR == NR {
o[i[++ic] = $1 FS $2 FS $3]     
}
{if($2 FS $4 FS $5 in o)
o[$2 FS $4 FS $5] = $2 OFS $4 OFS $5 OFS $50 OFS $51 OFS $52 OFS $53
}
END {for(j = 1; j <= ic; j++)
print o[i[j]] > outfile
}' file1 file2

Original file1 before awk was run
Code:
Match:
68521889    C    T
167099158    A    G
18122506    G    A

current output
Code:
68521889    C    T    GOOD    50    het    4
167099158    A    G    GOOD    210    hom    55
18122506    G    A    GOOD    189    het    8

desired output

Code:
Match:
68521889    C    T    GOOD    50    het    4
167099158    A    G    GOOD    210    hom    55
18122506    G    A    GOOD    189    het    8

Thank you all for your help, explanations, and patience.

Last edited by cmccabe; 09-07-2016 at 05:09 PM.. Reason: added details
# 18  
Old 09-07-2016
Hi cmccabe,
I would have hoped that by now you would know that if you can't decipher my code and you tell me you can't understand it, I would be happy to supply a commented version.

Yes, if you do not assign any value to o[key], lines for keys that are not found in file2 will be printed as empty lines. If you want unmatched lines to be left unchanged, then o[i[line#] = key] must be set to the original input line (i.e., $0) from file1.
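
A tiny sketch of that difference, using made-up input and two throwaway arrays (unset_o and saved_o are names invented for this illustration, not part of the script): merely referencing an element creates it with an empty value, while assigning $0 keeps the original line.

```shell
printf 'Match:\nA\tB\tC\n' | awk -F'\t' -v OFS='\t' '
{
	key = $1 OFS $2 OFS $3
	unset_o[key]		# reference only: element exists but holds ""
	saved_o[key] = $0	# element holds the original input line
	# Print both values in brackets so the empty one is visible.
	print "[" unset_o[key] "]", "[" saved_o[key] "]"
}' > demo_unset
cat demo_unset
```

The first bracket pair is empty on every line; the second holds the input line as read.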

With such abbreviated data from file1 and no sample file2 data, I can only make wild guesses. But, I think you're getting close. Try:
Code:
awk '
BEGIN {	FS = OFS = "\t"
}
NR == 1 {
	outfile = FILENAME
}
FNR == NR {
	o[i[++ic] = $1 OFS $2 OFS $3] = $0
	next
}
{	if($2 OFS $4 OFS $5 in o)
		o[$2 OFS $4 OFS $5] = $2 OFS $4 OFS $5 OFS $50 OFS $51 OFS $52 OFS $53
}
END {	for(j = 1; j <= ic; j++)
		print o[i[j]] > outfile
}' file1 file2

or, slightly more compactly, but with commentary added:
Code:
awk '
BEGIN {	FS = OFS = "\t"
}
NR == 1 {
	# Save the output file pathname from the pathname naming the 1st input
	# file.
	outfile = FILENAME
}
FNR == NR {
	# For each line in the 1st input file, set i[line#] to the key made up
	# of the first three fields in that line (keeping a count of the
	# number of lines in that file in ic), and set o[key] to the entire
	# input line.
	o[i[++ic] = $1 OFS $2 OFS $3] = $0
	next	# This line was missing in my original script, but since there
		# was only one field in file1 and there were no empty lines in
		# file1, it did not affect the final output.  With the new input
		# file formats, this line must be included to guarantee correct
		# results.
}
{	# Set key to the key fields from file2.  If this key is present in the
	# o[] array, set o[key] to the desired output for this key.
	if((key = $2 OFS $4 OFS $5) in o)
		o[key] = key OFS $50 OFS $51 OFS $52 OFS $53
}
END {	# Now that we have processed all of the data in both input files,
	# overwrite the 1st input file with the desired output replacing each
	# line in that file with its original contents (if there was no match)
	# or the collected fields from the 2nd input file (if there was a
	# match).
	for(j = 1; j <= ic; j++)
		print o[i[j]] > outfile
}' file1 file2

Since we have no file2 to use as sample data, both of these are obviously untested. I know that all of the tabs I have in my code don't matter to awk, but for my sanity (and for the mental health of anyone who may try to decipher this code later), please do not remove them.
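
One way to try the script without the real data is to fabricate a file2 row with 53 tab-separated fields. Everything in this sketch is an assumption made for illustration: the sample_* file names, the "x" filler fields, the unmatched 999 row, and the payload values (borrowed from the output shown earlier in the thread).

```shell
# Hypothetical file1: header, one key that will match, one that will not.
printf 'Match:\n68521889\tC\tT\n999\tA\tA\n' > sample_file1

# Hypothetical file2: one row of 53 fields, key in $2/$4/$5, payload in $50-$53.
awk 'BEGIN {
	OFS = "\t"
	for (n = 1; n <= 53; n++) f[n] = "x"
	f[2] = "68521889"; f[4] = "C"; f[5] = "T"
	f[50] = "GOOD"; f[51] = "50"; f[52] = "het"; f[53] = "4"
	line = f[1]
	for (n = 2; n <= 53; n++) line = line OFS f[n]
	print line
}' > sample_file2

# Same logic as the script above; it overwrites sample_file1 in the END block.
awk '
BEGIN {	FS = OFS = "\t" }
NR == 1 { outfile = FILENAME }
FNR == NR { o[i[++ic] = $1 OFS $2 OFS $3] = $0; next }
{	if((key = $2 OFS $4 OFS $5) in o)
		o[key] = key OFS $50 OFS $51 OFS $52 OFS $53
}
END {	for(j = 1; j <= ic; j++)
		print o[i[j]] > outfile
}' sample_file1 sample_file2
cat sample_file1
```

With this data the header and the 999 line should come through unchanged, and only the 68521889 line should gain the four extra fields.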
# 19  
Old 09-10-2016
Thank you all for your help and explanations.
# 20  
Old 09-13-2016
Assuming your sample data, the code below might work in this case:

Code:
awk -F '\t' 'NR==FNR{A[$1]=$0; next} $2 in A{print $0;}1' file1 file2 | uniq -c | awk '$1 != "1" {print $4"\t"$5}' > outfile

HTH!!

Regards,
Mannu