awk match to update contents of file

09-07-2016

Basically, I am trying to use $1, $2, and $3 of file1 (tab-delimited) as the unique key, and look that key up in $2, $3, and $4 of file2, which is also tab-delimited. If a match is found, the selected fields are printed; if there is no match, that line can just be skipped. I apologize for the confusion and appreciate the help. Since my files are rather large I was trying to be brief, but I can see that was no help.

file1

Code:

Match:
68521889    C    T
167099158    A    G
18122506    G    A

file2

Code:

....
....
.... 68521889    C    T     1   2
.... 167099158    A    G  1    2
.... 18122506    G    A    1    2

awk

Code:

awk '
BEGIN {    FS = OFS = "\t"
}
NR == 1 {
outfile = FILENAME
}
FNR == NR {
i[++ic] = $1","$2","$3 
}
{    if($2 in o)
o[$2] =  $2 OFS $3 OFS $4
}
END {    for(j = 1; j <= ic; j++)
print o[i[j]] > outfile
}' file1 file2

desired result (updated file1)

Code:

68521889    C    T     1
167099158    A    G  1
18122506    G    A    1

cmccabe

09-07-2016

Hello cmccabe,

Let's say we have the following input files:

Code:

cat Input_file1
Match:
68521889    C    T
167099158    A    G
18122506    G    A
212121313  Q    W
234324234  t    w

cat Input_file2
.... 68521889    C    T     1   2
.... 167099158    A    G  1    2
.... 18122506    G    A    1    2

Then the following code should handle it.

Code:

awk 'NR==1{out_file=FILENAME;next} FNR==NR{A[$1,$2,$3]=$0;next} (($2,$3,$4) in A){Q=Q?Q ORS A[$2,$3,$4] OFS $5:A[$2,$3,$4] OFS $5} END{print Q > out_file}' OFS="\t" Input_file1  Input_file2

The output will be stored in Input_file1 as follows (you could set the field separator to tab if your input files are tab-delimited).

Code:

68521889    C    T	1
167099158    A    G	1
18122506    G    A	1
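
A side note on the `(($2,$3,$4) in A)` test used above: awk joins comma-separated subscripts with the built-in `SUBSEP` string, so the tuple form and an explicit `SUBSEP` concatenation name the same array element, while a key built with any other separator does not. The snippet below is only a minimal sketch on one made-up input line; the variable names `multi`, `joined`, and `spaced` are hypothetical.

```shell
# One made-up record; the default whitespace field separator is assumed.
result=$(printf '68521889 C T\n' | awk '{
	A[$1, $2, $3] = $0                        # stored under $1 SUBSEP $2 SUBSEP $3
	multi  = (($1, $2, $3) in A)              # 1: tuple membership test
	joined = (($1 SUBSEP $2 SUBSEP $3) in A)  # 1: same key, spelled out
	spaced = (($1 " " $2 " " $3) in A)        # 0: different separator, different key
	print multi, joined, spaced
}')
echo "$result"
```

This is why the key written in the first file and the key tested in the second file have to be built the same way, whichever separator is chosen.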

Thanks,
R. Singh

RavinderSingh13

09-07-2016

I have been trying to understand how the awk that @Don Cragun posted works. I have modified it slightly and it is very close to working, but if a match is not found then nothing should happen: that line should be skipped and left unchanged. Currently the lines that do not match are removed and replaced with a null value. I think the line in bold needs to have something added to it, but I am not sure what.

awk

Code:

awk '
BEGIN {FS = OFS = "\t"
}
NR == 1 {
outfile = FILENAME
}
FNR == NR {
o[i[++ic] = $1 FS $2 FS $3]     
}
{if($2 FS $4 FS $5 in o)
o[$2 FS $4 FS $5] = $2 OFS $4 OFS $5 OFS $50 OFS $51 OFS $52 OFS $53
}
END {for(j = 1; j <= ic; j++)
print o[i[j]] > outfile
}' file1 file2

Original file1 before awk was run

Code:

Match:
68521889    C    T
167099158    A    G
18122506    G    A

current output

Code:

68521889    C    T    GOOD    50    het    4
167099158    A    G    GOOD    210    hom    55
18122506    G    A    GOOD    189    het    8

desired output

Code:

Match:
68521889    C    T    GOOD    50    het    4
167099158    A    G    GOOD    210    hom    55
18122506    G    A    GOOD    189    het    8

Thank you all for your help, explanations, and patience

Last edited by cmccabe; 09-07-2016 at 05:09 PM.. Reason: added details

cmccabe

09-07-2016

Hi cmccabe,
I would have hoped that by now you would know that if you can't decipher my code and you tell me you can't understand it, I would be happy to supply a commented version....

Yes, if you do not assign any value to o[key], lines for keys that are not found in file2 will be printed as empty lines. If you want unmatched lines to be left unchanged, then o[i[line#] = key] must be set to the original input line (i.e., $0) from file1.

With such abbreviated data from file1 and no sample file2 data, I can only make wild guesses. But, I think you're getting close. Try:

Code:

awk '
BEGIN {	FS = OFS = "\t"
}
NR == 1 {
	outfile = FILENAME
}
FNR == NR {
	o[i[++ic] = $1 OFS $2 OFS $3] = $0
	next
}
{	if($2 OFS $4 OFS $5 in o)
		o[$2 OFS $4 OFS $5] = $2 OFS $4 OFS $5 OFS $50 OFS $51 OFS $52 OFS $53
}
END {	for(j = 1; j <= ic; j++)
		print o[i[j]] > outfile
}' file1 file2

or, slightly more compactly, but with commentary added:

Code:

awk '
BEGIN {	FS = OFS = "\t"
}
NR == 1 {
	# Save the output file pathname from the pathname naming the 1st input
	# file.
	outfile = FILENAME
}
FNR == NR {
	# For each line in the 1st input file, set i[line#] to the key built
	# from the first three fields (keeping a count of the lines in that
	# file in ic), and set o[key] to the entire input line.
	o[i[++ic] = $1 OFS $2 OFS $3] = $0
	next	# This line was missing in my original script, but since there
		# was only one field in file1 and there were no empty lines in
		# file1, it didn't affect the final output.  With the new input
		# file formats, this line must be included to guarantee correct
		# results.
}
{	# Set key to the key fields from file2.  If this key is present in the
	# o[] array, set o[key] to the desired output for this key.
	if((key = $2 OFS $4 OFS $5) in o)
		o[key] = key OFS $50 OFS $51 OFS $52 OFS $53
}
END {	# Now that we have processed all of the data in both input files,
	# overwrite the 1st input file with the desired output replacing each
	# line in that file with its original contents (if there was no match)
	# or the collected fields from the 2nd input file (if there was a
	# match).
	for(j = 1; j <= ic; j++)
		print o[i[j]] > outfile
}' file1 file2

Since we have no file2 to use as sample data, both of these are obviously untested. I know that all of the tabs I have in my code don't matter to awk, but for my sanity (and for the mental health of anyone who may try to decipher this code later), please do not remove them.
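
For anyone who wants to try the idea without the real data, here is a minimal sketch on made-up input. Everything about the data is an assumption for illustration: the scratch directory, the file contents, and the simplified layout in which the key sits in fields 2-4 of file2 and the wanted value is field 5 (the scripts above use fields 2, 4, 5 and 50-53 instead).

```shell
# Work in a scratch directory so no real file is overwritten.
tmp=$(mktemp -d)

# Hypothetical tab-delimited sample data (not the poster's real files).
printf 'Match:\n68521889\tC\tT\n167099158\tA\tG\n999\tQ\tW\n' > "$tmp/file1"
printf 'x\t68521889\tC\tT\t1\t2\nx\t167099158\tA\tG\t1\t2\n' > "$tmp/file2"

awk '
BEGIN {	FS = OFS = "\t"
}
NR == 1 {
	outfile = FILENAME
}
FNR == NR {
	# Remember the line order in i[] and default o[key] to the whole line.
	o[i[++ic] = $1 OFS $2 OFS $3] = $0
	next
}
{	# Assumed layout: key in fields 2-4 of file2, wanted value in field 5.
	if((key = $2 OFS $3 OFS $4) in o)
		o[key] = key OFS $5
}
END {	for(j = 1; j <= ic; j++)
		print o[i[j]] > outfile
}' "$tmp/file1" "$tmp/file2"

cat "$tmp/file1"
```

On this made-up input the `Match:` header and the unmatched `999 Q W` line are written back unchanged, while the two matched lines each gain a fourth field.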

Don Cragun

09-10-2016

Thank you all for your help and explanations.

cmccabe

09-13-2016

Assuming your sample data, the code below might work in this case:

Code:

awk -F '\t' 'NR==FNR{A[$1]=$0; next} $2 in A{print $0;}1' file1 file2 | uniq -c | awk '$1 != "1" {print $4"\t"$5}' > outfile

HTH!!

Regards,
Mannu

Mannu2525
