awk to update field using matching value in file1 and substring in field in file2


# 1  
Old 06-17-2017

In the awk below I am trying to set/update the value of $14 in file2 by matching the NM_ ID in $12 or $9 of file2
against the NM_ ID in $2 of file1.
The lengths of $9 and $12 vary, but the pattern is consistent: each entry starts with NM_
and ends at a ; (semicolon), or at the end of the field if it is the last entry.

What goes into $14 is all the text of the matching entry, from the NM_ up to the ; or the end of the field.
The value in $7 determines which field to use: if $7 is exonic,
then $12 is used to extract from. If $7 is not exonic, then
$9 is used. There will always be a value in $7, and
it is exonic the majority of the time, but not always.
I added comments to each line of my attempt describing what I think is happening. I hope it is close, or at least a start. Thank you :).


awk
Code:
awk -v OFS='\t' 'NR==FNR{split($2,a,"[.]"); k=a[1]; c[k]++} {   # split $2 in file1 on the . and store the value in k, using array c to record each key
                     for(i=1;i<=num;i++){ # start for loop in file2 on fields
                         if($7 ==  /exonic/){ # check value in $7 and if it is exonic
                        k=sub(/NM_*;/,"",$12,array[i]); # match the k array from file1 to the string starting with NM_ in $12 up to the ; and read the value into array i
                        $14=array[i]   # set $14 to array i
                                           };  # close block
                     if($7 !=  /exonic/){  # check value in $7 and if it is not exonic
                     k=sub(/NM_*;/,"",$9,array[i]);  # match the k array from file1 to the string starting with NM_ in $9 up to the ; and read the value into array i
                     $14=array[i] # set $14 to array i
                                        };  # close block
                                       }
                     next  # process next line
                     }1' file1 file2

file1 space delimited
Code:
ATP13A2 NM_022089.3
PPT1 NM_000310.3

file2 tab-delimited
Code:
2    chr1    17314702    17314702    C    T    exonic    ATP13A2    .    .    synonymous SNV    ATP13A2:NM_001141974:exon24:c.2658G>A:p.S886S;ATP13A2:NM_001141973:exon25:c.2775G>A:p.S925S;ATP13A2:NM_022089:exon25:c.2790G>A:p.S930S    rs3738815     .
3    chr1    40562993    40562993    T    C    UTR5    PPT1    NM_001142604:c.-83A>G;NM_000310:c.-83A>G    .    .    .    rs6600313     .

desired output tab-delimited
Code:
2    chr1    17314702    17314702    C    T    exonic    ATP13A2    .    .    synonymous SNV    ATP13A2:NM_001141974:exon24:c.2658G>A:p.S886S;ATP13A2:NM_001141973:exon25:c.2775G>A:p.S925S;ATP13A2:NM_022089:exon25:c.2790G>A:p.S930S    rs3738815     NM_022089.3:exon25:c.2790G>A:p.S930S
3    chr1    40562993    40562993    T    C    UTR5    PPT1    NM_001142604:c.-83A>G;NM_000310:c.-83A>G    .    .    .    rs6600313     NM_000310.3:c.-83A>G


Last edited by cmccabe; 06-17-2017 at 03:27 PM.. Reason: fixed format
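
A note on the attempt in post #1, offered as a sketch rather than a definitive diagnosis. In awk a bare /exonic/ is shorthand for $0 ~ /exonic/ and evaluates to 1 or 0, so $7 == /exonic/ compares $7 with a number instead of testing for the word. sub() takes at most three arguments (regex, replacement, target) and returns the number of substitutions, not the matched text. The regex /NM_*;/ reads as "NM, then zero or more underscores, then ;". The snippet below illustrates the string comparison and one possible way to pull out an entry with match() and substr(). The function name demo_compare and the two test lines are made up for illustration.

```shell
# demo_compare is a hypothetical helper name, used only for this illustration.
demo_compare() {
    printf 'exonic\nUTR5\n' | awk '
    {
        # /exonic/ alone means ($0 ~ /exonic/), i.e. 1 or 0, so this compares
        # $1 with a number rather than with the word "exonic"
        regex_cmp  = ($1 == /exonic/) ? "yes" : "no"
        # plain string comparison, which is what the description asks for
        string_cmp = ($1 == "exonic") ? "yes" : "no"
        print $1, "regex_cmp=" regex_cmp, "string_cmp=" string_cmp
    }
    END {
        # [^;]* stops at the next semicolon or at the end of the string
        s = "NM_001142604:c.-83A>G;NM_000310:c.-83A>G"
        if (match(s, /NM_[^;]*/))
            print substr(s, RSTART, RLENGTH)
    }'
}

demo_compare
```

The regex comparison never reports a match here, even for the line that is exactly exonic, which is why the string comparison is the safer test for $7.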
# 2  
Old 06-18-2017
Sorry cmccabe, I missed the question in your post.

Can you please clearly post your question?
# 3  
Old 06-18-2017
Sorry Neo, does the below help?

The NM_ value in $2 of file1, after splitting on the ., will match an NM_ substring in $12 (the majority of the time) or in $9 (in some cases).
The matching substring is extracted from the NM_ up to the ; or to the end of the field (if it is the last value, as in case 1 of the example).
The text in $7 of file2 determines which field to extract from: if $7 is exonic, use $12; if $7 is not exonic, use $9.
The extracted value replaces the . in $14. Thank you very much :).

Last edited by cmccabe; 06-18-2017 at 07:39 AM.. Reason: added details
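
One way the rules above could be put together, as a sketch under a few assumptions that are not stated in the thread: IDs are matched with the version suffix stripped, the first matching entry wins, the versioned ID from file1 is written back into $14 (as the desired output in post #1 shows), and rows with no match are left as they are. The wrapper name update_field14 and the temporary directory are only for illustration.

```shell
# update_field14 is a hypothetical wrapper name; pass file1 and file2 as arguments.
update_field14() {
    awk -F'\t' -v OFS='\t' '
    NR == FNR {                        # first file: file1, space-delimited
        split($0, f, " ")              # f[1] = gene, f[2] = versioned ID
        key = f[2]
        sub(/\.[0-9]+$/, "", key)      # NM_022089.3 -> NM_022089
        full[key] = f[2]               # look up the versioned ID by its bare ID
        next
    }
    {
        src = ($7 == "exonic") ? $12 : $9   # $7 picks the field to search
        n = split(src, seg, ";")            # one entry per semicolon
        for (i = 1; i <= n; i++) {
            if (match(seg[i], /NM_[0-9]+/)) {
                id = substr(seg[i], RSTART, RLENGTH)
                if (id in full) {
                    # versioned ID from file1 plus the rest of the entry
                    $14 = full[id] substr(seg[i], RSTART + RLENGTH)
                    break
                }
            }
        }
        print                               # unmatched rows pass through unchanged
    }' "$1" "$2"
}

# Sample data copied from the thread (file1 space-delimited, file2 tab-delimited)
workdir=$(mktemp -d)
printf '%s\n' 'ATP13A2 NM_022089.3' 'PPT1 NM_000310.3' > "$workdir/file1"
printf '2\tchr1\t17314702\t17314702\tC\tT\texonic\tATP13A2\t.\t.\tsynonymous SNV\tATP13A2:NM_001141974:exon24:c.2658G>A:p.S886S;ATP13A2:NM_001141973:exon25:c.2775G>A:p.S925S;ATP13A2:NM_022089:exon25:c.2790G>A:p.S930S\trs3738815\t.\n' > "$workdir/file2"
printf '3\tchr1\t40562993\t40562993\tT\tC\tUTR5\tPPT1\tNM_001142604:c.-83A>G;NM_000310:c.-83A>G\t.\t.\t.\trs6600313\t.\n' >> "$workdir/file2"

update_field14 "$workdir/file1" "$workdir/file2"
```

With the two sample rows this fills $14 with NM_022089.3:exon25:c.2790G>A:p.S930S and NM_000310.3:c.-83A>G. Looking the bare ID up in an array, rather than searching for it as a substring, keeps a shorter ID from matching inside a longer one.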