Unix/Linux Go Back    


Shell Programming and Scripting BSD, Linux, and UNIX shell scripting — Post awk, bash, csh, ksh, perl, php, python, sed, sh, shell scripts, and other shell scripting languages questions here.

awk to update field using matching value in file1 and substring in field in file2

Shell Programming and Scripting


Reply    
 
Thread Tools Search this Thread Display Modes
    #1  
Old Unix and Linux 1 Week Ago
cmccabe cmccabe is offline
Registered User
 
Join Date: Nov 2013
Last Activity: 27 June 2017, 8:38 AM EDT
Location: Chicago
Posts: 1,145
Thanks: 687
Thanked 15 Times in 14 Posts
awk to update field using matching value in file1 and substring in field in file2

In the awk below I am trying to set/update the value of $14 in file2 in
bold, using the matching NM_ in $12 or $9 in file2
with the NM_ in $2 of file1.
The lengths of $9 and $12 can be variable but what is consistent is the start pattern
will always be NM_ and the end pattern is always ; (semi-colon). or a break (if it is the last)

What is extracted into $14 is all the text from the start to end (string between the NM_ up to the ; or
break.
The value in $7 determines the field to use,that is if $7 is exonic
then $12 is used to extract from. If $7 is not exonic then
$9 is used to extract from. There will always be a value in $7 and
exonic is there the majority of the time, but not always.
I added comments to each line as well in my attempt as to what I think is happening. I hope it is close or a start. Thank you Linux.


awk

Code:
awk -v OFS='\t' 'NR==FNR{split($2,a,"[.]"); k=a[1]; c[k]++} {   # split $2 in file1 on the . and strore valvue in array k using array c to iterate over each line
                     for(i=1;i<=num;i++){ # start for loop in file2 on fields
                         if($7 ==  /exonic/){ # check value in $7 and if it is exonic
                        k=sub(/NM_*;/,"",$12,array[i]); # match the k array from file1 to the string starting with NM_ in $12 up to the ; and read the value into array i
                        $14=array[i]   # set $14 to array i
                                           };  # close block
                     if($7 !=  /exonic/){  # check value in $7 and if it is not exonic
                     k=sub(/NM_*;/,"",$9,array[i);  match the k array from file1 to the string starting with NM_ in $9 up to the ; and read the value into array i
                     $14=array[i] # set $14 to array i
                                        };  # close block
                                       }
                     next  # process next line
                     }1' file1 file2

file1 space delimited

Code:
ATP13A2 NM_022089.3
PPT1 NM_000310.3

file2 tab-delimited

Code:
2    chr1    17314702    17314702    C    T    exonic    ATP13A2    .    .    synonymous SNV    ATP13A2:NM_001141974:exon24:c.2658G>A:p.S886S;ATP13A2:NM_001141973:exon25:c.2775G>A:p.S925S;ATP13A2:NM_022089:exon25:c.2790G>A:p.S930S    rs3738815     .
3    chr1    40562993    40562993    T    C    UTR5    PPT1    NM_001142604:c.-83A>G;NM_000310:c.-83A>G    .    .    .    rs6600313     .

desired output tab-delimited

Code:
2    chr1    17314702    17314702    C    T    exonic    ATP13A2    .    .    synonymous SNV    ATP13A2:NM_001141974:exon24:c.2658G>A:p.S886S;ATP13A2:NM_001141973:exon25:c.2775G>A:p.S925S;ATP13A2:NM_022089:exon25:c.2790G>A:p.S930S    rs3738815     NM_022089.3:exon25:c.2790G>A:p.S930S
3    chr1    40562993    40562993    T    C    UTR5    PPT1    NM_001142604:c.-83A>G;NM_000310:c.-83A>G    .    .    .    rs6600313     NM_000310.3:c.-83A>G


Last edited by cmccabe; 1 Week Ago at 03:27 PM.. Reason: fixed format
Sponsored Links
    #2  
Old Unix and Linux 1 Week Ago
Neo's Unix or Linux Image
Neo Neo is offline Forum Staff  
Administrator
 
Join Date: Sep 2000
Last Activity: 27 June 2017, 2:42 PM EDT
Location: Asia pacific region
Posts: 13,581
Thanks: 850
Thanked 1,139 Times in 533 Posts
Sorry cmccabe, I missed the question in your post.

Can you please clearly post your question?
The Following User Says Thank You to Neo For This Useful Post:
cmccabe (1 Week Ago)
Sponsored Links
    #3  
Old Unix and Linux 1 Week Ago
cmccabe cmccabe is offline
Registered User
 
Join Date: Nov 2013
Last Activity: 27 June 2017, 8:38 AM EDT
Location: Chicago
Posts: 1,145
Thanks: 687
Thanked 15 Times in 14 Posts
Sorry Neo, does the below help?

The NM_ value of $2 in file1, after splitting on the ., will match a substring NM_ in $12 (the majority of the time), or $9 (in some cases).
The substring that matches is extracted starting from the NM_ until the ; or break (if it is the last value, like in case 1 in the example).
The text in $7 of file2 determines the field to use/ extract from.... that is if $7=exonic, then use $12, but if $7 is not =exonic, then use $9.
The extracted value is used to update $14 from a . to the extracted value. Thank you very much Linux.

Last edited by cmccabe; 1 Week Ago at 07:39 AM.. Reason: added details
Sponsored Links
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Linux More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
awk to update field in file2 if not the same as file1 cmccabe Shell Programming and Scripting 4 01-04-2017 10:08 AM
awk to search field2 in file2 using range of fields file1 and using match to another field in file1 cmccabe Shell Programming and Scripting 6 12-22-2016 09:55 AM
Replacing first field of file2 with the second filed of file1 for matching cases ailnilanjan Shell Programming and Scripting 6 11-06-2014 05:14 AM
using field 2 in file2 to complete field 3 in file1 smarones Shell Programming and Scripting 8 11-04-2011 06:33 AM
AWK: Pattern match between 2 files, then compare a field in file1 as > or < field in file2 right_coaster Shell Programming and Scripting 4 10-06-2011 06:07 PM



All times are GMT -4. The time now is 04:57 PM.