awk to update field using matching value in file1 and substring in field in file2


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk to update field using matching value in file1 and substring in field in file2
# 1  
Old 06-17-2017
awk to update field using matching value in file1 and substring in field in file2

In the awk below I am trying to set/update the value of $14 in file2 in
bold, using the matching NM_ in $12 or $9 in file2
with the NM_ in $2 of file1.
The lengths of $9 and $12 can be variable but what is consistent is the start pattern
will always be NM_ and the end pattern is always ; (semi-colon). or a break (if it is the last)

What is extracted into $14 is all the text from the start to end (string between the NM_ up to the ; or
break.
The value in $7 determines the field to use,that is if $7 is exonic
then $12 is used to extract from. If $7 is not exonic then
$9 is used to extract from. There will always be a value in $7 and
exonic is there the majority of the time, but not always.
I added comments to each line as well in my attempt as to what I think is happening. I hope it is close or a start. Thank you Smilie.


awk
Code:
awk -v OFS='\t' 'NR==FNR{split($2,a,"[.]"); k=a[1]; c[k]++} {   # split $2 in file1 on the . and strore valvue in array k using array c to iterate over each line
                     for(i=1;i<=num;i++){ # start for loop in file2 on fields
                         if($7 ==  /exonic/){ # check value in $7 and if it is exonic
                        k=sub(/NM_*;/,"",$12,array[i]); # match the k array from file1 to the string starting with NM_ in $12 up to the ; and read the value into array i
                        $14=array[i]   # set $14 to array i
                                           };  # close block
                     if($7 !=  /exonic/){  # check value in $7 and if it is not exonic
                     k=sub(/NM_*;/,"",$9,array[i);  match the k array from file1 to the string starting with NM_ in $9 up to the ; and read the value into array i
                     $14=array[i] # set $14 to array i
                                        };  # close block
                                       }
                     next  # process next line
                     }1' file1 file2

file1 space delimited
Code:
ATP13A2 NM_022089.3
PPT1 NM_000310.3

file2 tab-delimited
Code:
2    chr1    17314702    17314702    C    T    exonic    ATP13A2    .    .    synonymous SNV    ATP13A2:NM_001141974:exon24:c.2658G>A:p.S886S;ATP13A2:NM_001141973:exon25:c.2775G>A:p.S925S;ATP13A2:NM_022089:exon25:c.2790G>A:p.S930S    rs3738815     .
3    chr1    40562993    40562993    T    C    UTR5    PPT1    NM_001142604:c.-83A>G;NM_000310:c.-83A>G    .    .    .    rs6600313     .

desired output tab-delimited
Code:
2    chr1    17314702    17314702    C    T    exonic    ATP13A2    .    .    synonymous SNV    ATP13A2:NM_001141974:exon24:c.2658G>A:p.S886S;ATP13A2:NM_001141973:exon25:c.2775G>A:p.S925S;ATP13A2:NM_022089:exon25:c.2790G>A:p.S930S    rs3738815     NM_022089.3:exon25:c.2790G>A:p.S930S
3    chr1    40562993    40562993    T    C    UTR5    PPT1    NM_001142604:c.-83A>G;NM_000310:c.-83A>G    .    .    .    rs6600313     NM_000310.3:c.-83A>G


Last edited by cmccabe; 06-17-2017 at 04:27 PM.. Reason: fixed format
# 2  
Old 06-18-2017
Sorry cmccabe, I missed the question in your post.

Can you please clearly post your question?
This User Gave Thanks to Neo For This Post:
# 3  
Old 06-18-2017
Sorry Neo, does the below help?

The NM_ value of $2 in file1, after splitting on the ., will match a substring NM_ in $12 (the majority of the time), or $9 (in some cases).
The substring that matches is extracted starting from the NM_ until the ; or break (if it is the last value, like in case 1 in the example).
The text in $7 of file2 determines the field to use/ extract from.... that is if $7=exonic, then use $12, but if $7 is not =exonic, then use $9.
The extracted value is used to update $14 from a . to the extracted value. Thank you very much Smilie.

Last edited by cmccabe; 06-18-2017 at 08:39 AM.. Reason: added details
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to look up values in File 2 from File 1, & printingNth field of File1 based value of File2 $2

I have two files which are the output of a multiple choice vocab test (60 separate questions) from 104 people (there are some missing responses) and the question list. I have the item list in one file (File1) Item,Stimulus,Choice1,Choice2,Choice3,Choice4,Correct... (5 Replies)
Discussion started by: samonl
5 Replies

2. Shell Programming and Scripting

awk to update field in file2 if not the same as file1

Trying to use awk to: update $2 in file2 with the $2 value in file1, if $1 in file1 matches $13 in file2, which is tab-delimeted. The $2values may already be the same so in that case nothing happens and the next line is processed. There are exactly 4,605 unique $13 values. Thank you :). ... (4 Replies)
Discussion started by: cmccabe
4 Replies

3. Shell Programming and Scripting

awk to search field2 in file2 using range of fields file1 and using match to another field in file1

I am trying to use awk to find all the $2 values in file2 which is ~30MB and tab-delimited, that are between $2 and $3 in file1 which is ~2GB and tab-delimited. I have just found out that I need to use $1 and $2 and $3 from file1 and $1 and $2of file2 must match $1 of file1 and be in the range... (6 Replies)
Discussion started by: cmccabe
6 Replies

4. Shell Programming and Scripting

Replacing first field of file2 with the second filed of file1 for matching cases

Dear All, Need your help..:D I am not regular on shell scripts..:( I have 2 files.. Content of file1 cellRef 4};"4038_2_MTNL_KALAMBOLI" cellRef 1020};"4112_3_RAINBOW_BLDG" cellRef 134};"4049_2_TATA_HOSPITAL" cellRef 1003};"4242_3_HITESH_CONSTRUCTION" cellRef... (6 Replies)
Discussion started by: ailnilanjan
6 Replies

5. Shell Programming and Scripting

Retreive the records from file2 by using the first field in file1

Hi Freinds, i have a file1 as below file1 1|ndmf|fdd|d3484|34874 2|jdehf|wru7|478|w489 3|dfkj|wej|484|49894 file2 contains lakhs of records and not in sorted order i want to retrive only the records from file2 by searcing the first field of file 1 i used grep ^1 file2... (4 Replies)
Discussion started by: i150371485
4 Replies

6. UNIX for Dummies Questions & Answers

if matching strings in file1 and file2, add column from file1 to file2

I have very limited coding skills but I'm wondering if someone could help me with this. There are many threads about matching strings in two files, but I have no idea how to add a column from one file to another based on a matching string. I'm looking to match column1 in file1 to the number... (3 Replies)
Discussion started by: pathunkathunk
3 Replies

7. Shell Programming and Scripting

using field 2 in file2 to complete field 3 in file1

Hello, I was hoping someone could help me with this work related problem... basically what I want to do is the following: file2: 1 o 2 t 4 f 5 v 7 n 8 e 10 a file1: 1 : (8 Replies)
Discussion started by: smarones
8 Replies

8. Shell Programming and Scripting

AWK: Pattern match between 2 files, then compare a field in file1 as > or < field in file2

First, thanks for the help in previous posts... couldn't have gotten where I am now without it! So here is what I have, I use AWK to match $1 and $2 as 1 string in file1 to $1 and $2 as 1 string in file2. Now I'm wondering if I can extend this AWK command to incorporate the following: If $1... (4 Replies)
Discussion started by: right_coaster
4 Replies

9. Shell Programming and Scripting

Read Field from file1 and find and replace in file2

Hi All, I have file1 line below: $myName$|xxx Now I need to read the file1 and find for $myName$ in file2 and replace with xxx file1: $myName$|xxx file2: My name is $myName$ expected output in file2 after executing the script is below: my name is xxx Thanks, (8 Replies)
Discussion started by: gdevadas
8 Replies

10. UNIX for Advanced & Expert Users

print contents of file2 for matching pattern in file1 - AWK

File1 row is same as column 2 in file 2. Also file 2 will either start with A, B or C. And 3rd column in file 2 is always F2. When column 2 of file 2 matches file1 column, print all those rows into a separate file. Here is an example. file 1: 100 103 104 108 file 2: ... (6 Replies)
Discussion started by: i.scientist
6 Replies
Login or Register to Ask a Question