awk to update file if value within range


 
# 1  
Old 03-12-2016

I have a file (sorted_unknown) in which ~1400 of the $5 values have "unknown" as the text before the -. I am trying to use the text in $2 of sort_targets to update those "unknown" values in sorted_unknown.
$1 of sort_targets holds a chr:start-end range. An "unknown" should be updated when the $4 range of its line in sorted_unknown falls within that range. The awk below runs, but it does not produce the correct output (it seems to match the ranges incorrectly). With the actual files (sort_targets and sorted_unknown), the first line of the output should be SEMA4G, but it is currently SFTPA2. Thank you :).

example:
sort_targets
Code:
           $1           $2
chr6:3224495-3227968 TUBB2B
chr16:89988417-90002505 TUBB3

sorted_unknown
Code:
chr16   89985657    89986630    chr16:89985657-89986630 MC1R-2270|gc=63.5
chr16   89989779    89989898    chr16:89989779-89989898 unknown-2271|gc=73.9
chr16   89998969    89999097    chr16:89998969-89999097 unknown-2272|gc=57
chr16   89999866    89999996    chr16:89999866-89999996 unknown-2273|gc=55.4
chr16   90001127    90002222    chr16:90001127-90002222 unknown-2274|gc=63.9
chr17   1173848 1174575 chr17:1173848-1174575   BHLHA9-3|gc=78.7

Desired output (unknown is updated to TUBB3 because the $4 value in sorted_unknown is within the range of the $1 value in sort_targets).
Code:
chr16   89985657    89986630    chr16:89985657-89986630 MC1R-2270|gc=63.5
chr16   89989779    89989898    chr16:89989779-89989898 TUBB3-2271|gc=73.9
chr16   89998969    89999097    chr16:89998969-89999097 TUBB3-2272|gc=57
chr16   89999866    89999996    chr16:89999866-89999996 TUBB3-2273|gc=55.4
chr16   90001127    90002222    chr16:90001127-90002222 TUBB3-2274|gc=63.9
chr17   1173848 1174575 chr17:1173848-1174575   BHLHA9-3|gc=78.7

awk
Code:
awk -v OFS='\t' 'NR==FNR{split($1,a,/[:-]/)
                           rstart[a[1]]=a[2]
                           rend[a[1]]=a[3]
                           value[a[1]]=$2
                           next} 
     $5~/unknown/ && $2>=rstart[$1] && $3<=rend[$1]
                          {sub(/unknown/,value[$1],$5)}1' sort_targets sorted_unknown > output.bed


Last edited by cmccabe; 03-12-2016 at 01:28 PM.. Reason: added attachments of input files and details
# 2  
Old 03-12-2016
Try:
Code:
     $5~/unknown/ && $2>=rstart[$1] && $3<=rend[$1] {
                          sub(/unknown/,value[$1],$5)}1'

instead of
Code:
     $5~/unknown/ && $2>=rstart[$1] && $3<=rend[$1]
                          {sub(/unknown/,value[$1],$5)}1'

Code:
condition { 
  action }

means: if "condition", then "action"

Code:
condition 
{ action }

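means: if "condition", then print the record $0 (the line); in addition, always do "action" (regardless of "condition"), because the newline ends the first rule and the braces start a second rule that has no condition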
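
A minimal sketch of the difference, using three made-up input lines (1, 2, 3) and a made-up condition ($1 > 1) purely for illustration. The variable names same_line and next_line are not from the thread.

```shell
# Three sample records, one number per line (illustrative data only).
input=$(printf '%s\n' 1 2 3)

# Brace on the same line as the condition:
# the action runs only for records that match.
same_line=$(printf '%s\n' "$input" | awk '$1 > 1 { print "hit", $1 }')

# Brace on the next line: awk reads this as two separate rules.
#   rule 1: condition without action -> prints each matching record
#   rule 2: action without condition -> runs for every record
next_line=$(printf '%s\n' "$input" | awk '$1 > 1
{ print "hit", $1 }')

printf 'same line:\n%s\n\nnext line:\n%s\n' "$same_line" "$next_line"
```

With the brace on the same line only "hit 2" and "hit 3" are printed. With the brace on the next line, "hit" is printed for all three records, and records 2 and 3 are additionally printed unchanged.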

Last edited by Scrutinizer; 03-12-2016 at 04:32 PM..
# 3  
Old 03-12-2016
Thank you, I will give it a try Monday :).
# 4  
Old 03-14-2016
The original output, produced with the unchanged command, is attached; output2 was produced with the modified command. The modified command seems to leave more unknowns in the output. Thank you :).
# 5  
Old 03-14-2016
No surprise. Scrutinizer's modified version substitutes ONLY if ALL conditions are met, which seems to be rarely the case. The unmodified version prints the original line if it meets the conditions, and then prints EVERY line with unknown replaced.
On top of that, either solution replaces "unknown" only with the last entry from sort_targets.txt that matches $1: when $1 contents repeat, the stored value is overwritten each time and only the last one prevails.
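
The overwriting can be seen in a small sketch. The two chr16 ranges and the names GENE_A and GENE_B are invented for illustration and are not taken from the attached files. The arrays are filled the same way as in the script from post #1.

```shell
# Two hypothetical targets on the same chromosome (made-up ranges and names).
targets='chr16:100-200 GENE_A
chr16:300-400 GENE_B'

# Arrays keyed by chromosome only, as in the original script:
# the second chr16 line overwrites what the first one stored.
by_chr=$(printf '%s\n' "$targets" | awk '
  {
    split($1, a, /[:-]/)
    rstart[a[1]] = a[2]
    rend[a[1]]   = a[3]
    value[a[1]]  = $2
  }
  END { for (c in value) print c, rstart[c], rend[c], value[c] }')

printf '%s\n' "$by_chr"
```

Only "chr16 300 400 GENE_B" is left at the end, so a line that falls within 100-200 can no longer be matched to GENE_A.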
# 6  
Old 03-14-2016
What would you recommend? I am struggling to fix it... Thank you :).
# 7  
Old 03-14-2016
The curly brace was only the most obvious flaw...

Adaptation of your script, try:


Code:
awk -v OFS='\t' '
  NR==FNR {
    split($1,a,/[:-]/)
    i=a[1]
    C[i]++
    rstart[i,C[i]]=a[2]
    rend[i,C[i]]=a[3]
    value[i,C[i]]=$2
    next
  } 
  
  $5~/unknown/ && $1 in C {
    for(j=1; j<=C[$1]; j++) {
      if( $2>=rstart[$1,j] && $3<=rend[$1,j]) {
        sub(/unknown/,value[$1,j],$5)
        break
      }
    }
  }
  1
' sort_targets sorted_unknown > output.bed
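
As a quick check, the script above can be run against the small example from post #1 rather than the attached files. The sketch below assumes sorted_unknown is tab-separated, and it writes the samples to a temporary directory; the tmp and result names are illustrative.

```shell
tmp=$(mktemp -d)

# sort_targets sample from post #1 (without the "$1 $2" label line).
printf '%s\n' \
  'chr6:3224495-3227968 TUBB2B' \
  'chr16:89988417-90002505 TUBB3' > "$tmp/sort_targets"

# sorted_unknown sample from post #1, written with tabs (assumed separator).
printf '%s\t%s\t%s\t%s\t%s\n' \
  chr16 89985657 89986630 chr16:89985657-89986630 'MC1R-2270|gc=63.5' \
  chr16 89989779 89989898 chr16:89989779-89989898 'unknown-2271|gc=73.9' \
  chr16 89998969 89999097 chr16:89998969-89999097 'unknown-2272|gc=57' \
  chr16 89999866 89999996 chr16:89999866-89999996 'unknown-2273|gc=55.4' \
  chr16 90001127 90002222 chr16:90001127-90002222 'unknown-2274|gc=63.9' \
  chr17 1173848 1174575 chr17:1173848-1174575 'BHLHA9-3|gc=78.7' \
  > "$tmp/sorted_unknown"

# Same program as above: store every range per chromosome, then test each one.
result=$(awk -v OFS='\t' '
  NR==FNR {
    split($1,a,/[:-]/)
    i=a[1]
    C[i]++                      # number of ranges seen for this chromosome
    rstart[i,C[i]]=a[2]
    rend[i,C[i]]=a[3]
    value[i,C[i]]=$2
    next
  }
  $5~/unknown/ && $1 in C {
    for(j=1; j<=C[$1]; j++) {
      if( $2>=rstart[$1,j] && $3<=rend[$1,j]) {
        sub(/unknown/,value[$1,j],$5)
        break                   # first matching range wins
      }
    }
  }
  1
' "$tmp/sort_targets" "$tmp/sorted_unknown")

printf '%s\n' "$result"
rm -rf "$tmp"
```

On this sample the four unknown lines should come out as TUBB3-2271 through TUBB3-2274, and the MC1R and BHLHA9 lines should pass through unchanged, which matches the desired output in post #1.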


Last edited by Scrutinizer; 03-14-2016 at 04:48 PM.. Reason: Efficiency improvement