awk to lookup value in one file in another range


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk to lookup value in one file in another range
# 1  
Old 10-17-2016
awk to lookup value in one file in another range

I am trying to update the below awk, kindly provided by @RavinderSingh13, to update each line of file1 with either Low or No Low based on matching $2 of file1 to a range in $2 and $3 of file2. If the $2 value in file1 matches the range in file2 then that line is Low, otherwise it is No Low in the last field of the line. The awk just needs to be updated to look at each line and check weather it is Low or No Low, but I am not sure how to do that. Thank you Smilie.


file1
Code:
Match:
chr15    68521889    C    T    exonic    CLN6    GOOD    50    het    4
chr7    147183143    A    G    intronic    CNTNAP2    GOOD    382    het    22
Missing in Reference but found in IDP:
chr2    51666313    T    C    intergenic    NRXN1,NONE    GOOD    108    het    7
chr2    166903445    T    C    exonic    SCN1A    GOOD    400    het    28
Missing in IDP but found in Reference:
2    166210776    C    T    exonic    SCN2A    c.[2994C>T]+[=]    3095    23.1    24.56
7    148106478    -    GT    intronic    CNTNAP2    c.3716-5_3716-4insGT    4168    28.6    51.01

file2
Code:
chr2    50573818    50574097    NRXN1
chr7    148106400    148106550    CNTNAP2

desired output
Code:
Match:
chr15    68521889    C    T    exonic    CLN6    GOOD    50    het    4     Not Low
chr7    147183143    A    G    intronic    CNTNAP2    GOOD    382    het    22     Not Low
Missing in Reference but found in IDP:
chr2    51666313    T    C    intergenic    NRXN1,NONE    GOOD    108    het    7     Not Low
chr2    166903445    T    C    exonic    SCN1A    GOOD    400    het    28     Not Low
Missing in IDP but found in Reference:
2    166210776    C    T    exonic    SCN2A    c.[2994C>T]+[=]    3095    23.1    24.56     Not Low
7    148106478    -    GT    intronic    CNTNAP2    c.3716-5_3716-4insGT    4168    28.6    51.01     Low

awk
Code:
awk 'FNR==NR{if($0 ~ /^[[:digit:]]/ || $0 ~ /^chr/){gsub(/[[:alpha:]]/,X,$1)};A[$1]=$2;B[$1]=$0;next} {gsub(/[[:alpha:]]/,X,$1)} ($1 in A){if((A[$1]>=$2 && A[$1]<=$3) || (A[$1]<=$2 && A[$1]>=$2)){print $0 FS "low";next}} END{for(i in A){print B[i]}} ' file1 file2 > out

# 2  
Old 10-17-2016
Hello cmccabe,

Could you please try following.
Code:
awk 'FNR==NR{A[$2]=$3;next} {for(i in A){if($2>=i+0 && $2<=A[i]+0){$0=$0 " low";print;next}};printf("%s%s\n",$0,($1 ~ /Match/ || $1 ~ /Missing/)?"":" Not low")}'  Input_file2   Input_file1

EDIT: Adding a non-one liner form of solution too now.
Code:
awk 'FNR==NR{
                A[$2]=$3;
                next
            }
            {
                for(i in A){
                                if($2>=i+0 && $2<=A[i]+0){
                                                                $0=$0 " low";
                                                                print;
                                                                next
                                                         }
                           };
                printf("%s%s\n",$0,($1 ~ /Match/ || $1 ~ /Missing/)?"":" Not low")
            }
    '  Input_file2   Input_file1

Thanks,
R. Singh

Last edited by RavinderSingh13; 10-17-2016 at 04:36 PM.. Reason: Adding a non-one liner form of solution too now.
This User Gave Thanks to RavinderSingh13 For This Post:
# 3  
Old 10-18-2016
Thank you very much Smilie
# 4  
Old 10-21-2016
Just keep in mind that the code provided by RavinderSingh13 assumes that file2 does never contain 2 different ranges starting by the same number ($2), otherwise only the last range will be retain ( A[$2]=$3 ) which may lead to unwanted result.
So just make sure that this assumption is always true regarding yours cases.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk conditional operators- lookup value in 2nd file

I use conditional operators alot in AWK to print rows from large text files based on values in a certain column. For example: awk -F '\t' '{ if ($1 == "A" || $1 == "C" ) print $0}' OFS="\t" file1.txt > file2.txt In this case every row is printed from file1 to file2 for which the column 1... (5 Replies)
Discussion started by: Geneanalyst
5 Replies

2. Shell Programming and Scripting

awk to lookup stored variable in file and print matching line

The bash bash below extracts the oldest folder from a directory and stores it in filename That result will match a line in bold in input. In the matching line there is an_xxx digit in italics that (once the leading zero is removed) will match a line in link. That is the lint to print in output.... (2 Replies)
Discussion started by: cmccabe
2 Replies

3. Shell Programming and Scripting

awk to lookup section of file in a range of another file

In the below, I am trying to lookup $1 and $2 from file1, in a range search using $1 $2 $3 of file2. If the search key from file1 is found in file2, then the word low is printed in the last field of that line in the updated file1. Only the last section of file1 needs to be searched, but I am not... (6 Replies)
Discussion started by: cmccabe
6 Replies

4. Shell Programming and Scripting

awk to print field from lookup file in output

The below awk uses $3 and $4 in search as the min and max, then takes each $2 value in lookup and compares it. If the value in lookupfalls within the range in searchthen it prints the entire line in lookup/ICODE]. What I can't seem to figure out is how to print the matching $5 from search on that... (4 Replies)
Discussion started by: cmccabe
4 Replies

5. Shell Programming and Scripting

awk to update file if value within range

I have a file (sorted_unknown) with ~1400 $5 values before the - that are "unknown". What I am trying to do is use the text in $2 of (sort_targets) to update those "unknown" values in the (sorted_unknown). In $1 of (sort_targets) there are a set of numbers that can be used to update the "unknown"... (8 Replies)
Discussion started by: cmccabe
8 Replies

6. Shell Programming and Scripting

Combined sed+awk for lookup csv file

have written a combined sed+awk to perform a lookup operation which works but looking to enhance it. looking to match a record using any of the comma separated values + return selected fields from the record - including the field header. so: cat foo make,model,engine,trim,value... (6 Replies)
Discussion started by: jack.bauer
6 Replies

7. UNIX for Dummies Questions & Answers

Help with AWK - Compare a field in a file to lookup file and substitute if only a match

I have the below 2 files: 1) Third field from file1.txt should be compared to the first field of lookup.txt. 2) If match found then third field, file1.txt should be substituted with the second field from lookup.txt. 3)Else just print the line from file1.txt. File1.txt:... (4 Replies)
Discussion started by: venalla_shine
4 Replies

8. Shell Programming and Scripting

Multiple file lookup using awk

I want to lookup filea with fileb,filec and filed. If entry in filea exist in fileb and filec mark Y and then if entry in filea exist in filed mark as Y. Final output should have all the entries from filea. This prints only matching entries from file a in fileb i want all entries from... (9 Replies)
Discussion started by: pinnacle
9 Replies

9. UNIX for Advanced & Expert Users

Clueless about how to lookup and reverse lookup IP addresses under a file!!.pls help

Write a quick shell snippet to find all of the IPV4 IP addresses in any and all of the files under /var/lib/output/*, ignoring whatever else may be in those files. Perform a reverse lookup on each, and format the output neatly, like "IP=192.168.0.1, ... (0 Replies)
Discussion started by: choco4202002
0 Replies

10. Shell Programming and Scripting

file Lookup using awk

Hi All, I have two files file1 and file2(lookup file).I need to map more than one keyfields of file1 with file2.how can we achieve it using awk. file1(max 2.2 million records) -------------------------- 680720|680721|077 680720|680721|978 680721|680722|090 file2(no idea about the... (1 Reply)
Discussion started by: jerome Sukumar
1 Replies
Login or Register to Ask a Question