awk conditional operators- lookup value in 2nd file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk conditional operators- lookup value in 2nd file
# 1  
Old 10-22-2018
awk conditional operators- lookup value in 2nd file

I use conditional operators alot in AWK to print rows from large text files based on values in a certain column. For example:


Code:
awk -F '\t' '{ if ($1 == "A" || $1 == "C" ) print $0}'  OFS="\t" file1.txt > file2.txt


In this case every row is printed from file1 to file2 for which the column 1 value in file1 is either A or C.


This is fine if I'm working with a few conditional values in column 1 (2 in the above example), however, if there are many values to look up, say 40, it becomes cumbersome to write out 40 conditional values. In this case it would be more efficient to place all those conditional values in a separate file say file3. For example, file 3 could look like:


File3


Code:
BY1038
BY1039
BY2222
BY30142
BY30150
BY30161
BY30209
BY30211
CTS10416
CTS10700
CTS10879


Basically I would like to print all file1 rows to file2 for which the column1 value in file1 is one of those listed in file3



File1


Code:
 BY1038    23497067    C->G
 BY1039    14032743    A->T
 BY2222    2720236    A->G
 CTS493    8122429    A->G
 CTS5035    14708870    G->C
 CTS5268    15018138    A->G
 CTS5336    17367361    T->C
 BY30211    7625503    C->A
 CTS10416    16513268    T->C
 CTS10700    16651541    G->T
 CTS10879    14409483    A->C
 CTS11569    17841608    A->G
 CTS12657    15373303    C->T
 CTS150    2913202    G->A
 CTS3585    21225818    A->C
 CTS3837    14043980    T->C
 CTS4014    7064140    G->C
 CTS4652    8891051    C->T
 CTS482    15658416    G->A


Desired output (file2) should look like:


Code:
BY1038    23497067    C->G
 BY1039    14032743    A->T
 BY2222    2720236    A->G
CTS10416    16513268    T->C
 CTS10700    16651541    G->T
 CTS10879    14409483    A->C


Thanks in advance...

------ Post updated at 09:59 AM ------

I modified a code that I was using for something else, and it seems to work fine for printing columns1 and column2 of file1 to file2, however, I need the column3 value also printed.


Code:
awk 'NR == FNR {REP[$1] = $2; next} $1 in REP {$2 = REP[$1]} 1' OFS="\t" file1 file3 > file2


Here is what I get (only 1st 2 columns are printed)




file2


Code:
BY1038
 BY1039
 BY2222
 BY30142
 BY30150
 BY30161
 BY30209
 BY30211
CTS10416    19443432
CTS10700    22712606
CTS10879    22815955

# 2  
Old 10-22-2018
Code:
awk -F '\t' 'FNR==NR {f3[$1];next} $1 in f3' OFS="\t" file3 file1.txt > file2.txt

This User Gave Thanks to vgersh99 For This Post:
# 3  
Old 10-22-2018
Sorry vgersh99, but your code doesn't produce the desired output. Only 2 rows were printed
# 4  
Old 10-22-2018
Quote:
Originally Posted by Geneanalyst
Sorry vgersh99, but your code doesn't produce the desired output. Only 2 rows were printed
please provide the output of cat -vet file1.txt using code tags.
With the properly formatted file1.txt (tab-delimited fields), I get:
Code:
awk -F '\t' 'FNR==NR {f3[$1];next} $1 in f3' OFS="\t" gene3.txt gene1.txt
BY1038  23497067        C->G
BY1039  14032743        A->T
BY2222  2720236 A->G
BY30211 7625503 C->A
CTS10416        16513268        T->C
CTS10700        16651541        G->T
CTS10879        14409483        A->C

This User Gave Thanks to vgersh99 For This Post:
# 5  
Old 10-22-2018
Sorry vgersh99, my bad. My file3 had some leading spaces which after cleaning up produced the correct output.


Thanks for the code!

------ Post updated at 11:21 AM ------

Quote:
Originally Posted by vgersh99
Code:
awk -F '\t' 'FNR==NR {f3[$1];next} $1 in f3' OFS="\t" file3 file1.txt > file2.txt


Do you mind posting an explanation of the steps inside the code
This User Gave Thanks to Geneanalyst For This Post:
# 6  
Old 10-22-2018
Code:
awk -F '\t' '
# when FNR==NR (only true for the FIRST file on cli),
# populate array f3 indexed by $1 (first field) in the file (file3)
# next - advance to the next input record/line in file3 without taking any
# following actions
 FNR==NR {f3[$1];next} 

# we get here for the file(s) following the first file on cli
# if $1 (first field) in the line/record is present in array f3,
# output the whole record in the current/f1 file
# this could be rewritten (long-hand) as $1 in f3 { print $0}
 $1 in f3' OFS="\t" file3 file1.txt

This User Gave Thanks to vgersh99 For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to format file with conditional split

In the awk below I am splitting $7 on the : (colon) then - (hyphen) as array a. The word chr is printed at the start of every $1 line. Next, $4 is split on the > (greater then) as array b. I am not sure how to account for the two other possibilities in $4 so the correct output is printed. Every... (3 Replies)
Discussion started by: cmccabe
3 Replies

2. Shell Programming and Scripting

awk to lookup stored variable in file and print matching line

The bash bash below extracts the oldest folder from a directory and stores it in filename That result will match a line in bold in input. In the matching line there is an_xxx digit in italics that (once the leading zero is removed) will match a line in link. That is the lint to print in output.... (2 Replies)
Discussion started by: cmccabe
2 Replies

3. Shell Programming and Scripting

awk to lookup value in one file in another range

I am trying to update the below awk, kindly provided by @RavinderSingh13, to update each line of file1 with either Low or No Low based on matching $2 of file1 to a range in $2 and $3 of file2. If the $2 value in file1 matches the range in file2 then that line is Low, otherwise it is No Low in the... (3 Replies)
Discussion started by: cmccabe
3 Replies

4. Shell Programming and Scripting

awk to lookup section of file in a range of another file

In the below, I am trying to lookup $1 and $2 from file1, in a range search using $1 $2 $3 of file2. If the search key from file1 is found in file2, then the word low is printed in the last field of that line in the updated file1. Only the last section of file1 needs to be searched, but I am not... (6 Replies)
Discussion started by: cmccabe
6 Replies

5. Shell Programming and Scripting

awk to print field from lookup file in output

The below awk uses $3 and $4 in search as the min and max, then takes each $2 value in lookup and compares it. If the value in lookupfalls within the range in searchthen it prints the entire line in lookup/ICODE]. What I can't seem to figure out is how to print the matching $5 from search on that... (4 Replies)
Discussion started by: cmccabe
4 Replies

6. Shell Programming and Scripting

Combined sed+awk for lookup csv file

have written a combined sed+awk to perform a lookup operation which works but looking to enhance it. looking to match a record using any of the comma separated values + return selected fields from the record - including the field header. so: cat foo make,model,engine,trim,value... (6 Replies)
Discussion started by: jack.bauer
6 Replies

7. UNIX for Dummies Questions & Answers

Help with AWK - Compare a field in a file to lookup file and substitute if only a match

I have the below 2 files: 1) Third field from file1.txt should be compared to the first field of lookup.txt. 2) If match found then third field, file1.txt should be substituted with the second field from lookup.txt. 3)Else just print the line from file1.txt. File1.txt:... (4 Replies)
Discussion started by: venalla_shine
4 Replies

8. Shell Programming and Scripting

Multiple file lookup using awk

I want to lookup filea with fileb,filec and filed. If entry in filea exist in fileb and filec mark Y and then if entry in filea exist in filed mark as Y. Final output should have all the entries from filea. This prints only matching entries from file a in fileb i want all entries from... (9 Replies)
Discussion started by: pinnacle
9 Replies

9. UNIX for Advanced & Expert Users

Clueless about how to lookup and reverse lookup IP addresses under a file!!.pls help

Write a quick shell snippet to find all of the IPV4 IP addresses in any and all of the files under /var/lib/output/*, ignoring whatever else may be in those files. Perform a reverse lookup on each, and format the output neatly, like "IP=192.168.0.1, ... (0 Replies)
Discussion started by: choco4202002
0 Replies

10. Shell Programming and Scripting

file Lookup using awk

Hi All, I have two files file1 and file2(lookup file).I need to map more than one keyfields of file1 with file2.how can we achieve it using awk. file1(max 2.2 million records) -------------------------- 680720|680721|077 680720|680721|978 680721|680722|090 file2(no idea about the... (1 Reply)
Discussion started by: jerome Sukumar
1 Replies
Login or Register to Ask a Question