Retrieve lines that match any occurrence in a list of patterns


 
# 1  
Old 08-29-2012

I have two files. The first contains a header line and six columns of data.
Example file 1:
Number SNP ID dbSNP RS ID Chromosome Result_Call Physical Position
787066 SNP_A-8575395 RS6650104 1 NOCALL 564477
786872 SNP_A-8575125 RS10458597 1 AA 564621
787077 SNP_A-8575389 RS8179414 1 NOCALL 565400
787080 SNP_A-8575376 RS9645428 1 NOCALL 566810
920528 SNP_A-8709646 RS12565286 1 AA 721290
710267 SNP_A-8497791 RS12082473 1 AA 740857
I wish to retrieve those lines where the third column (dbSNP RS ID) matches an ID from file 2. Note that the IDs are upper case in file 1 and lower case in file 2.

Example file 2:

rs10458597
rs12565286
rs12082473
rs3094315
rs2286139
rs11240776
In this example the first, second, and third lines from file 2 match the second, fifth, and sixth data rows in file 1.

The required output would be:

Number SNP ID dbSNP RS ID Chromosome Result_Call Physical Position
786872 SNP_A-8575125 RS10458597 1 AA 564621
920528 SNP_A-8709646 RS12565286 1 AA 721290
710267 SNP_A-8497791 RS12082473 1 AA 740857


I have found this code:
Code:
awk 'NR==FNR{a[$0]=1;next} {n=0;for(i in a){if($0~i){n=1}}} n' file1 file2


But the output contains all lines from file 1. My knowledge of awk is insufficient to see where it is going wrong, and my trial-and-error attempts at altering bits of the code have been unsuccessful.

Any suggestions are highly appreciated.
# 2  
Old 08-29-2012
Code:
awk 'FNR==NR{a[toupper($0)]=1;next}(FNR!=1 && ($3 in a))||FNR==1' file2 file1
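For readers new to awk, the one-liner above can be unpacked into a commented form. This is only a sketch: it recreates the sample data from the first post in a scratch directory (the `dir` and `result` variables and the array name `ids` are illustrative, not from the thread) and splits the combined condition into two separate rules that should behave the same way on this data.

```shell
#!/bin/sh
# Recreate the sample files from the first post in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
Number SNP ID dbSNP RS ID Chromosome Result_Call Physical Position
787066 SNP_A-8575395 RS6650104 1 NOCALL 564477
786872 SNP_A-8575125 RS10458597 1 AA 564621
787077 SNP_A-8575389 RS8179414 1 NOCALL 565400
787080 SNP_A-8575376 RS9645428 1 NOCALL 566810
920528 SNP_A-8709646 RS12565286 1 AA 721290
710267 SNP_A-8497791 RS12082473 1 AA 740857
EOF
cat > "$dir/file2" <<'EOF'
rs10458597
rs12565286
rs12082473
rs3094315
rs2286139
rs11240776
EOF

result=$(awk '
    # First file on the command line (file2): store each ID, upper-cased,
    # as an array key. NR == FNR is only true while reading the first file.
    FNR == NR { ids[toupper($0)] = 1; next }
    # Second file (file1): always print its first line, the header.
    FNR == 1 { print; next }
    # Print any data row whose third field is one of the stored IDs.
    $3 in ids
' "$dir/file2" "$dir/file1")

printf '%s\n' "$result"
rm -rf "$dir"
```

The lookup `$3 in ids` is an exact match on the whole third field, so there is no need to loop over the patterns or use regular expressions, and `toupper` takes care of the rs/RS case difference between the two files.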

These 2 Users Gave Thanks to elixir_sinari For This Post:
# 3  
Old 08-29-2012
Quote:
Originally Posted by elixir_sinari
Code:
awk 'FNR==NR{a[toupper($0)]=1;next}(FNR!=1 && ($3 in a))||FNR==1' file2 file1

A little bit shorter:

Code:
awk 'FNR==NR{a[toupper($0)]=1;next}($3 in a)||FNR==1' file2 file1
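The shortening works because `(FNR!=1 && X) || FNR==1` is logically the same as `X || FNR==1`: whenever `FNR==1` is false, `FNR!=1` is necessarily true. A quick way to convince yourself is to run both forms on the same input and compare. The sketch below uses a cut-down copy of the sample data; the scratch directory and the variable names `long` and `short` are illustrative.

```shell
#!/bin/sh
# Small test data: a header, one matching row, one non-matching row.
dir=$(mktemp -d)
printf '%s\n' \
    'Number SNP ID dbSNP RS ID Chromosome Result_Call Physical Position' \
    '786872 SNP_A-8575125 RS10458597 1 AA 564621' \
    '787077 SNP_A-8575389 RS8179414 1 NOCALL 565400' > "$dir/file1"
printf '%s\n' rs10458597 rs3094315 > "$dir/file2"

# Original form, with the explicit FNR!=1 guard.
long=$(awk 'FNR==NR{a[toupper($0)]=1;next}(FNR!=1 && ($3 in a))||FNR==1' "$dir/file2" "$dir/file1")
# Shortened form, without the guard.
short=$(awk 'FNR==NR{a[toupper($0)]=1;next}($3 in a)||FNR==1' "$dir/file2" "$dir/file1")

# Compare the two outputs.
if [ "$long" = "$short" ]; then
    echo "identical output"
fi
rm -rf "$dir"
```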

This User Gave Thanks to pamu For This Post:
# 4  
Old 08-29-2012
Thanks elixir_sinari and pamu.
When I run the code I only get the first match. I need all matches in file 1. Any suggestions?

Thanks in advance.
# 5  
Old 08-29-2012
Quote:
Originally Posted by Selftaught
Thanks elixir_sinari and pamu.
When I run the code I only get the first match. I need all matches in file 1. Any suggestions?

Thanks in advance.
I think this gives exactly what you asked for:

Code:
$ awk 'FNR==NR{a[toupper($0)]=1;next}($3 in a)||FNR==1' file2 file1
Number SNP ID dbSNP RS ID Chromosome Result_Call Physical Position
786872 SNP_A-8575125 RS10458597 1 AA 564621
920528 SNP_A-8709646 RS12565286 1 AA 721290
710267 SNP_A-8497791 RS12082473 1 AA 740857

# 6  
Old 08-29-2012
With the code from elixir_sinari and pamu you will get all the matches in file1.
# 7  
Old 08-29-2012
What output do you get? Make sure you passed file2 first and then file1 on the command line.
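The argument order matters because the `FNR==NR` block only runs for the first file named. One plausible explanation for the "only the first match" symptom (the thread does not confirm it) is that the files were given as `file1 file2`. The sketch below runs the same command both ways on a cut-down copy of the sample data; the scratch directory and the variable names `right` and `wrong` are illustrative.

```shell
#!/bin/sh
# Cut-down sample data: a header and three rows, two of which should match.
dir=$(mktemp -d)
printf '%s\n' \
    'Number SNP ID dbSNP RS ID Chromosome Result_Call Physical Position' \
    '787066 SNP_A-8575395 RS6650104 1 NOCALL 564477' \
    '786872 SNP_A-8575125 RS10458597 1 AA 564621' \
    '920528 SNP_A-8709646 RS12565286 1 AA 721290' > "$dir/file1"
printf '%s\n' rs10458597 rs12565286 rs3094315 > "$dir/file2"

prog='FNR==NR{a[toupper($0)]=1;next}($3 in a)||FNR==1'

# Intended order: the ID list is loaded first, then file1 is filtered.
right=$(awk "$prog" "$dir/file2" "$dir/file1")
# Swapped order: whole lines of file1 become the keys, and file2 is
# "filtered" instead. Only its first line passes, via the FNR==1 test.
wrong=$(awk "$prog" "$dir/file1" "$dir/file2")

printf 'file2 file1:\n%s\n\n' "$right"
printf 'file1 file2:\n%s\n' "$wrong"
rm -rf "$dir"
```

With the swapped order the only line printed is the first line of file2, which could easily be mistaken for a single match.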