Match 2 patterns together


 
# 1  
Old 03-03-2015

How can I quickly print the lines of a data file that contain both patterns from any single row of a pattern file? Maybe awk can do it much faster than bash.


patfile

Code:
ID1 PAT11 PAT12
ID1 PAT21 PAT22
ID2 PAT31 PAT32

datafile
Code:
headerline
rgthhrhhhhhtnjttntjjtjtjtjtjtjtjjjtPAT31rf3fffffPAT32efgreggeeeeggge
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg
rgthhrhhhhhtnjttntjjtjtjtjtjtjtjjjtPAT41rf3fffffPAT32efgreggeeeeggge
fgegegPAT21.ewdwd88weded((gfefggegrg!///*..ttttuuu.PAT22uuhhggggg
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg====

The output must be split by the ID (column 1) that the patterns belong to, with the header line repeated at the top of each ID's output.


Outputs
Code:
ID1

headerline
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg
fgegegPAT21.ewdwd88weded((gfefggegrg!///*..ttttuuu.PAT22uuhhggggg
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg====


ID2

headerline
rgthhrhhhhhtnjttntjjtjtjtjtjtjtjjjtPAT31rf3fffffPAT32efgreggeeeeggge

My attempt in bash is very slow:

Code:
while read pat
do
  while read data
  do
    if  grep -q $pat[1] $data
      if  grep -q $pat[2] $data
        echo $data >> $pat[0]
      fi
    fi
  done < datafile
done < patfile

# 2  
Old 03-04-2015
A few points:

- You don't need the inner loop: grep takes file names as arguments, not strings. If you want to search a string, you have to pass it on standard input.
- You seem to be using array syntax, but it doesn't work like this.

As your pattern file is whitespace-delimited:
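
To illustrate the first point, a string can be fed to grep on standard input, for example with printf. This is only a minimal sketch: the helper name has_both is made up for illustration, and using -F (fixed-string matching) is an assumption about how the patterns should be treated.

```shell
# Hypothetical helper: succeeds when the string $1 contains both
# fixed strings $2 and $3. The string reaches grep on standard input.
has_both() {
    printf '%s\n' "$1" | grep -qF -e "$2" &&
    printf '%s\n' "$1" | grep -qF -e "$3"
}

line='fgegegPAT11.ewdwd88PAT12uuhh'
if has_both "$line" PAT11 PAT12; then
    echo "both patterns found"
fi
```

Calling grep twice per line is still slow on large files, which is why the loop over the pattern file alone is preferable.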

Code:
while read -r id pat1 pat2
do
  echo "$id" >> results_file   # print ID
  echo >> results_file         # print blank line
  grep -e "$pat1" datafile | grep -e "$pat2" >> results_file  # print lines matching both patterns
done < patfile
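
The loop above collects everything in one results file. If one file per ID with the header line on top is wanted, as in the requested output, it could be adapted roughly as follows. This is a sketch under assumptions: the function name split_by_id and its output-directory argument are hypothetical, and the output directory is expected to be empty beforehand, since existing files are appended to.

```shell
# Hypothetical wrapper: one output file per ID, each starting with
# the header line of the data file.
split_by_id() {
    patfile=$1 datafile=$2 outdir=$3
    while read -r id pat1 pat2
    do
        out="$outdir/$id"
        # Write the header once, when the file for this ID is first created.
        [ -e "$out" ] || head -n 1 "$datafile" > "$out"
        # Skip the header (line 1), then keep lines containing both patterns.
        # grep exits 1 when nothing matches; that is not an error here.
        tail -n +2 "$datafile" | grep -e "$pat1" | grep -e "$pat2" >> "$out" || :
    done < "$patfile"
}
```

Usage would be something like: split_by_id patfile datafile outdir. Note that lines end up grouped by pattern row rather than in data-file order, because the data file is scanned once per row of the pattern file.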

This User Gave Thanks to clx For This Post:
# 3  
Old 03-04-2015
Try
Code:
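awk     'FNR==NR        {SP[$1,NR]=$2".*"$3; ID[$1]     # patfile: store regex "PAT1.*PAT2", keyed by ID and row number
                         next
                        }
         FNR==1         {for (i in ID) print > i        # datafile header: copy it into every ID file
                         next
                        }
                        {for (s in SP) if ($0 ~ SP[s]) {split (s, FN, SUBSEP); print > FN[1]}   # append the line to the file of the matching ID
                        }
        ' patfile datafile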
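
The awk program above treats the patterns as regular expressions and expects the first pattern to appear before the second on the line. Where that is too strict, a variant along the following lines might help. It is a sketch, not a tested replacement: the function name match_pairs and the output-directory argument are hypothetical, matching is done literally with index(), the two patterns may appear in either order, and each line is written at most once per ID. Like the original, it assumes a non-empty pattern file.

```shell
# Hypothetical variant: literal (non-regex) matching, patterns accepted
# in either order, each data line written at most once per ID.
match_pairs() {
    # $1 = pattern file, $2 = data file, $3 = output directory
    awk -v dir="$3" '
        # First file: remember ID and both patterns for every row.
        FNR == NR { n++; id[n] = $1; p1[n] = $2; p2[n] = $3; next }
        # Header of the data file: write it once to each ID file.
        FNR == 1  {
            for (k = 1; k <= n; k++)
                if (!(id[k] in hdr)) { hdr[id[k]]; print > (dir "/" id[k]) }
            next
        }
        {
            split("", done)              # portable way to clear an array
            for (k = 1; k <= n; k++)
                if (!(id[k] in done) && index($0, p1[k]) && index($0, p2[k])) {
                    done[id[k]]          # this ID already got the line
                    print > (dir "/" id[k])
                }
        }
    ' "$1" "$2"
}
```

It would be called as match_pairs patfile datafile outdir, with the output directory already existing. Lines stay in data-file order because the data file is read only once.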

This User Gave Thanks to RudiC For This Post:
