Search and extract matching patterns


 
# 1  
Old 02-27-2012
Extract patterns from matching line and print them in separate fields

%%%%%

Last edited by lucasvs; 05-01-2012 at 06:21 AM..
# 2  
Old 02-27-2012
Code:
 -o, --only-matching
              Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
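
As a rough illustration of what -o does with a list of names, here is a minimal sketch. It assumes GNU grep. The demo_* file names are hypothetical, and the data is modeled on the sample line quoted later in this thread.

```shell
# Hypothetical demo files (not the poster's real data).
printf 'db1\t0001\tA_ent,B_ent,C_ent,D_ent\n' > demo_bigdb.tab
printf 'A_ent\nC_ent\n' > demo_file1.tab

# -o prints only the matched text, one match per output line;
# -w matches whole words, -F treats the names as fixed strings,
# -f reads the names from a file.
grep -o -w -F -f demo_file1.tab demo_bigdb.tab
# prints A_ent and C_ent on separate lines, without the db fields
```

The first two fields of the matching line are dropped, so the matches cannot be tied back to their record without extra work.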

# 3  
Old 02-27-2012
Quote:
with each such part on a separate output line.
Cannot work in my case:
- I can have a variable number of matching patterns per line in bigdb.tab.
- Also, file1.tab and file2.tab contain hundreds of names.
I would not know which name from file1.tab would go with which ones from file2.tab.

That's why I need to separate the matching name into separate fields.
# 4  
Old 02-27-2012
This goes through a lot of loops, so it might take a long time if "bigdb.tab" is really big.

Code:
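#! /bin/bash

# For each bigdb.tab line, find the first name from file1.tab and the
# first name from file2.tab that occur in it as whole words.
while IFS= read -r x
do
    while IFS= read -r f1
    do
        if printf '%s\n' "$x" | grep -q -w -- "$f1"
        then
            # Keep the first two fields and replace the third with the match.
            # Relies on GNU sed understanding \t as a tab.
            y=$(printf '%s\n' "$x" | sed "s/\(.*\)\t.*$f1.*/\1\t$f1/")
            while IFS= read -r f2
            do
                if printf '%s\n' "$x" | grep -q -w -- "$f2"
                then
                    printf '%s\t%s\n' "$y" "$f2" >> output.tab
                    break
                fi
            done < file2.tab
            break
        fi
    done < file1.tab
done < bigdb.tab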

# 5  
Old 02-27-2012
Thanks balajesuri !

I don't know what you mean by "big".
bigdb.tab contains about 100,000 lines.
I have about 10 different kinds of file1.tab and the same for file2.tab (up to 500 lines each).

I'm gonna give it a try anyway!
# 6  
Old 02-27-2012
See if this awk script meets your requirement:
Code:
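# First file (bigdb.tab): map each name in field 3 to "field1 FS field2 FS",
# prepended to any value already stored for that name.
NR==FNR {
	split($3,a,",")
	for (i in a) vl[a[i]] = $1 FS $2 FS vl[a[i]]
}
# Remaining files (file1.tab, file2.tab): collect the names found in bigdb.tab.
NR!=FNR {
	idx=$1
	OFS="\t"
	if ( idx in vl ) {
		final[vl[idx]] = final[vl[idx]] OFS idx
	}
}
# Print each stored bigdb.tab key followed by its collected names.
END {
	for ( z in final )
		print z FS final[z]
}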

Save it as program.awk and run:
Code:
awk -f program.awk bigdb.tab file1.tab file2.tab

Also, you have a typo in bigdb.tab; I believe K-ent should be K_ent.

Hope it works for you
Regards
Peasant.
# 7  
Old 02-27-2012
Thanks guys for your help !

@ balajesuri:
It doesn't work properly.
It returns only the entire first matching line, and adds a 4th field with the match from file2.tab only.

output.tab:
Code:
db1   0001   A_ent,B_ent,C_ent,D_ent   C_ent

---------- Post updated at 07:45 PM ---------- Previous update was at 06:46 PM ----------

@ Peasant
It almost works.
It has to return lines with matches from both files, or nothing (if it finds names from only 1 of the 2 files, it should return nothing).

A real example.
bigdb.tab:
Code:
db1	12665591	LFTY2_ent,SNF5_ent,SMRC1_ent,ACL6A_ent,SMRD1_ent,SMRC2_ent,ARI1A_ent,ARI1B_ent,SMRD2_ent,ENL_ent
db2,db3,db1,db4,db5,db6	7682714,16094384,15570572,16713569,15144186,10066823,9183008	LYN_ent,HCLS1_ent
db7	68465376,76987269,3877	AKT1_ent,AKT2_ent,AKT3_ent,ARAF_ent,ARNT2_ent,ARNT_ent,BRAF_ent,CBP_ent,CDC42_ent,CRKL_ent,CRK_ent,CUL2_ent,EGLN1_ent

file1.tab:
Code:
ACL6A_ent
BRAF_ent
YYYP_ent
UYTR_ent

file2.tab:
Code:
SMRD2_ent
HCLS1_ent
ETS1_ent
CUL2_ent

Wanted output.tab:
Code:
db1	12665591	ACL6A_ent	SMRD2_ent
db7	68465376,76987269,3877	BRAF_ent	CUL2_ent
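
One possible way to get the wanted output above is a single awk pass. This is only a sketch under some assumptions: all files are tab-separated with one name per line in file1.tab and file2.tab, the array names set1 and set2 are made up here, and several matches from the same list are joined with commas (the thread does not say how those should be handled).

```shell
# Recreate the sample files from the post above (tab-separated).
# Note: this overwrites files with the same names in the current directory.
printf '%s\t%s\t%s\n' \
  'db1' '12665591' 'LFTY2_ent,SNF5_ent,SMRC1_ent,ACL6A_ent,SMRD1_ent,SMRC2_ent,ARI1A_ent,ARI1B_ent,SMRD2_ent,ENL_ent' \
  'db2,db3,db1,db4,db5,db6' '7682714,16094384,15570572,16713569,15144186,10066823,9183008' 'LYN_ent,HCLS1_ent' \
  'db7' '68465376,76987269,3877' 'AKT1_ent,AKT2_ent,AKT3_ent,ARAF_ent,ARNT2_ent,ARNT_ent,BRAF_ent,CBP_ent,CDC42_ent,CRKL_ent,CRK_ent,CUL2_ent,EGLN1_ent' \
  > bigdb.tab
printf '%s\n' ACL6A_ent BRAF_ent YYYP_ent UYTR_ent > file1.tab
printf '%s\n' SMRD2_ent HCLS1_ent ETS1_ent CUL2_ent > file2.tab

awk -F'\t' -v OFS='\t' '
    # First two files: remember which list each name belongs to.
    FILENAME == ARGV[1] { set1[$1]; next }
    FILENAME == ARGV[2] { set2[$1]; next }

    # bigdb.tab: check every name in field 3 against both lists.
    {
        m1 = m2 = ""
        n = split($3, name, ",")
        for (i = 1; i <= n; i++) {
            # Assumption: multiple matches from one list are comma-joined.
            if (name[i] in set1) m1 = (m1 == "" ? "" : m1 ",") name[i]
            if (name[i] in set2) m2 = (m2 == "" ? "" : m2 ",") name[i]
        }
        # Print only when both lists matched on this line.
        if (m1 != "" && m2 != "") print $1, $2, m1, m2
    }
' file1.tab file2.tab bigdb.tab > output.tab

cat output.tab
```

With the sample files this writes the two wanted lines for db1 and db7 and skips the db2 line, which matches file2.tab only. Each bigdb.tab line is read once and every name is a hash lookup, so it should cope with about 100,000 lines without nested loops over the name files.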

 