| UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !! |
#1
Extract patterns from matching line and print them in separate fields
%%%%%
Last edited by lucasvs; 05-01-2012 at 05:21 AM.
#2
Code:
-o, --only-matching
    Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
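
To illustrate what the option quoted above does, here is a minimal sketch. It assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX). The file names.txt and its contents are made up for the illustration, and the sample line is the one that appears later in this thread.

```shell
#!/bin/sh
# Hypothetical illustration of grep -o; names.txt is not a file from the thread.
workdir=$(mktemp -d)

# Two names to look for, one per line.
printf 'A_ent\nC_ent\n' > "$workdir/names.txt"

# -o prints only the matched parts, one per output line
# -w matches whole words only (so A_ent does not match inside XA_ent)
# -F treats the patterns as fixed strings, -f reads them from a file
matches=$(printf 'db1\t0001\tA_ent,B_ent,C_ent,D_ent\n' |
    grep -o -w -F -f "$workdir/names.txt")

printf '%s\n' "$matches"
rm -rf "$workdir"
```

Each match comes out on its own line, so the information about which input line it came from is lost unless you process one line at a time.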
#3
Quote:
- I can have a variable number of matching patterns per line in bigdb.tab.
- Also, file1.tab and file2.tab contain hundreds of names. I would not know which name from file1.tab would go with which ones from file2.tab. That's why I need to separate the matching names into separate fields.
#4
This goes through a lot of loops, so it might take a long time if "bigdb.tab" is really big.
Code:
#! /bin/bash
# For each line of bigdb.tab, look for the first name from file1.tab
# and the first name from file2.tab that occur in it as whole words.
while read x
do
    while read f1
    do
        echo $x | grep -q -w $f1
        if [ $? -eq 0 ]
        then
            # Replace the third field with the matched name (\t relies on GNU sed)
            y=`echo "$x" | sed "s/\(.*\)\t.*$f1.*/\1\t$f1/"`
            while read f2
            do
                echo "$x" | grep -q -w $f2
                if [ $? -eq 0 ]
                then
                    echo -e "$y\t$f2" >> output.tab
                    break
                fi
            done < file2.tab
            break
        fi
    done < file1.tab
done < bigdb.tab
#5
Thanks balajesuri!
I don't know what you mean by "big". bigdb.tab contains about 100,000 lines. I have about 10 different kinds of file1.tab and the same for file2.tab (up to 500 lines each). I'm going to give it a try anyway!
#6
See if this awk fills your requirement:
Code:
NR==FNR {
    # First file (bigdb.tab): map each name in field 3 to fields 1 and 2 of its line
    split($3,a,","); for (i in a) vl[a[i]] = $1 FS $2 FS vl[a[i]]
}
NR!=FNR {
    # Later files (file1.tab, file2.tab): collect the names that occur in bigdb.tab
    idx=$1
    OFS="\t"
    if ( idx in vl ) {
        final[vl[idx]] = final[vl[idx]] OFS idx
    }
}
END {
    for ( z in final )
        print z FS final[z]
}

Save it as program.awk and run:
Code:
awk -f program.awk bigdb.tab file1.tab file2.tab

Also, you have a typo in bigdb.tab: I believe K-ent should be K_ent.

Hope it works for you.

Regards
Peasant.
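
For readers new to the NR==FNR test used in the program above, here is a small sketch of how it separates the first input file from the later ones. NR counts records across all files, while FNR restarts at 1 for each file, so the two are equal only while awk is reading the first file. The file names first.txt, second.txt and third.txt are invented for this illustration.

```shell
#!/bin/sh
# Hypothetical demo files; they are not part of the thread's data.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/first.txt"
printf 'c\n'    > "$workdir/second.txt"
printf 'd\n'    > "$workdir/third.txt"

# NR == FNR holds only for records of the first file named on the command line.
labels=$(awk '
NR == FNR { print "first:" $0; next }
          { print "later:" $0 }
' "$workdir/first.txt" "$workdir/second.txt" "$workdir/third.txt")

printf '%s\n' "$labels"
rm -rf "$workdir"
```

One caveat with this idiom: if the first file is empty, NR==FNR is also true for the second file, so a test on FILENAME is safer when that can happen.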
#7
Thanks guys for your help!

@ balajesuri: It doesn't work properly. It returns only the entire first matching line, and adds a 4th field with the match from file2.tab only.

output.tab:
Code:
db1 0001 A_ent,B_ent,C_ent,D_ent C_ent

---------- Post updated at 07:45 PM ---------- Previous update was at 06:46 PM ----------

@ Peasant: It almost works. It has to return lines with matches from both files or nothing (if it finds names from only 1 of the 2 files, it should return nothing).

A real example.

bigdb.tab:
Code:
db1 12665591 LFTY2_ent,SNF5_ent,SMRC1_ent,ACL6A_ent,SMRD1_ent,SMRC2_ent,ARI1A_ent,ARI1B_ent,SMRD2_ent,ENL_ent
db2,db3,db1,db4,db5,db6 7682714,16094384,15570572,16713569,15144186,10066823,9183008 LYN_ent,HCLS1_ent
db7 68465376,76987269,3877 AKT1_ent,AKT2_ent,AKT3_ent,ARAF_ent,ARNT2_ent,ARNT_ent,BRAF_ent,CBP_ent,CDC42_ent,CRKL_ent,CRK_ent,CUL2_ent,EGLN1_ent

file1.tab:
Code:
ACL6A_ent
BRAF_ent
YYYP_ent
UYTR_ent

file2.tab:
Code:
SMRD2_ent
HCLS1_ent
ETS1_ent
CUL2_ent

Wanted output.tab:
Code:
db1 12665591 ACL6A_ent SMRD2_ent
db7 68465376,76987269,3877 BRAF_ent CUL2_ent
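
One possible way to get the wanted output is sketched below. This is not a solution confirmed anywhere in the thread, only an illustration under stated assumptions: the fields of bigdb.tab are tab-separated, file1.tab and file2.tab hold one name per line, and when several names from the same file match a line they are joined with commas in a single field (the thread does not say how that case should look). The array names from1 and from2 and the temporary directory are made up for the example.

```shell
#!/bin/sh
# Sketch only, not the thread's accepted answer. Sample data is taken from the post above.
workdir=$(mktemp -d)

# printf reuses the format for each group of three arguments, giving one line per record.
printf '%s\t%s\t%s\n' \
    'db1' '12665591' 'LFTY2_ent,SNF5_ent,SMRC1_ent,ACL6A_ent,SMRD1_ent,SMRC2_ent,ARI1A_ent,ARI1B_ent,SMRD2_ent,ENL_ent' \
    'db2,db3,db1,db4,db5,db6' '7682714,16094384,15570572,16713569,15144186,10066823,9183008' 'LYN_ent,HCLS1_ent' \
    'db7' '68465376,76987269,3877' 'AKT1_ent,AKT2_ent,AKT3_ent,ARAF_ent,ARNT2_ent,ARNT_ent,BRAF_ent,CBP_ent,CDC42_ent,CRKL_ent,CRK_ent,CUL2_ent,EGLN1_ent' \
    > "$workdir/bigdb.tab"
printf '%s\n' ACL6A_ent BRAF_ent YYYP_ent UYTR_ent > "$workdir/file1.tab"
printf '%s\n' SMRD2_ent HCLS1_ent ETS1_ent CUL2_ent > "$workdir/file2.tab"

result=$(awk '
BEGIN { FS = OFS = "\t" }
# Load both name lists first; testing FILENAME also works if a list is empty.
FILENAME == ARGV[1] { from1[$1] = 1; next }
FILENAME == ARGV[2] { from2[$1] = 1; next }
# Remaining input is bigdb.tab: check every name in field 3 against both lists.
{
    n = split($3, names, ",")
    m1 = m2 = ""
    for (i = 1; i <= n; i++) {
        if (names[i] in from1) m1 = (m1 == "" ? "" : m1 ",") names[i]
        if (names[i] in from2) m2 = (m2 == "" ? "" : m2 ",") names[i]
    }
    # Print only when the line has at least one match from each list.
    if (m1 != "" && m2 != "") print $1, $2, m1, m2
}' "$workdir/file1.tab" "$workdir/file2.tab" "$workdir/bigdb.tab")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Note that the name lists are passed first and bigdb.tab last, the reverse of the earlier awk command. That way the two small lists are held in memory and the 100,000-line file is read once, line by line. On the sample data this prints the db1 and db7 lines from the wanted output and skips the second record, which only has a match from file2.tab.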