The UNIX and Linux Forums: UNIX for Dummies Questions & Answers

    #1  
02-27-2012
Registered User
 
Join Date: Dec 2011
Posts: 84
Thanks: 29
Thanked 0 Times in 0 Posts
Extract patterns from matching line and print them in separate fields

%%%%%

Last edited by lucasvs; 05-01-2012 at 05:21 AM.
    #2  
02-27-2012
Registered User
 
Join Date: Apr 2009
Location: /usr/bin/vim
Posts: 946
Thanks: 13
Thanked 37 Times in 35 Posts

Code:
 -o, --only-matching
              Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
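For reference, here is roughly what the `-o` suggestion looks like in practice. This is a minimal sketch, assuming a grep that supports `-o`, `-w`, `-F` and `-f` (GNU and BSD grep both do); `only_matches` is a hypothetical helper name, not part of any tool. Each matched name comes out on its own line, so the db and ID columns, and which bigdb.tab line a name came from, are lost. That is the limitation raised in the next reply.

```shell
#!/bin/bash
# Hypothetical helper: only_matches NAMES_FILE DATA_FILE
# Prints every name from NAMES_FILE found in DATA_FILE, one per line,
# in the order the matches appear in DATA_FILE.
only_matches() {
    # -o: print only the matched part of the line
    # -w: match whole words only (underscore counts as a word character)
    # -F: treat names as fixed strings, not regular expressions
    # -f: read the list of names from a file
    grep -o -w -F -f "$1" "$2"
}

# Usage: only_matches file1.tab bigdb.tab
```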

    #3  
02-27-2012
Registered User
 
Join Date: Dec 2011
Posts: 84
Thanks: 29
Thanked 0 Times in 0 Posts
Quote:
with each such part on a separate output line.
That cannot work in my case:
- I can have a variable number of matching patterns per line in bigdb.tab.
- Also, file1.tab and file2.tab contain hundreds of names.
I would not know which name from file1.tab goes with which name from file2.tab.

That's why I need the matching names in separate fields.
    #4  
02-27-2012
balajesuri
#! /bin/bash
 
Join Date: Apr 2009
Location: India
Posts: 1,559
Thanks: 14
Thanked 436 Times in 421 Posts
This goes through a lot of nested loops, so it might take a long time if bigdb.tab is really big.


Code:
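#! /bin/bash
# For each line of bigdb.tab, find the first name from file1.tab that
# appears in it as a whole word, then the first name from file2.tab.
# Only lines containing a name from both files are written to output.tab.

while IFS= read -r x
do
    while IFS= read -r f1
    do
        if echo "$x" | grep -q -w -- "$f1"
        then
            # Capture fields 1-2 (everything before the last tab preceding
            # $f1) and append $f1. Relies on sed reading \t as a tab
            # (GNU sed does; this is not guaranteed by POSIX).
            y=$(echo "$x" | sed "s/\(.*\)\t.*$f1.*/\1\t$f1/")
            while IFS= read -r f2
            do
                if echo "$x" | grep -q -w -- "$f2"
                then
                    echo -e "$y\t$f2" >> output.tab
                    break
                fi
            done < file2.tab
            break
        fi
    done < file1.tab
done < bigdb.tab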

    #5  
02-27-2012
Registered User
 
Join Date: Dec 2011
Posts: 84
Thanks: 29
Thanked 0 Times in 0 Posts
Thanks, balajesuri!

I don't know what you mean by "big".
bigdb.tab contains about 100,000 lines.
I have about 10 different versions of file1.tab, and the same for file2.tab (up to 500 lines each).

I'm going to give it a try anyway!
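To put "big" in numbers, here is a rough upper bound on the grep and sed calls made by the script in post #4, assuming N = 100,000 lines in bigdb.tab and 500 names in each of file1.tab and file2.tab (|F1| and |F2| below). Per bigdb.tab line, the script tries at most every name in file1.tab, then runs one sed and tries at most every name in file2.tab:

```latex
\text{calls} \le N\,(|F_1| + 1 + |F_2|) = 100\,000 \times (500 + 1 + 500) = 100\,100\,000 \approx 10^{8}
```

Each of those calls also sits in an `echo | ...` pipeline, so the number of processes started is higher still. That is why a single awk pass over the files tends to be much faster for data of this size.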
    #6  
02-27-2012
Peasant
Registered User
 
Join Date: Mar 2011
Posts: 509
Thanks: 14
Thanked 104 Times in 102 Posts
See if this awk program fits your requirement:

Code:
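# First file (bigdb.tab): for every name in field 3, prepend
# "field1 field2 " to that name's entry in vl
NR==FNR {
    split($3, a, ",")
    for (i in a) vl[a[i]] = $1 FS $2 FS vl[a[i]]
}
# Other files (file1.tab, file2.tab): if a name was seen in bigdb.tab,
# append it (tab-separated) to the list kept for its "field1 field2" key
NR!=FNR {
    idx = $1
    OFS = "\t"
    if (idx in vl) {
        final[vl[idx]] = final[vl[idx]] OFS idx
    }
}
# Print each key followed by the names collected for it
END {
    for (z in final)
        print z FS final[z]
}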

Save it as program.awk and run :

Code:
awk -f program.awk bigdb.tab file1.tab file2.tab

Also, I believe you have a typo in bigdb.tab: K-ent should be K_ent.

Hope it works for you
Regards
Peasant.
    #7  
02-27-2012
Registered User
 
Join Date: Dec 2011
Posts: 84
Thanks: 29
Thanked 0 Times in 0 Posts
Thanks for your help, guys!

@ balajesuri:
It doesn't work properly: it returns the entire first matching line and only adds a 4th field with the match from file2.tab.

output.tab:

Code:
db1   0001   A_ent,B_ent,C_ent,D_ent   C_ent
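One possible explanation, offered only as a guess: the sed command in post #4 trims the line down to its first two fields only if sed reads `\t` as a tab (GNU sed does, other seds may not) and bigdb.tab really uses tabs. Otherwise the substitution silently does nothing, `y` keeps the whole line, and the output looks exactly like the line above. A minimal sketch of a bash-only alternative, assuming tab-separated input; `first_two_fields` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: first_two_fields LINE
# Prints the first two tab-separated fields of LINE, joined by a tab.
first_two_fields() {
    local db ids rest
    # Split on tabs with the read builtin, so sed (and its handling of
    # "\t") is not needed; -r keeps backslashes literal.
    IFS=$'\t' read -r db ids rest <<< "$1"
    printf '%s\t%s\n' "$db" "$ids"
}

first_two_fields $'db1\t0001\tA_ent,B_ent,C_ent,D_ent'   # prints db1<TAB>0001
```

With the function defined near the top of the script, the sed line could become `y="$(first_two_fields "$x")"$'\t'"$f1"`. It also saves one sed process per matching line.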

---------- Post updated at 07:45 PM ---------- Previous update was at 06:46 PM ----------

@ Peasant
It almost works.
It has to return lines with matches from both files or nothing: if it finds names from only one of the two files, it should return nothing for that line.

A real example.
bigdb.tab:

Code:
db1	12665591	LFTY2_ent,SNF5_ent,SMRC1_ent,ACL6A_ent,SMRD1_ent,SMRC2_ent,ARI1A_ent,ARI1B_ent,SMRD2_ent,ENL_ent
db2,db3,db1,db4,db5,db6	7682714,16094384,15570572,16713569,15144186,10066823,9183008	LYN_ent,HCLS1_ent
db7	68465376,76987269,3877	AKT1_ent,AKT2_ent,AKT3_ent,ARAF_ent,ARNT2_ent,ARNT_ent,BRAF_ent,CBP_ent,CDC42_ent,CRKL_ent,CRK_ent,CUL2_ent,EGLN1_ent

file1.tab:

Code:
ACL6A_ent
BRAF_ent
YYYP_ent
UYTR_ent

file2.tab:

Code:
SMRD2_ent
HCLS1_ent
ETS1_ent
CUL2_ent

Wanted output.tab:

Code:
db1	12665591	ACL6A_ent	SMRD2_ent
db7	68465376,76987269,3877	BRAF_ent	CUL2_ent
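Since Peasant's version almost works, here is a sketch of one way to add the "both files or nothing" rule in a single awk pass. It assumes bigdb.tab is tab-separated with comma-separated names in field 3, that file1.tab and file2.tab hold one name per line, and that when a line contains several names from the same file, the first one in field 3 is the one to keep (the thread doesn't say). `pair_matches` is a hypothetical wrapper name.

```shell
#!/bin/bash
# Hypothetical helper: pair_matches FILE1 FILE2 BIGDB
# For each BIGDB line, prints fields 1 and 2, then the first name in
# field 3 listed in FILE1, then the first listed in FILE2.
# Lines missing a name from either file are skipped.
pair_matches() {
    awk -F'\t' -v OFS='\t' '
        FILENAME == ARGV[1] { in1[$1]; next }   # remember names from FILE1
        FILENAME == ARGV[2] { in2[$1]; next }   # remember names from FILE2
        {
            n = split($3, names, ",")
            m1 = m2 = ""
            for (i = 1; i <= n; i++) {
                if (m1 == "" && (names[i] in in1)) m1 = names[i]
                if (m2 == "" && (names[i] in in2)) m2 = names[i]
            }
            # Print only when both files contributed a name
            if (m1 != "" && m2 != "") print $1, $2, m1, m2
        }
    ' "$1" "$2" "$3"
}

# Usage: pair_matches file1.tab file2.tab bigdb.tab > output.tab
```

The small name lists are loaded into arrays first and bigdb.tab is read once, so each bigdb.tab line costs a few array lookups instead of one grep per name. Testing `FILENAME` rather than `NR==FNR` keeps it working if file1.tab happens to be empty. On the sample files above it should give the two wanted lines; the db2 line is skipped because it only contains HCLS1_ent from file2.tab.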

All times are GMT -4.