Find common entries


 
# 1  
Old 10-31-2012

Hi all

I have to compare two files and find the common entries.

The first file looks like this:
Code:
XVY
CVY
ZYN
MNA

In the second file I have to search for these entries in the even-numbered columns:

Code:
5   XVY  7  hdfj 8 CVY 9

If there are common entries, the output should be:


Code:
5   XVY(approved)  7  hdfj 8 CVY (approved) 9

Kindly let me know how to script this.
# 2  
Old 10-31-2012
With the number of posts you have made, I would have expected that you would be able to handle something this easy by now. But, the following should do what you want:
Code:
awk 'FNR==NR{c[$1];next} {for(i=2;i<=NF;i+=2)if($i in c)$i=$i" (approved)";print}' f1 f2
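
For anyone reading along, here is the same one-liner spread out with comments and run against the sample data from post #1. Treat it as a sketch only: the scratch directory and the names `f1`, `f2`, and `result` are assumptions made for the demonstration. Note that assigning to a field makes awk rebuild the line, so runs of blanks in the input collapse to single spaces in the output.

```shell
#!/bin/sh
# Recreate the two sample files from post #1 in a scratch directory (assumed names).
tmp=$(mktemp -d)
printf '%s\n' XVY CVY ZYN MNA > "$tmp/f1"
printf '5   XVY  7  hdfj 8 CVY 9\n' > "$tmp/f2"

result=$(awk '
    # First file: FNR equals NR only while reading it, so store each entry as a key.
    FNR == NR { c[$1]; next }
    # Second file: visit fields 2, 4, 6, ... and tag the ones found in c.
    {
        for (i = 2; i <= NF; i += 2)
            if ($i in c)
                $i = $i " (approved)"
        print    # a modified line is rebuilt with single spaces between fields
    }
' "$tmp/f1" "$tmp/f2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample input this prints `5 XVY (approved) 7 hdfj 8 CVY (approved) 9`.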

# 3  
Old 11-02-2012
Hi


Thanks a lot. I checked this, but for some reason it is not working on another dataset of mine.

In that dataset the first file looks like this:

Code:
Lepirudin 
Cetuximab 
 Tyloxapol 
Trospium 
Dornase Alfa 
Denileukin diftitox 
Etanercept 
Bivalirudin 
Leuprolide 
 M-Cresol 
Adenosine Monotungstate 
Peginterferon alfa-2a 
Alteplase 
Sermorelin 
Interferon alfa-n1 
Darbepoetin alfa 
Urokinase 
Goserelin 
Reteplase 
Epoetin alfa 
Salmon Calcitonin 
Interferon alfa-n3

And the second file:

Code:
FHIT     Adenosine Monotungstate    Not Available,T2D    Ado-P-Ch2-P-Ps-Ado    Not Available,                                                                                                                                                                                                                                                                                                 
CHRM1     Trospium    Sanctura T2D    Oxyphenonium    Antrenyl T2D                                                                                                                                                                                                                                                                                                 
PDE3B     5r-6-4-2-3-Iodobenzyl-3-Oxocyclohex-1-En-1-YlAminoPhenyl-5-Methyl-4,5-Dihydropyridazin-32h-One    Not Available,T1D    Hg9a-9, Nonanoyl-N-Hydroxyethylglucamide    Not Available,                                                                                                                                                                                                                                                                                                 
HSP90AA1     9-Butyl-8-2,5-Dimethoxy-Benzyl-9h-Purin-6-Ylamine    Not Available,T2D    8-2-Chloro-3,4,5-Trimethoxy-Benzyl-2-Fluoro-9-Pent-4-Ylnyl-9h-Purin-6-Ylamine    Not Available,T2D                                                                                                                                                                                                                                                                                                 
ESR1     Chlorotrianisene    Anisene,BD    Conjugated Estrogens    Conestoral,BD                                                                                                                                                                                                                                                                                                 
INS     M-Cresol    Not Available,                                                                                                                                                                                                                                                                                                         
FAH     Acetoacetic Acid    Not Available,BD    4-Hydroxy-Methyl-Phosphinoyl-3-Oxo-Butanoic Acid    Not Available,                                                                                                                                                                                                                                                                                                 
LPL     Tyloxapol    Alevaire,                                                                                                                                                                                                                                                                                                         
ADAM17     3S-1-4-BUT-2-YN-1-YLOXYPHENYLSULFONYLPYRROLIDINE-3-THIOL    Not Available T2D    3-4-but-2-yn-1-yloxyphenylsulfonylpropane-1-thiol    Not Available T2D

The expected output should have "(approved)" added wherever an entry from the first file matches in an even-numbered column. I think the tab separation is causing a problem:

Code:
FHIT     Adenosine Monotungstate (approved)    Not Available,T2D     Ado-P-Ch2-P-Ps-Ado    Not Available,                                                                                                                                                                                                                                                                                                     
CHRM1     Trospium (approved)    Sanctura T2D    Oxyphenonium    Antrenyl T2D                                                                                                                                                                                                                                                                                                     
PDE3B      5r-6-4-2-3-Iodobenzyl-3-Oxocyclohex-1-En-1-YlAminoPhenyl-5-Methyl-4,5-Dihydropyridazin-32h-One     Not Available,T1D    Hg9a-9, Nonanoyl-N-Hydroxyethylglucamide    Not  Available,                                                                                                                                                                                                                                                                                                     
HSP90AA1     9-Butyl-8-2,5-Dimethoxy-Benzyl-9h-Purin-6-Ylamine    Not  Available,T2D     8-2-Chloro-3,4,5-Trimethoxy-Benzyl-2-Fluoro-9-Pent-4-Ylnyl-9h-Purin-6-Ylamine     Not Available,T2D                                                                                                                                                                                                                                                                                                     
ESR1     Chlorotrianisene    Anisene,BD    Conjugated Estrogens     Conestoral,BD                                                                                                                                                                                                                                                                                                     
INS     M-Cresol (approved)   Not Available,                                                                                                                                                                                                                                                                                                             
FAH     Acetoacetic Acid    Not Available,BD     4-Hydroxy-Methyl-Phosphinoyl-3-Oxo-Butanoic Acid    Not Available,                                                                                                                                                                                                                                                                                                     
LPL     Tyloxapol (approved)    Alevaire

But the output I received looks like this. I feel there is a problem with the tab separation, because only some entries show "approved", and it is placed just after the first word rather than after the whole entry:

Code:
FHIT Adenosine (approved) Monotungstate Not Available,T2D Ado-P-Ch2-P-Ps-Ado Not Available,  
CHRM1     Trospium    Sanctura T2D    Oxyphenonium    Antrenyl T2D                                                                                                                                                                                                                                                                                                 
PDE3B     5r-6-4-2-3-Iodobenzyl-3-Oxocyclohex-1-En-1-YlAminoPhenyl-5-Methyl-4,5-Dihydropyridazin-32h-One    Not Available,T1D    Hg9a-9, Nonanoyl-N-Hydroxyethylglucamide    Not Available,                                                                                                                                                                                                                                                                                                 
HSP90AA1     9-Butyl-8-2,5-Dimethoxy-Benzyl-9h-Purin-6-Ylamine    Not Available,T2D    8-2-Chloro-3,4,5-Trimethoxy-Benzyl-2-Fluoro-9-Pent-4-Ylnyl-9h-Purin-6-Ylamine    Not Available,T2D                                                                                                                                                                                                                                                                                                 
ESR1 Chlorotrianisene Anisene,BD Conjugated (approved) Estrogens Conestoral,BD  
INS     M-Cresol    Not Available,                                                                                                                                                                                                                                                                                                         
FAH     Acetoacetic Acid    Not Available,BD    4-Hydroxy-Methyl-Phosphinoyl-3-Oxo-Butanoic Acid    Not Available,                                                                                                                                                                                                                                                                                                 
LPL     Tyloxapol    Alevaire,                                                                                                                                                                                                                                                                                                         
ADAM17     3S-1-4-BUT-2-YN-1-YLOXYPHENYLSULFONYLPYRROLIDINE-3-THIOL    Not Available T2D    3-4-but-2-yn-1-yloxyphenylsulfonylpropane-1-thiol    Not Available

# 4  
Old 11-02-2012
If you mean that your fields are strictly tab separated with no trailing spaces in the first file, the following snippet should put you on track:

Code:
awk 'FNR==NR{a[$1]=1; next}a[$2]{$2=$2" (approved)"}{print}' FS="\t" file1 file2
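
If the first file has stray blanks around the names (the listing in post #3 appears to), then with FS set to a tab the key read from it still carries those blanks and never equals a clean field from the second file. Below is a hedged sketch that trims both sides before comparing and loops over the even-numbered columns. It assumes the second file really is tab-separated; the sample rows, the `trim` helper, and the variable names are made up for illustration.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# file1: names with stray leading/trailing blanks (illustrative subset).
printf '%s\n' 'Trospium ' ' Tyloxapol ' 'Adenosine Monotungstate ' > "$tmp/file1"
# file2: assumed tab-separated; one field carries a padding blank on purpose.
printf 'FHIT\t Adenosine Monotungstate\tNot Available,T2D\n' > "$tmp/file2"
printf 'CHRM1\tTrospium\tSanctura T2D\tOxyphenonium\tAntrenyl T2D\n' >> "$tmp/file2"
printf 'ESR1\tChlorotrianisene\tAnisene,BD\tConjugated Estrogens\tConestoral,BD\n' >> "$tmp/file2"

result=$(awk '
    BEGIN { FS = OFS = "\t" }    # split and rejoin on tabs so multi-word names stay whole
    # Strip leading and trailing blanks and tabs from a string.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # First file: the whole trimmed line is the key.
    FNR == NR { key = trim($0); if (key != "") c[key]; next }
    # Second file: compare the trimmed even-numbered fields against the keys.
    {
        for (i = 2; i <= NF; i += 2) {
            f = trim($i)
            if (f in c) $i = f " (approved)"
        }
        print
    }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Setting OFS to a tab as well keeps the output tab-separated after a field is changed. To stamp any column rather than only the even ones, the loop could start at 1 and step by 1.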

# 5  
Old 11-02-2012
Hi

I checked this, but this time the output is identical to the input; nothing changes. Kindly guide me.
Code:
FHIT     Adenosine Monotungstate    Not Available,T2D    Ado-P-Ch2-P-Ps-Ado    Not Available,                                                                                                                                                                                                                                                                                                 
CHRM1     Trospium    Sanctura T2D    Oxyphenonium    Antrenyl T2D                                                                                                                                                                                                                                                                                                 
PDE3B     5r-6-4-2-3-Iodobenzyl-3-Oxocyclohex-1-En-1-YlAminoPhenyl-5-Methyl-4,5-Dihydropyridazin-32h-One    Not Available,T1D    Hg9a-9, Nonanoyl-N-Hydroxyethylglucamide    Not Available,                                                                                                                                                                                                                                                                                                 
HSP90AA1     9-Butyl-8-2,5-Dimethoxy-Benzyl-9h-Purin-6-Ylamine    Not Available,T2D    8-2-Chloro-3,4,5-Trimethoxy-Benzyl-2-Fluoro-9-Pent-4-Ylnyl-9h-Purin-6-Ylamine    Not Available,T2D                                                                                                                                                                                                                                                                                                 
ESR1     Chlorotrianisene    Anisene,BD    Conjugated Estrogens    Conestoral,BD                                                                                                                                                                                                                                                                                                 
INS     M-Cresol    Not Available,                                                                                                                                                                                                                                                                                                         
FAH     Acetoacetic Acid    Not Available,BD    4-Hydroxy-Methyl-Phosphinoyl-3-Oxo-Butanoic Acid    Not Available,                                                                                                                                                                                                                                                                                                 
LPL     Tyloxapol    Alevaire,                                                                                                                                                                                                                                                                                                         
ADAM17     3S-1-4-BUT-2-YN-1-YLOXYPHENYLSULFONYLPYRROLIDINE-3-THIOL    Not Available T2D    3-4-but-2-yn-1-yloxyphenylsulfonylpropane-1-thiol    Not Available T2D                                                                                                                                                                                                                                                                                                 
GUCY1A2     Nitric Oxide    INOmax,RA    Isosorbide Mononitrate    Conpin,                                                                                                                                                                                                                                                                                                 
B4GALT1     6-Aminohexyl-Uridine-C1,5'-Diphosphate    Not Available,                                                                                                                                                                                                                                                                                                         
LCK     4-2-Acetylamino-2-3-Carbamoyl-2-Cyclohexylmethoxy-6,7,8,9-Tetrahydro-5h-Benzocyclohepten-5ylcarbamoyl-Ethyl-2-Phosphono-Phenyl-Phosphonic Acid    Not Available,T1D    4-2-Acetylamino-2-1-3-Carbamoyl-4-Cyclohexylmethoxy-Phenyl-Ethylcarbamoyl-Ethyl-2-Phosphono-Phenoxy-Acetic Acid    Not Available,T1D                                                                                                                                                                                                                                                                                                 
GMDS     Guanosine-5'-Diphosphate-Rhamnose    Not Available,

# 6  
Old 11-02-2012
Can you attach the sample files? Otherwise we will enter an endless loop.

I also realized you want the "approved" stamps on *any* column matching the first file, in which case take Don Cragun's solution.
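
Until the files are attached, one way to check what the separators really are is to make the invisible characters visible. A small sketch, using made-up stand-ins for the real files (the file contents and variable names here are assumptions):

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Stand-in data: a name with a trailing blank, and a tab-separated row.
printf 'Trospium \n' > "$tmp/file1"
printf 'CHRM1\tTrospium\tSanctura T2D\n' > "$tmp/file2"

# sed -n l prints lines unambiguously: a tab shows as \t and each line ends with $.
visible=$(sed -n l "$tmp/file1" "$tmp/file2")
printf '%s\n' "$visible"

# Count the tab-separated fields on each line of file2.
fields=$(awk -F '\t' '{ print NF }' "$tmp/file2")
printf '%s\n' "$fields"
rm -rf "$tmp"
```

A blank just before the `$` in the first listing means the name has trailing whitespace, and a field count of 1 for the second file would mean it contains no tabs at all.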
# 7  
Old 11-02-2012
Hi

Please find attached the sample first file and second file.

I have to search for the words from the first file in the even-numbered columns, or in any column (it doesn't matter), but matches are only likely in the even-numbered columns of the second file.

Mani grover