Compare two sample files and find common


 
# 8  
Old 11-05-2012
Quote:
Originally Posted by manigrover
Thanks a lot. Can we first convert the second file from a space-separated file into a column-separated file?
Spacing seems to be the only problem in the second file; I tried several options, but because of the spacing nothing works.
My script above also works if there are spaces in the first file,
so there is no need to find a workaround for this.
I assume this is what you want.
# 9  
Old 11-05-2012
Hi Pamu

Actually, on my system the output is still the same.

My output is exactly the same as my second file.

Code:
FHIT Adenosine Monotungstate Not Available,T2D Ado-P-Ch2-P-Ps-Ado Not Available,
CHRM1 Trospium Sanctura T2D Oxyphenonium Antrenyl T2D
PDE3B 5r-6-4-2-3-Iodobenzyl-3-Oxocyclohex-1-En-1-YlAminoPhenyl-5-Methyl-4,5-Dihydropyridazin-32h-One Not Available,T1D Hg9a-9, Nonanoyl-N-Hydroxyethylglucamide Not Available,
HSP90AA19-Butyl-8-2,5-Dimethoxy-Benzyl-9h-Purin-6-Ylamine Not Available,T2D 8-2-Chloro-3,4,5-Trimethoxy-Benzyl-2-Fluoro-9-Pent-4-Ylnyl-9h-Purin-6-Ylamine Not Available,T2D
ESR1 Chlorotrianisene Anisene,BD Conjugated Estrogens Conestoral,BD
INS M-Cresol Not Available,
FAH Acetoacetic Acid Not Available,BD 4-Hydroxy-Methyl-Phosphinoyl-3-Oxo-Butanoic Acid Not Available,
LPL Tyloxapol Alevaire,
ADAM17 3S-1-4-BUT-2-YN-1-YLOXYPHENYLSULFONYLPYRROLIDINE-3-THIOL Not Available T2D 3-4-but-2-yn-1-yloxyphenylsulfonylpropane-1-thiol Not Available T2D
GUCY1A2 Nitric Oxide INOmax,RA Isosorbide Mononitrate Conpin,
B4GALT1 6-Aminohexyl-Uridine-C1,5'-Diphosphate Not Available,
LCK 4-2-Acetylamino-2-3-Carbamoyl-2-Cyclohexylmethoxy-6,7,8,9-Tetrahydro-5h-Benzocyclohepten-5ylcarbamoyl-Ethyl-2-Phosphono-Phenyl-Phosphonic Acid Not Available,T1D 4-2-Acetylamino-2-1-3-Carbamoyl-4-Cyclohexylmethoxy-Phenyl-Ethylcarbamoyl-Ethyl-2-Phosphono-Phenoxy-Acetic Acid Not Available,T1D
GMDS Guanosine-5'-Diphosphate-Rhamnose Not Available,
LCT D-Gluconhydroximo-1,5-Lactam Not Available T2D Gluconolactone Not Available T2D
CALM1 3''-Beta-Chloroethyl-2'',4''-Dioxo-3, 5''-Spiro-Oxazolidino-4-Deacetoxy-Vinblastine Not Available T2D Prenylamine Bismethin,
RET 4-BROMO-2-FLUORO-N-4E-6-METHOXY-7-1-METHYLPIPERIDIN-4-YLMETHOXYQUINAZOLIN-41H-YLIDENEANILINE Not Available,
CYP1A2 2-PHENYL-4H-BENZOHCHROMEN-4-ONE Not Available,
PPARA Clofibrate Amotril,CD Gemfibrozil Bolutol,
TGFBR1 4-3-Pyridin-2-Yl-1h-Pyrazol-4-YlQuinoline Not Available,T2D Naphthyridine Inhibitor Not Available,T2D
PPARD 11E-OCTADEC-11-ENOIC ACID Not Available T2D 2S-2-3-2-fluoro-4-trifluoromethylphenylcarbonylaminomethyl-4-methoxybenzylbutanoic acid Not Available T2D
CSNK1G3 2Z-4-AMINO-2-4-METHOXYPHENYLIMINO-2,3-DIHYDRO-1,3-THIAZOL-5-YL4-METHOXYPHENYLMETHANONE Not Available T2D 4-AMINO-2-3-CHLOROANILINO-1,3-THIAZOL-5-YL4-FLUOROPHENYLMETHANONE Not Available,

Today is surely not a good day for me!
# 10  
Old 11-05-2012
I checked, and it seems there are irregular spaces between the columns in the second file. That seems to be the main issue.
# 11  
Old 11-05-2012
Quote:
Originally Posted by manigrover
Actually, on my system the output is still the same.
Today is surely not a good day for me!
Using your sample files:

Code:
$ awk 'NR==FNR{X[$0]=$0;next}{s=$1;$1="";for(i in X){if($0 ~ i){gsub(i,i" (matched)",$0)}};$0=s""$0}1' file1 file2
FHIT Adenosine (matched) Monotungstate Not Available,T2D Ado-P-Ch2-P-Ps-Ado Not Available,
CHRM1 Trospium (matched) Sanctura T2D Oxyphenonium (matched) Antrenyl T2D
PDE3B 5r-6-4-2-3-Iodobenzyl-3-Oxocyclohex-1-En-1-YlAminoPhenyl-5-Methyl-4,5-Dihydropyridazin-32h-One Not Available,T1D Hg9a-9, Nonanoyl-N-Hydroxyethylglucamide Not Available,
HSP90AA19-Butyl-8-2,5-Dimethoxy-Benzyl-9h-Purin-6-Ylamine Not Available,T2D 8-2-Chloro-3,4,5-Trimethoxy-Benzyl-2-Fluoro-9-Pent-4-Ylnyl-9h-Purin-6-Ylamine Not Available,T2D
ESR1 Chlorotrianisene (matched) Anisene,BD Conjugated Estrogens (matched) Conestoral,BD
INS M-Cresol Not Available,
FAH Acetoacetic Acid Not Available,BD 4-Hydroxy-Methyl-Phosphinoyl-3-Oxo-Butanoic Acid Not Available,
LPL Tyloxapol (matched) Alevaire,
ADAM17 3S-1-4-BUT-2-YN-1-YLOXYPHENYLSULFONYLPYRROLIDINE-3-THIOL Not Available T2D 3-4-but-2-yn-1-yloxyphenylsulfonylpropane-1-thiol Not Available T2D
GUCY1A2 Nitric Oxide (matched) INOmax,RA Isosorbide Mononitrate (matched) Conpin,
B4GALT1 6-Aminohexyl-Uridine-C1,5'-Diphosphate Not Available,
LCK 4-2-Acetylamino-2-3-Carbamoyl-2-Cyclohexylmethoxy-6,7,8,9-Tetrahydro-5h-Benzocyclohepten-5ylcarbamoyl-Ethyl-2-Phosphono-Phenyl-Phosphonic Acid Not Available,T1D 4-2-Acetylamino-2-1-3-Carbamoyl-4-Cyclohexylmethoxy-Phenyl-Ethylcarbamoyl-Ethyl-2-Phosphono-Phenoxy-Acetic Acid Not Available,T1D
GMDS Guanosine-5'-Diphosphate-Rhamnose Not Available,
LCT D-Gluconhydroximo-1,5-Lactam Not Available T2D Gluconolactone Not Available T2D
CALM1 3''-Beta-Chloroethyl-2'',4''-Dioxo-3, 5''-Spiro-Oxazolidino-4-Deacetoxy-Vinblastine (matched) Not Available T2D Prenylamine Bismethin,
RET 4-BROMO-2-FLUORO-N-4E-6-METHOXY-7-1-METHYLPIPERIDIN-4-YLMETHOXYQUINAZOLIN-41H-YLIDENEANILINE Not Available,
CYP1A2 2-PHENYL-4H-BENZOHCHROMEN-4-ONE Not Available,
PPARA Clofibrate (matched) Amotril,CD Gemfibrozil (matched) Bolutol,
TGFBR1 4-3-Pyridin-2-Yl-1h-Pyrazol-4-YlQuinoline Not Available,T2D Naphthyridine Inhibitor Not Available,T2D
PPARD 11E-OCTADEC-11-ENOIC ACID Not Available T2D 2S-2-3-2-fluoro-4-trifluoromethylphenylcarbonylaminomethyl-4-methoxybenzylbutanoic acid Not Available T2D
CSNK1G3 2Z-4-AMINO-2-4-METHOXYPHENYLIMINO-2,3-DIHYDRO-1,3-THIAZOL-5-YL4-METHOXYPHENYLMETHANONE Not Available T2D 4-AMINO-2-3-CHLOROANILINO-1,3-THIAZOL-5-YL4-FLUOROPHENYLMETHANONE Not Available,
NR3C1 Flunisolide (matched) Aerobid T2D Diflorasone (matched) Florone T2D
CTSD 1h-Benoximidazole-2-Carboxylic Acid Not Available T2D N-Aminoethylmorpholine Not Available T2D
TLL2 Carbobenzoxy-Pro-Lys-Phe-YPo2-Ala-Pro-Ome Not Available,
TYR Monobenzone (matched) AgeRite Alba,
HSD11B1 3,3-dimethylpiperidin-1-yl6-3-fluoro-4-methylphenylpyridin-2-ylmethanone Not Available,RA 5S-2-1S-1-4-fluorophenylethylamino-5-1-hydroxy-1-methylethyl-5-methyl-1,3-thiazol-45H-one Not Available,RA
C5 Eculizumab (matched) Soliris,
FGF1 Sucrose Octasulfate Not Available T2D Naphthalene Trisulfonate Not Available T2D
SORD Cp-166572, 2-Hydroxymethyl-4-4-N,N-Dimethylaminosulfonyl-1-Piperazino-Pyrimidine Not Available,
EGFR Gefitinib (matched) Iressa,T2D Panitumumab (matched) Vectibix,T2D
EPHB4 N-5-chloro-1,3-benzodioxol-4-yl-6-methoxy-7-3-piperidin-1-ylpropoxyquinazolin-4-amine Not Available T2D N'-5-CHLORO-1,3-BENZODIOXOL-4-YL-N-3,4,5- TRIMETHOXYPHENYLPYRIMIDINE-2,4-DIAMINE Not Available T2D
TPR N-1s-4-Bis2-ChloroethylAmino-1-Methylbutyl-N-6-Chloro-2-Methoxy-9-AcridinylAmine Not Available T2D Trypanothione Not Available,
CCL5 Heparin (matched) Disaccharide I-S Not Available,T1D Heparin (matched) Disaccharide Iii-S Not Available,

Here I can see many matched patterns.

I don't know what is going wrong on your machine.

Quote:
Originally Posted by Priyanka Chopra
I checked, and it seems there are irregular spaces between the columns in the second file. That seems to be the main issue.
@Priyanka,
Spaces in the second file don't matter much when substituting a single word such as Adenosine.
But it may fail when the pattern itself contains spaces, e.g. matching "good man" from the first file against "good man" in the second file when the spacing between the words differs.
In that case it will fail.
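
A minimal sketch of one way around the spacing problem, assuming it is acceptable to collapse every run of spaces or tabs to a single space in both files before matching. The function name normalize_spaces is hypothetical, and stripping carriage returns is an extra assumption in case the files were saved with Windows line endings.

```shell
# normalize_spaces FILE
# Hypothetical helper: prints FILE with every run of spaces, tabs and
# carriage returns collapsed to one space, and with leading/trailing
# spaces removed from each line.
normalize_spaces() {
    awk '{
        gsub(/[ \t\r]+/, " ")   # collapse runs of blanks (and stray CRs)
        sub(/^ /, "")           # trim one leading space, if any
        sub(/ $/, "")           # trim one trailing space, if any
        print
    }' "$1"
}
```

It could be used as normalize_spaces file1 > file1.norm and normalize_spaces file2 > file2.norm, and the matching script then run on the two .norm files.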

Last edited by pamu; 11-05-2012 at 08:18 AM..
# 12  
Old 11-05-2012
Hi Pamu

Thanks for the reply. I tried on different machines at my place, and it is still not working.

But I found that Adenosine Monotungstate is not present in the first file, yet it is still shown as matched because Adenosine is present. That is wrong in my case, as I have to match the whole name from the first file against the whole name in the second file's columns.

Let me know if you come across a solution.
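
One possible direction, sketched under a strong assumption: the second file has first been converted so that its columns are separated by tabs (the sample shown in this thread is not in that form). With real column boundaries, each column can be compared as a whole against the names in the first file, so Adenosine alone no longer matches a column containing Adenosine Monotungstate. The function name mark_exact is hypothetical, and the comparison is case-sensitive.

```shell
# mark_exact NAMES_FILE DATA_FILE
# NAMES_FILE: one name per line.
# DATA_FILE:  assumed tab-separated, gene symbol in column 1.
# Appends " (matched)" only to columns that equal a name exactly.
mark_exact() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { names[$0]; next }       # load names from the first file
        {
            for (i = 2; i <= NF; i++)       # skip column 1 (gene symbol)
                if ($i in names)            # whole-column, exact comparison
                    $i = $i " (matched)"
            print
        }' "$1" "$2"
}
```

It would be called as mark_exact file1 file2.tsv, where file2.tsv is a hypothetical tab-separated version of the second file.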