Request to check: compare two files , match same entries, write data before it


 
# 1  
07-22-2012

Hi all,

I have 2 files. Column 1 of the first file has to be matched with column 3 of the second file.

The first file contains data in 2 columns: the first is the gene name, the second lists the disease codes associated with it (CAD, HT, RA, T2D, or any combination of them):
Code:
column 1    column2
ARFGEF2 CAD
DDEF2 CAD
PSCD3 CAD
PSCD4 CAD
CAMK1 CAD,HT,HT,HT,HT,HT,HT,HT,HT,HT,HT,HT,HT
HSP90AA1 CAD,CAD,CAD,T2D,T2D
KDR CAD,CD,CD
VEGF CAD,CAD,CAD,CAD,T2D,T2D,T2D
CTNNA3 CAD,HT,T2D
PTPRM CAD,T2D
RAC2 CAD,CAD,T1D,T1D
SMAD3 CAD,T2D,T2D,T2D,T2D,T2D,T2D,T2D
SORBS1 CAD,CAD,CAD
CD36 CAD
IRS1 CAD,CAD,CAD
IRS2 CAD,CAD,CAD,CAD
MTFMT CAD,CAD,CAD,T1D,T1D,T1D
SARS CAD
GNPDA2 CAD
NANS CAD
SRD5A1 CAD

The second file contains data in 3 columns: column 1 is the generic drug name, column 2 the brand name, and column 3 the gene names:

Code:
Lepirudin Refludan F2
Cetuximab Erbitux FCGR2A FCGR2B FCGR2C EGFR FCGR3B C1R C1QA C1QB C1QC FCGR3A C1S FCGR1A
Dornase Alfa Pulmozyme Not Available
Denileukin diftitox Ontak IL2RA IL2RB IL2RG
Etanercept Enbrel C1S C1R C1QA C1QB C1QC TNF TNFRSF1B FCGR1A FCGR3A FCGR2A FCGR2B FCGR2C LTA FCGR3B
Bivalirudin Angiomax F2
Leuprolide Eligard GNRHR
Peginterferon alfa-2a Pegasys IFNAR2 IFNAR1
Alteplase Activase (Genentech Inc) PLG FGA PLAUR SERPINE1
Sermorelin Geref GHRHR
Interferon alfa-n1 Wellferon (GlaxoSmithKline) IFNAR2 IFNAR1
Darbepoetin alfa Aranesp EPOR
Urokinase Abbokinase NID1 PLG PLAUR PLAU PLAT SERPINE1 SERPINB2 SERPINA5 LRP2 ST14
Goserelin Zoladex LHCGR GNRHR
Reteplase Retavase (Centocor) PLG FGA PLAUR SERPINE1
Epoetin alfa Epogen EPOR
Salmon Calcitonin Calcimar CALCR
Interferon alfa-n3 Alferon (Interferon Sciences Inc.) IFNAR1 IFNAR2
Pegfilgrastim Neulasta (Amgen Inc.) CSF3R ELANE
Sargramostim Immunex CSF2RA IL3RA CSF2RB SDC2 PRG2
Secretin SecreFlo SCTR
Peginterferon alfa-2b PEG-Intron    (Schering Corp) IFNAR1 IFNAR2
Asparaginase Elspar (Merck & Co. Inc) Not Available
Thyrotropin Alfa Thyrogen (Genzyme Inc) TSHR
Antihemophilic Factor Advate LRP1 MCFD2 F10 F9 VWF PHYH ASGR2 HSPA5 CALR CANX LMAN1
Anakinra Kineret (Amgen Inc) IL1R1


Column 1 of the first file has to be matched with column 3 of the second file, because both contain gene names. Whenever a gene appears in both, I have to write column 2 of the first file next to it, followed by columns 1 and 2 of the second file.

So the output will contain 4 columns (gene, disease codes, generic name, brand name), for example:
Code:
AGFRA     CAD,HT         Lepirudin     Refludan


Last edited by Scrutinizer; 07-22-2012 at 04:30 AM.. Reason: quote tags => code tags
# 2  
07-22-2012
The sample of the second file has more than 3 columns. How are the columns in the real file separated?
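A quick way to see how the columns of the real file are separated is to print a few lines with the invisible characters made visible. A minimal sketch, assuming a POSIX `sed`, whose `l` command writes a tab as `\t`, a carriage return as `\r`, and ends every line with `$`; the helper name `show_separators` is hypothetical.

```shell
# show_separators FILE [N]   (hypothetical helper name)
# Prints the first N lines (default 3) of FILE with invisible characters
# made visible: sed's "l" command writes a tab as \t, a carriage return
# as \r and ends each line with $, so tabs and spaces can be told apart.
# Very long lines may be folded, with a trailing backslash at each fold.
show_separators() {
    head -n "${2:-3}" "$1" | sed -n 'l'
}
```

For example, `show_separators drugs.txt 5` (file name hypothetical). If only plain spaces show up between the fields, the layout alone cannot tell which words belong to which column.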
# 3  
07-22-2012
request to check

Hi

The second file is a text file containing the data exactly as shown here. I'm not sure whether there is more than one column, but there are spaces between the different gene names, so it does look like more than one column. Because of a problem on my side I'm not able to check this right now. Mani
# 4  
07-22-2012
Not only between gene names. Take for example line 3 of the 2nd file. Here column2 appears to consist of two words. How do we know which word belongs to which column?
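This ambiguity can be checked line by line by counting fields. A minimal sketch; `check_columns` is a hypothetical helper. Without a separator argument awk splits on runs of blanks, so a name such as "Dornase Alfa" counts as two fields; with a dedicated separator such as `:` every line of a clean 3-column file should pass.

```shell
# check_columns FILE EXPECTED [SEP]   (hypothetical helper name)
# Prints "line: count fields: text" for every line of FILE whose number
# of fields differs from EXPECTED. Without SEP, awk uses its default
# splitting on runs of blanks; otherwise SEP is the field separator.
check_columns() {
    if [ -n "${3:-}" ]; then
        awk -F "$3" -v n="$2" 'NF != n { print FNR ": " NF " fields: " $0 }' "$1"
    else
        awk -v n="$2" 'NF != n { print FNR ": " NF " fields: " $0 }' "$1"
    fi
}
```

Run on the space-separated sample of the second file, `check_columns file 3` reports lines such as "Dornase Alfa Pulmozyme Not Available" (5 fields), while the same record written as `Dornase Alfa:Pulmozyme:Not Available` passes `check_columns file 3 :`.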
# 5  
07-22-2012
Quote:
Hi

I got the second file using this code:

Try this:
Code:
awk 'k>0 {if (a[k] && k==2) {print a[1]" "a[2]" "a[3]; a[1]=a[2]=a[3]="";} a[k]=a[k]?a[k]" "$0:$0; k=0;} /^# Drug_Target_.*_Gene_Name/ {k=3;} /^# Generic/ {k=1;} /^# Brand_Name/ {k=2;} END {if (a[1]) print a[1]" "a[2]" "a[3];}' drug_bank.dat

But I want the data in the second file to be arranged in exactly 3 columns, even when a value contains a space between two words, as in column 2 of row 3.

Can you get something from this code? Many thanks for that. Mani
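The one-liner above joins everything with single spaces, so a generic or brand name containing a space can no longer be told apart from the next column. A minimal sketch of a variant that keeps the same header patterns and the same flushing logic but writes `:` between the three columns and `,` between target genes. Like the original, it assumes each record starts with its `# Brand_Name...` header and that every header's value sits on the single line right after it; the function name `extract_drugs` is hypothetical.

```shell
# extract_drugs FILE   (hypothetical helper name)
# Prints one "generic:brand:GENE1,GENE2,..." line per record.
extract_drugs() {
    awk '
    # A value line: k says which column the previous header belongs to
    # (1 = generic name, 2 = brand name, 3 = target gene).
    k > 0 {
        # A second brand name means a new record has started:
        # print the finished record and clear it.
        if (k == 2 && a[2] != "") {
            print a[1] ":" a[2] ":" a[3]
            a[1] = a[2] = a[3] = ""
        }
        # Join several target genes with ",", other repeats with " ".
        sep = (k == 3) ? "," : " "
        a[k] = (a[k] != "") ? a[k] sep $0 : $0
        k = 0
    }
    /^# Drug_Target_.*_Gene_Name/ { k = 3 }
    /^# Generic/                  { k = 1 }
    /^# Brand_Name/               { k = 2 }
    # Print the last record, which no later brand name will flush.
    END { if (a[1] != "") print a[1] ":" a[2] ":" a[3] }
    ' "$1"
}
```

Usage: `extract_drugs drug_bank.dat > drugs.txt`. Because `:` never occurs inside the names in the sample data, the output can then be read with `awk -F:` without any guessing about where a column ends.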

Last edited by Scrutinizer; 07-22-2012 at 07:54 AM.. Reason: code tags
# 6  
07-23-2012
Request to check: new file with 3 separate columns

Hi all

Now I have the second file with exactly 3 columns, separated by ':' (the gene names in column 3 are separated by ','), like this:

Kindly check it. Please let me know a script for the question above.
Code:
Lepirudin:Refludan:F2
Cetuximab:Erbitux:FCGR2A,FCGR2B,FCGR2C,EGFR,FCGR3B,C1R,C1QA,C1QB,C1QC,FCGR3A,C1S,FCGR1A
Dornase Alfa:Pulmozyme:Not Available
Denileukin diftitox:Ontak:IL2RA,IL2RB,IL2RG
Etanercept:Enbrel:C1S,C1R,C1QA,C1QB,C1QC,TNF,TNFRSF1B,FCGR1A,FCGR3A,FCGR2A,FCGR2B,FCGR2C,LTA,FCGR3B
Bivalirudin:Angiomax:F2
Leuprolide:Eligard:GNRHR
Peginterferon alfa-2a:Pegasys:IFNAR2,IFNAR1
Alteplase:Activase (Genentech Inc):PLG,FGA,PLAUR,SERPINE1
Sermorelin:Geref:GHRHR
Interferon alfa-n1:Wellferon (GlaxoSmithKline):IFNAR2,IFNAR1
Darbepoetin alfa:Aranesp:EPOR
Urokinase:Abbokinase:NID1,PLG,PLAUR,PLAU,PLAT,SERPINE1,SERPINB2,SERPINA5,LRP2,ST14
Goserelin:Zoladex:LHCGR,GNRHR
Reteplase:Retavase (Centocor):PLG,FGA,PLAUR,SERPINE1
Epoetin alfa:Epogen:EPOR
Salmon Calcitonin:Calcimar:CALCR
Interferon alfa-n3:Alferon (Interferon Sciences Inc.):IFNAR1,IFNAR2
Pegfilgrastim:Neulasta (Amgen Inc.):CSF3R,ELANE
Sargramostim:Immunex:CSF2RA,IL3RA,CSF2RB,SDC2,PRG2
Secretin:SecreFlo:SCTR
Peginterferon alfa-2b:PEG-Intron    (Schering Corp):IFNAR1,IFNAR2
Asparaginase:Elspar (Merck & Co. Inc):Not Available
Thyrotropin Alfa:Thyrogen (Genzyme Inc):TSHR
Antihemophilic Factor:Advate:LRP1,MCFD2,F10,F9,VWF,PHYH,ASGR2,HSPA5,CALR,CANX,LMAN1
Anakinra:Kineret (Amgen Inc):IL1R1
Gramicidin D:Neosporin:
Intravenous Immunoglobulin:Civacir:C4B,C5,FCGR1A,FCGR1B,FCGR2A,FCGR2B,FCGR2C,FCGR3A,FCGR3B,C3,C4A
Anistreplase:Eminase (Wulfing Pharma GmbH):PLG,FGA,PLAUR,SERPINE1
Insulin recombinant:Novolin R (Novo Nordisk):LRP2,IGFBP7,SYTL4,INSR,IGF1R,RB1,CTSD,IDE,PCSK2,CPE,PCSK1,NOV
Tenecteplase:TNKase (Genentech Inc):CANX,LRP1,PLG,FGA,PLAUR,SERPINE1,SERPINB2,CLEC3B,KRT8,ANXA2,CALR
Menotropins:Repronex:FSHR,LHCGR
Interferon gamma-1b:Actimmune:IFNGR1,IFNGR2
Interferon Alfa-2a, Recombinant:Roferon A (Hoffmann-La Roche Inc):IFNAR1,IFNAR2
Desmopressin:Adiuretin:AVPR2,AVPR1A,AVPR1B
Coagulation factor VIIa:NovoSeven (Novo Nordisk):F10,HPN,TFPI,GGCX,F7,F3
Oprelvekin:Neumega:IL11RA
Palifermin:Kepivance (Amgen Inc):FGFR2,NRP1,FGFR1,FGFR4,FGFR3,HSPG2
Glucagon recombinant:GlucaGen (Novo Nordisk):GCGR,GLP2R,GLP1R
Aldesleukin:Proleukin:IL2RB,IL2RA,IL2RG
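A minimal sketch of the matching step, assuming the first file is separated by blanks (gene, then disease codes) and the second file uses the `:` / `,` layout shown above. awk reads the first file into an array keyed by gene name, then splits column 3 of each drug line on `,` and prints one tab-separated line per matching gene: the gene, its disease codes, the generic name and the brand name. The function name `match_genes` is hypothetical.

```shell
# match_genes DISEASE_FILE DRUG_FILE   (hypothetical helper name)
# DISEASE_FILE: "GENE DISEASES" per line, blank-separated
# DRUG_FILE:    "generic:brand:GENE1,GENE2,..." per line
match_genes() {
    awk '
    # First file (NR == FNR only while reading it): gene -> disease codes.
    NR == FNR { disease[$1] = $2; next }
    # Second file: check every gene listed in column 3.
    {
        n = split($3, genes, ",")
        for (i = 1; i <= n; i++)
            if (genes[i] in disease)
                print genes[i] "\t" disease[genes[i]] "\t" $1 "\t" $2
    }
    ' "$1" FS=":" "$2"
}
```

Usage: `match_genes genes.txt drugs.txt > matched.txt` (file names hypothetical). The `FS=":"` operand between the two file names switches the field separator only for the second file, so drug names containing spaces or commas (for example "Interferon Alfa-2a, Recombinant") stay in one column. The `NR == FNR` test misfires if the first file is empty, and the disease codes are copied as they are, repeats included. The two excerpts posted in this thread share no gene names, so run on just those excerpts the output would be empty.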


Last edited by Scrutinizer; 07-24-2012 at 12:47 AM.. Reason: code tags instead of quote tags