Post 302698345 by manigrover on Monday 10th of September 2012 06:29:51 AM
Match look up file and find result

Hi

I have a lookup file with seven words:
Code:
CD
HT
CAD
HT
T1D
T2D
BD

Another file contains data like this:

Code:
CHRM1    P11229    Pirenzepine    DAP000492    Peptic ulcer disease    Approved T2D
CHRM1    P11229    Glycopyrrolate    DAP001116    Anesthetic    Approved T2D
CHRM1    P11229    Clidinium    DAP001117    Abdominal/stomach pain    Approved T2D
CHRM1    P11229    Dicyclomine    DAP001118    Irritable bowel syndrome    Approved T2D
CHRM1    P11229    Ethopropazine    DAP001119    Parkinson's disease    Approved T2D
CHRM1    P11229    Cycrimine    DAP001120    Parkinson's disease    Approved T2D
CHRM1    P11229    Benztropine    DAP001121    Parkinson's disease    Approved T2D
CHRM1    P11229    Trihexyphenidyl    DAP001122    Parkinson's disease    Approved T2D
CHRM1    P11229    Propantheline    DAP001123    Excessive sweating (hyperhidrosis)    Approved T2D
CHRM1    P11229    Oxyphenonium    DAP001124    Spasm    Approved T2D
CHRM1    P11229    Biperiden    DAP001125    Parkinson's disease    Approved T2D
CHRM1    P11229    Talsaclidine isomer    DCL000268    Alzheimer's disease    Discontinued T2D
CHRM1    P11229    Sabcomeline hydrochloride    DCL000279    Cardiovascular diseases    Phase IIa T2D
CHRM1    P11229    Talsaclidine fumarate    DCL000303    Alzheimer's disease    Discontinued T2D
CHRM1    P11229    Xanomeline tartrate    DCL000328    Alzheimer's disease    Phase II T2D
CHRM1    P11229    GSK573719    DCL000381    Chronic Obstructive Pulmonary Disease (COPD)    Phase II T2D
CHRM1    P11229    GSK961081    DCL000397    Chronic Obstructive Pulmonary Disease (COPD)    Phase II completed T2D
CHRM1    P11229    GSK1034702    DCL000402    Schizophrenia, Dementia    Phase I completed T2D
CHRM1    P11229    Darotropium    DCL000514    COPD    Suspended in Phase II in GSK 2009 Report T2D
CHRM1    P11229    Darotropium + 642444    DCL000515    COPD    Phase III T2D
CHRM1    P11229    Revatropate    DCL000957    Chronic obstructive pulmonary disease    Discontinued in Phase I T2D
FLT1    P17948    Sorafenib    DAP000006    Advanced renal cell carcinoma    Launched CAD
FLT1    P17948    Sorafenib    DAP000006    Hepatocellular carcinoma, NSCLC, melanoma    Phase III CAD
FLT1    P17948    Sorafenib    DAP000006    Myelodyspalstic syndrome, AML, head & neck cancer, breast, colon, ovarian, pancreatic cancer    Phase II CAD
FLT1    P17948    Ranibizumab    DAP001260    Age-related macular degeneration    Approved CAD
FLT1    P17948    Ranibizumab    DAP001260    Diabetic macular edema and retinal vein occlusion    Phase III CAD
FLT1    P17948    Telbermin    DCL001016    Diabetic foot ulcers    Discontinued in Phase II CAD
KDR    P35968    Sunitinib    DAP000005    Advanced renal cell carcinoma    Launched CAD,CD,CD
KDR    P35968    Sunitinib    DAP000005    Advanced renal cell carcinoma    Phase II CAD,CD,CD
KDR    P35968    Pazopanib HCl    DAP001550    Renal cell carcinoma    Approved CAD,CD,CD
KDR    P35968    CYC116    DCL000010    Solid Tumors    Terminated in Phase I CAD,CD,CD
KDR    P35968    XL999    DCL000011    Advanced Malignancies    Phase I CAD,CD,CD
KDR    P35968    CT-322    DCL000096    Cancer/Tumors    Phase I CAD,CD,CD
KDR    P35968    CT-322    DCL000096    Macular Degeneration    Preclinical CAD,CD,CD
KDR    P35968    XL647    DCL000263    Cancer    Phase I completed CAD,CD,CD
KDR    P35968    XL647    DCL000263    Carcinoma, Non-Small-Cell Lung    Phase II completed CAD,CD,CD
KDR    P35968    XL880    DCL000265    Solid Tumors    Phase I CAD,CD,CD
KDR    P35968    XL880    DCL000265    Gastric Cancer, Renal Cell Carcinoma, Squamous Cell Cancer of the Head and Neck    Phase II CAD,CD,CD
KDR    P35968    SU-6668    DCL000342    Advanced solid tumours    Discontinued CAD,CD,CD

I am using the following code:

Code:
awk -F'\t' 'FNR==NR{a[$0]=1;next} {
gsub(/Approved */,"",$6)
n=split($6,b,",")
$6=""
for(i=1;i<=n;i++)
 if(b[i] in a)
  print $0, "Approved" > "file_" b[i] ".txt"
}' OFS='\t' lookupfile mainfile

I am getting seven files, but the output does not contain all the data from the second input file.


For example, part of the output for the T2D file is:
Code:
CHRM1    P11229    Pirenzepine    DAP000492    Peptic ulcer disease        Approved
CHRM1    P11229    Glycopyrrolate    DAP001116    Anesthetic        Approved
CHRM1    P11229    Clidinium    DAP001117    Abdominal/stomach pain        Approved
CHRM1    P11229    Dicyclomine    DAP001118    Irritable bowel syndrome        Approved
CHRM1    P11229    Ethopropazine    DAP001119    Parkinson's disease        Approved
CHRM1    P11229    Cycrimine    DAP001120    Parkinson's disease        Approved
CHRM1    P11229    Benztropine    DAP001121    Parkinson's disease        Approved
CHRM1    P11229    Trihexyphenidyl    DAP001122    Parkinson's disease        Approved
CHRM1    P11229    Propantheline    DAP001123    Excessive sweating (hyperhidrosis)        Approved
CHRM1    P11229    Oxyphenonium    DAP001124    Spasm        Approved
CHRM1    P11229    Biperiden    DAP001125    Parkinson's disease        Approved
But the expected output is:
Code:
CHRM1    P11229    Pirenzepine    DAP000492    Peptic ulcer disease    Approved 
CHRM1    P11229    Glycopyrrolate    DAP001116    Anesthetic    Approved 
CHRM1    P11229    Clidinium    DAP001117    Abdominal/stomach pain    Approved 
CHRM1    P11229    Dicyclomine    DAP001118    Irritable bowel syndrome    Approved 
CHRM1    P11229    Ethopropazine    DAP001119    Parkinson's disease    Approved 
CHRM1    P11229    Cycrimine    DAP001120    Parkinson's disease    Approved 
CHRM1    P11229    Benztropine    DAP001121    Parkinson's disease    Approved 
CHRM1    P11229    Trihexyphenidyl    DAP001122    Parkinson's disease    Approved 
CHRM1    P11229    Propantheline    DAP001123    Excessive sweating (hyperhidrosis)    Approved 
CHRM1    P11229    Oxyphenonium    DAP001124    Spasm    Approved 
CHRM1    P11229    Biperiden    DAP001125    Parkinson's disease    Approved 
CHRM1    P11229    Talsaclidine isomer    DCL000268    Alzheimer's disease    Discontinued 
CHRM1    P11229    Sabcomeline hydrochloride    DCL000279    Cardiovascular diseases    Phase IIa 
CHRM1    P11229    Talsaclidine fumarate    DCL000303    Alzheimer's disease    Discontinued 
CHRM1    P11229    Xanomeline tartrate    DCL000328    Alzheimer's disease    Phase II 
CHRM1    P11229    GSK573719    DCL000381    Chronic Obstructive Pulmonary Disease (COPD)    Phase II 
CHRM1    P11229    GSK961081    DCL000397    Chronic Obstructive Pulmonary Disease (COPD)    Phase II completed 
CHRM1    P11229    GSK1034702    DCL000402    Schizophrenia, Dementia    Phase I completed 
CHRM1    P11229    Darotropium    DCL000514    COPD    Suspended in Phase II in GSK 2009 Report 
CHRM1    P11229    Darotropium + 642444    DCL000515    COPD    Phase III 
CHRM1    P11229    Revatropate    DCL000957    Chronic obstructive pulmonary disease    Discontinued in Phase I

So the output shows only the lines that contain the word "Approved" in the last column, but the other lines should be there as well.
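A quick way to see why the other rows are dropped is to print what the `b[i] in a` test actually compares. This is only a diagnostic sketch: it assumes the real file is tab-separated with the status and the lookup word sharing field 6, the two sample rows are copied from the data above, and the file names `demo_lookup.txt`, `demo_main.txt` and `demo_diag.txt` are placeholders made up for the demo.

```shell
# Two rows from the post, written to placeholder demo files.
printf 'T2D\nCAD\n' > demo_lookup.txt
printf 'CHRM1\tP11229\tOxyphenonium\tDAP001124\tSpasm\tApproved T2D\n' > demo_main.txt
printf 'CHRM1\tP11229\tDarotropium + 642444\tDCL000515\tCOPD\tPhase III T2D\n' >> demo_main.txt

awk -F'\t' 'FNR==NR { a[$0] = 1; next }
{
    key = $6
    gsub(/Approved */, "", key)      # same substitution as in the script above
    n = split(key, b, ",")
    for (i = 1; i <= n; i++)         # show each string that is looked up in a[]
        print "[" b[i] "]", ((b[i] in a) ? "match" : "no match")
}' demo_lookup.txt demo_main.txt > demo_diag.txt
cat demo_diag.txt
```

For the second row the string being looked up is still `Phase III T2D`, because only the word "Approved" is stripped, so it never equals a lookup word and the row is not written anywhere.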

---------- Post updated 09-10-12 at 05:29 AM ---------- Previous update was 09-09-12 at 11:56 PM ----------

Hi

I may be able to get the result by editing the word "Approved", but then I would have to add many other words to the following code for it to be worthwhile:

Code:
awk -F'\t' 'FNR==NR{a[$0]=1;next} {
gsub(/Approved */,"",$6)
n=split($6,b,",")
$6=""
for(i=1;i<=n;i++)
 if(b[i] in a)
  print $0, "Approved" > "file_" b[i] ".txt"
}' OFS='\t' lookupfile mainfile
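One possible way to avoid listing every status word is to not match on the status at all and instead cut field 6 at its last space: everything before it is the status, everything after it is the comma-separated list of lookup words. The sketch below assumes the real file is tab-separated with six fields and that lookup words never contain a space. The sample rows are copied from the post, the file names `fix_lookup.txt` and `fix_main.txt` are placeholders, and writing a row only once per file when a word repeats (as in `CAD,CD,CD`) is an assumption, not something the post asks for.

```shell
# Small sample taken from the post; file names are placeholders for the demo.
printf 'CD\nHT\nCAD\nT2D\n' > fix_lookup.txt
{
  printf 'CHRM1\tP11229\tOxyphenonium\tDAP001124\tSpasm\tApproved T2D\n'
  printf 'CHRM1\tP11229\tDarotropium + 642444\tDCL000515\tCOPD\tPhase III T2D\n'
  printf 'KDR\tP35968\tXL999\tDCL000011\tAdvanced Malignancies\tPhase I CAD,CD,CD\n'
} > fix_main.txt

awk -F'\t' -v OFS='\t' '
FNR == NR { a[$0] = 1; next }            # first file: remember the lookup words
{
    if (!match($6, / [^ ]+$/)) next      # field 6 has no "status words" pair: skip the row
    status = substr($6, 1, RSTART - 1)   # text before the last space, e.g. "Phase III"
    codes  = substr($6, RSTART + 1)      # text after it, e.g. "CAD,CD,CD"
    n = split(codes, b, ",")
    split("", seen)                      # portable way to empty an array
    for (i = 1; i <= n; i++)
        if ((b[i] in a) && !(b[i] in seen)) {
            seen[b[i]] = 1               # assumption: one copy per output file
            print $1, $2, $3, $4, $5, status > ("file_" b[i] ".txt")
        }
}' fix_lookup.txt fix_main.txt
```

With the real data the last line would name `lookupfile mainfile` instead. Note that the expected output in the post shows a trailing space after the status, which this sketch does not reproduce.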


Last edited by manigrover; 09-10-2012 at 07:21 AM..
 
