Fetch specific entries


 
# 1  
Old 12-10-2012

Hi Guys

This time my input is a sample from a big file, like this:

Code:
TTDS00002	UniProt ID	P11229
TTDS00002	Name	Muscarinic acetylcholine receptor M1
TTDS00002	Type of target	Successful target
TTDS00002	Synonyms	M1 receptor
TTDS00002	Disease	Alzheimer's disease
TTDS00002	Disease	Bronchospasm (histamine induced)
TTDS00002	Disease	Cognitive deficits
TTDS00002	Disease	Schizophrenia
TTDS00002	Function	The muscarinic acetylcholine receptor mediates various cellular responses, including inhibition of adenylate cyclase, breakdown of phosphoinositides and modulation of potassium channels through the action of G proteins.
TTDS00002	Sequence	MNTSAPPAVSPNITVLAPGKGPWQVAFIGITTGLLSLATVTGNLLVLISFKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLSYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLVGERTVLAGQCYIQFLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRARELAALQGSETPGKGGGSSSSSERSQPGAEGSPETPPGRCCRCCRAPRLLQAYSWKEEEEEDEGSMESLTSSEGEEPGSEVVIKMPMVDPEAQAPTKQPPRSSPNTVKRPTKKGRDRAGKGQKPRGKEQLAKRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMVLVSTFCKDCVPETLWELGYWLCYVNSTINPMCYALCNKAFRDTFRLLLLCRWDKRRWRKIPKRPGSVHRTPSRQC
TTDS00002	BioChemical Class	G-protein coupled receptor (rhodopsin family)
TTDS00002	Pathway	Calcium signaling pathway
TTDS00002	Pathway	Neuroactive ligand-receptor interaction
TTDS00002	Pathway	Regulation of actin cytoskeleton
TTDS00002	Related US Patent	6,288,068
TTDS00002	Related US Patent	6,294,554
TTDS00002	Related US Patent	6,627,645
TTDS00002	Drug(s)	Pirenzepine	DAP000492	Peptic ulcer disease	Approved
TTDS00002	Drug(s)	Glycopyrrolate	DAP001116	Anesthetic	Approved
TTDS00002	Drug(s)	Clidinium	DAP001117	Abdominal/stomach pain	Approved
TTDS00002	Drug(s)	Dicyclomine	DAP001118	Irritable bowel syndrome	Approved
TTDS00002	Drug(s)	Ethopropazine	DAP001119	Parkinson's disease	Approved
TTDS00002	Drug(s)	Cycrimine	DAP001120	Parkinson's disease	Approved
TTDS00002	Drug(s)	Benztropine	DAP001121	Parkinson's disease	Approved
TTDS00002	Drug(s)	Trihexyphenidyl	DAP001122	Parkinson's disease	Approved
TTDS00002	Drug(s)	Propantheline	DAP001123	Excessive sweating (hyperhidrosis)	Approved
TTDS00002	Drug(s)	Oxyphenonium	DAP001124	Spasm	Approved
TTDS00002	Drug(s)	Biperiden	DAP001125	Parkinson's disease	Approved
TTDS00002	Antagonist	Pirenzepine	DAP000492
TTDS00002	Antagonist	Glycopyrrolate	DAP001116
TTDS00002	Antagonist	Clidinium	DAP001117
TTDS00002	Antagonist	Dicyclomine	DAP001118
TTDS00002	Antagonist	Ethopropazine	DAP001119
TTDS00002	Antagonist	Benztropine	DAP001121
TTDS00002	Antagonist	Trihexyphenidyl	DAP001122
TTDS00002	Antagonist	Propantheline	DAP001123
TTDS00002	Antagonist	Oxyphenonium	DAP001124
TTDS00002	Antagonist	Biperiden	DAP001125
TTDS00002	Binder	Cycrimine	DAP001120
TTDS00002	Drug(s)	Talsaclidine isomer	DCL000268	Alzheimer's disease	Discontinued
TTDS00002	Drug(s)	Sabcomeline hydrochloride	DCL000279	Cardiovascular diseases	Phase IIa
TTDS00002	Drug(s)	Talsaclidine fumarate	DCL000303	Alzheimer's disease	Discontinued
TTDS00002	Drug(s)	Xanomeline tartrate	DCL000328	Alzheimer's disease	Phase II
TTDS00002	Drug(s)	GSK573719	DCL000381	Chronic Obstructive Pulmonary Disease (COPD)	Phase II
TTDS00002	Drug(s)	GSK961081	DCL000397	Chronic Obstructive Pulmonary Disease (COPD)	Phase II completed
TTDS00002	Drug(s)	GSK1034702	DCL000402	Schizophrenia, Dementia	Phase I completed
TTDS00002	Drug(s)	Darotropium	DCL000514	COPD	Suspended in Phase II in GSK 2009 Report
TTDS00002	Drug(s)	Darotropium + 642444	DCL000515	COPD	Phase III
TTDS00002	Drug(s)	Revatropate	DCL000957	Chronic obstructive pulmonary disease	Discontinued in Phase I
TTDS00002	Antagonist	Revatropate	DCL000957
TTDS00002	Agonist	Talsaclidine isomer	DCL000268
TTDS00002	Agonist	Sabcomeline hydrochloride	DCL000279
TTDS00002	Agonist	Talsaclidine fumarate	DCL000303
TTDS00002	Agonist	Xanomeline tartrate	DCL000328
TTDS00002	Agonist	GSK573719	DCL000381
TTDS00002	Agonist	GSK961081	DCL000397
TTDS00002	Agonist	GSK1034702	DCL000402
TTDS00002	Agonist	Darotropium	DCL000514
TTDS00002	Agonist	Darotropium + 642444	DCL000515
TTDS00002	Multitarget	GSK961081	DCL000397
TTDS00002	Multitarget	Revatropate	DCL000957
TTDS00002	Agonist	77-LH-28-1	DNC000099
TTDS00002	Agonist	AC-260584	DNC000137
TTDS00002	Agonist	AC-42	DNC000138
TTDS00002	Agonist	AF150(S)	DNC000165
TTDS00002	Agonist	AF267B	DNC000166
TTDS00002	Agonist	LY-593039	DNC000910
TTDS00002	Agonist	NGX-267	DNC001012
TTDS00002	Agonist	Sabcomeline	DNC001264
TTDS00002	Agonist	WAY-132983	DNC001510
TTDS00002	Inhibitor	Arecoline	DNC002508
TTDS00002	Inhibitor	Acetic acid 8-aza-bicyclo[3.2.1]oct-6-yl ester	DNC003640
TTDS00002	Inhibitor	Benzoic acid 8-aza-bicyclo[3.2.1]oct-6-yl ester	DNC003654
TTDS00002	Inhibitor	Propionic acid 8-aza-bicyclo[3.2.1]oct-6-yl ester	DNC003659
TTDS00002	Inhibitor	3-Methyl-7-pyrrolidin-1-yl-hept-5-yn-2-one	DNC004147
TTDS00002	Inhibitor	2-Methyl-6-pyrrolidin-1-yl-hex-4-ynal oxime	DNC004159
TTDS00002	Inhibitor	ISOCLOZAPINE	DNC004166
TTDS00002	Inhibitor	SB-202026	DNC004272
TTDS00002	Inhibitor	HIMBACINE	DNC004995
TTDS00002	Inhibitor	RR(17)PZ	DNC005944
TTDS00002	Inhibitor	Bo(15)PZ	DNC005945
TTDS00002	Inhibitor	DIFLUOROBENZTROPINE	DNC005986
TTDS00002	Inhibitor	BI-1356	DNC007901
TTDS00002	Inhibitor	FM1-10	DNC008187
TTDS00002	Inhibitor	FM1-43	DNC008188
TTDS00002	Inhibitor	A-987306	DNC008996
TTDS00002	Inhibitor	GNF-PF-5618	DNC009476
TTDS00002	Inhibitor	CREMASTRINE	DNC009504
TTDS00002	Inhibitor	1,1-diphenyl-2-(3-tropanyl)ethanol	DNC009866
TTDS00002	Inhibitor	R-dimethindene	DNC009877
TTDS00002	Inhibitor	Tiotropium Bromide	DNC009882
TTDS00002	Inhibitor	XANOMELINE	DNC011170
TTDS00002	Inhibitor	4-(4-butylpiperidin-1-yl)-1-o-tolylbutan-1-one	DNC011171
TTDS00002	Inhibitor	1-Methyl-1-(4-pyrrolidin-1-yl-but-2-ynyl)-urea	DNC011427
TTDS00002	Inhibitor	ISOLOXAPINE	DNC011498
TTDS00002	Inhibitor	1'-Benzyl-3-phenyl-[3,4']bipiperidinyl-2,6-dione	DNC011500
TTDS00002	Inhibitor	CARAMIPEN	DNC011755
TTDS00002	Inhibitor	FLUMEZAPINE	DNC011857
TTDS00002	Inhibitor	AMINOBENZTROPINE	DNC011950
TTDS00002	Inhibitor	2-(4-Diethylamino-but-2-ynyl)-isoindole-1,3-dione	DNC012005
TTDS00002	Inhibitor	3-Tetrazol-2-yl-1-aza-bicyclo[2.2.2]octane	DNC012098
TTDS00002	Inhibitor	SULFOARECOLINE	DNC012122
TTDS00002	Inhibitor	6-Dimethylamino-2-methyl-hex-4-ynal oxime	DNC012306
TTDS00002	Inhibitor	7-Pyrrolidin-1-yl-hept-5-yn-2-one	DNC012322
TTDS00002	Inhibitor	7-Dimethylamino-3-methyl-hept-5-yn-2-one	DNC012323
TTDS00002	Inhibitor	7-Pyrrolidin-1-yl-hept-5-yn-2-one oxime	DNC012330
TTDS00002	Inhibitor	7-Dimethylamino-hept-5-yn-2-one	DNC012350
TTDS00002	Inhibitor	7-Dimethylamino-hept-5-yn-2-one oxime	DNC012351
TTDS00002	Inhibitor	N-(4-Dimethylamino-but-2-ynyl)-N-methyl-acetamide	DNC012363
TTDS00002	Inhibitor	ACECLIDINE	DNC012502
TTDS00002	Inhibitor	N-methoxyquinuclidine-3-carboximidoyl fluoride	DNC012588
TTDS00002	Inhibitor	BRL-55473	DNC012594
TTDS00002	Inhibitor	N-methoxyquinuclidine-3-carboximidoyl chloride	DNC012616
TTDS00002	Inhibitor	2,8-Dimethyl-1-oxa-8-aza-spiro[4.5]decan-3-one	DNC012765
TTDS00002	Inhibitor	3alpha-(bis-chloro-phenylmethoxy)tropane	DNC013136
TTDS00002	Inhibitor	3-(3-benzylamino)-piperidin-2-one	DNC013219
TTDS00002	Target Validation	TTDS00002




I want to extract the data marked in bold above and arrange it in seven columns, as in the sample expected output below. The columns I want are: UniProt ID, drug name, drug ID, disease, approved/phase status, and the agonist/inhibitor/antagonist label.

If the disease or the approved/phase status is not given for a drug, that column should remain blank. For example, the last drugs in the input have neither a disease name nor an approved/phase status.

Code:
P11229    Talsaclidine isomer    DCL000268    Alzheimer's disease  Approved Discontinued  agonist     
 P11229   Sabcomeline hydrochloride    DCL000279    Cardiovascular diseases    Phase IIa    Inhibitor 
P11229    Talsaclidine fumarate    DCL000303    Alzheimer's disease        Phase I   agonist
P11229    Xanomeline tartrate    DCL000328    Alzheimer's disease    Phase II          Antagonist


Last edited by Priyanka Chopra; 12-10-2012 at 09:03 PM..
# 2  
Old 12-10-2012
Your question is not easy to understand.
Please rewrite it in a clearer way if possible.

Cheers,
# 3  
Old 12-10-2012
Hi Joseph

I have tried to update the question. Let me know if you have any questions.
# 4  
Old 12-10-2012
Just to start simply, could you please explain how to get the last column (agonist/inhibitor/antagonist)?
Second question: what's the difference between the two Drug(s) sections?
# 5  
Old 12-10-2012
I still don't understand. However, you might use the following command to get part of the output. Please note that you will probably need to add more code to it.
Code:
awk '/Drug/ {print "P11229 " $0} /Anta/ {print "P11229 " $0}' input.txt
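
A variation on the command above, offered as a rough sketch and not as a finished answer: it reads the UniProt ID from the file instead of hard-coding P11229, and prints only the Drug(s) rows as tab-separated columns. It assumes the fields are separated by tabs and that each target has a "UniProt ID" row before its Drug(s) rows. The function name drug_rows and the temporary sample file are made up for illustration.

```shell
# Sketch only. Assumptions: tab-separated input, and the "UniProt ID" row
# of a target comes before its Drug(s) rows. drug_rows is a hypothetical name.
drug_rows() {
    awk -F '\t' -v OFS='\t' '
        # Remember the UniProt ID for each target (field 1 is the TTDS number).
        $2 == "UniProt ID" { uniprot[$1] = $3 }
        # Drug(s) rows carry: drug name, drug ID, disease, status.
        $2 == "Drug(s)"    { print uniprot[$1], $3, $4, $5, $6 }
    ' "$1"
}

# Small sample built with printf so the tabs are explicit.
sample=$(mktemp)
{
    printf 'TTDS00002\tUniProt ID\tP11229\n'
    printf 'TTDS00002\tDrug(s)\tPirenzepine\tDAP000492\tPeptic ulcer disease\tApproved\n'
    printf 'TTDS00002\tAntagonist\tPirenzepine\tDAP000492\n'
} > "$sample"

drug_rows "$sample"
rm -f "$sample"
```

This still leaves out the agonist/inhibitor/antagonist column and the drugs that have no Drug(s) row.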

Cheers,
# 6  
Old 12-10-2012
Hi

Thanks for reply.

@rdc

1. How to get the last column (agonist/inhibitor/antagonist)?

I struggled with this as well. The one thing I can say is that it appears before the drug name and after the TTDS number.

2. What's the difference between the two Drug(s) sections?
I think the two Drug(s) sections differ in the drug names and the diseases associated with them. Both need to be included because they contain the disease name and the approval status as well.

Beyond that, the drugs that have no disease name or approval status should also be included for the same P11229.
# 7  
Old 12-10-2012
More questions.

1. In the expected output, it seems only the first line has 7 columns; the other 3 lines have only 6 columns.
2. In the second line of the expected output, why do you get "Inhibitor"?