Fetch entries in front of specific word till next word


 
# 1  
Old 11-02-2012

Hi all

I have the following file, which I need to edit for research purposes:
Code:
Drug: KRP-104 QD Drug: Placebo Drug: Metformin|Drug: Placebo Drug: Metformin|Drug: KRP-104 BID Drug: Placebo Drug: Metformin    Phase 2
Drug: Dapagliflozin    Phase 1
Drug: MK-3102|Drug: Matching placebo to MK-3102|Drug: Basal medication    Phase 3
Dietary Supplement: Vitamin C|Drug: glyburide    Phase 1
Drug: Insulin glargine new formulation (HOE901)|Drug: Insulin glargine (HOE901)    Phase 3
Drug: Pioglitazone|Drug: Placebo|Drug: Pioglitazone|Drug: Placebo    Phase 4
Drug: Metformin HCl and Colesevelam Placebo|Drug: Metformin HCl tablets and Colesevelam tablets|Drug: Colesevelam placebo|Drug: Colesevelam    Phase 3
Drug: Insulin-Levemir|Drug: Exenatide-Bayetta|Drug: Insulin-Levemir and Exenatide-Bayetta|Device: SenseWear Pro3® armband|Device: DexCom CGM    Phase 4
Drug: exenatide once weekly|Drug: metformin|Drug: sitagliptin|Drug: pioglitazone    Phase 3
Drug: intensive insulin group|Drug: Oral AntiDiabetic Drug (glimepiride and metformin)    Phase 4
Drug: LY2189265|Drug: Sulfonylureas (SU)|Drug: Biguanides|Drug: Thiazolidinedione (TZD)|Drug: alpha-glucosidase inhibitor (a-GI)|Drug: Glinides    Phase 3
Drug: Insulin glargine new formulation (HOE901)|Drug: Insulin glargine (HOE901)    Phase 3
Drug: placebo|Drug: exenatide|Drug: exenatide    Phase 3
Drug: Vildagliptin (LAF237)|Drug: Voglibose|Drug: Vildagliptin and Voglibose    Phase 4
Drug: pioglitazone|Drug: insulin glargine    Phase 4
Drug: GSK189075 oral tablets|Drug: metformin tablets    Phase 1
Drug: Vildagliptin|Drug: Metformin|Drug: Vildagliptin + Metformin    Phase 3
Drug: Insulin glargine plus insulin analogues    Phase 4
Drug: Glipizide|Drug: Metformin    Phase 4
Drug: vildagliptin|Drug: Metformin Comparator    Phase 3
Drug: Dapagliflozin|Drug: Placebo matching Dapagliflozin    Phase 3
Drug: vildagliptin|Drug: Gliclazide    Phase 3
Drug: GSK1614235|Drug: Sitagliptin|Other: Placebo    Phase 1
Drug: Pioglitazone (Actos)|Drug: Anti-diabetic agent other than pioglitazone or rosiglitazone    Phase 1
Drug: Vildagliptin 100 mg qd|Drug: Metformin 1500 mg daily    Phase 3
Drug: Alogliptin and glimepiride|Drug: Alogliptin and glimepiride|Drug: Alogliptin and metformin|Drug: Alogliptin and metformin    Phase 2|Phase 3


I have to extract the entries into a different file so that it contains only the drug names, each with its phase next to it. That is, the words after each Drug: (up to the next Drug:) should go on a row of their own, followed by the phase.

For the first row, for example, the expected output is:


Code:
KRP-104 QD  Phase 2
Placebo         Phase 2
Metformin     Phase 2
Placebo         Phase 2
Metformin      Phase 2
KRP-104 BID    Phase 2
 Placebo            Phase 2
Metformin    Phase 2

# 2  
Old 11-02-2012
Assuming you want only the last Phase of each drug, try:

Code:
awk -F "Drug:" '{gsub("\\|","",$0);n=split($NF,P," +");for(i=2;i<NF;i++){print $i,P[n-1],P[n]};print $NF}' file


Last edited by pamu; 11-02-2012 at 06:40 AM..
# 3  
Old 11-02-2012
Since the last line of your example shows that there can be more than one phase:

Code:
sed "s/^Drug: //;s/Drug: /|/g;s/ *||* */|/g;s/    /@/g" yourfile | awk -F"@" '{n=split($1,d,"|");m=split($2,p,"|");for(i=1;i<=m;i++) for(j=1;j<=n;j++) {print d[j]":"p[i]}}'

Or in awk only (this also handles a line like Drug: Drug1|Drug: Drug2|Drug: Drug3 Phase 1|Phase 421|Phase 69):

Code:
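awk -F"    " '{
s="Drug: "
sub(s,z)               # drop the first "Drug: " (z is unset, so it is an empty string)
gsub("[|]*"s,"|")      # replace each remaining "Drug: ", plus any "|" before it, with one "|"
n=split($1,d,"|")      # d[1..n]: drug names from the first field
m=split($2,p,"|")      # p[1..m]: phases from the second field
for(i=1;i<=m;i++)
    for(j=1;j<=n;j++)
        print d[j]":"p[i]   # one row per drug/phase pair
}' yourfile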


Last edited by ctsgnb; 11-02-2012 at 07:35 AM..
# 4  
Old 11-04-2012
Hi all

Thanks for the replies.

But it doesn't seem to be working properly. I think I have to explain a bit more.

My input file looks like this:

Code:
Drug: Dapagliflozin    Phase 1
Drug: MK-3102|Drug: Matching placebo to MK-3102|Drug: Basal medication    Phase 3
Dietary Supplement: Vitamin C|Drug: glyburide    Phase 1
Drug: Insulin glargine new formulation (HOE901)|Drug: Insulin glargine (HOE901)    Phase 3
Drug: Pioglitazone|Drug: Placebo|Drug: Pioglitazone|Drug: Placebo    Phase 4
Drug: Metformin HCl and Colesevelam Placebo|Drug: Metformin HCl tablets  and Colesevelam tablets|Drug: Colesevelam placebo|Drug: Colesevelam     Phase 3
Drug: Insulin-Levemir|Drug: Exenatide-Bayetta|Drug: Insulin-Levemir and  Exenatide-Bayetta|Device: SenseWear Pro3® armband|Device: DexCom CGM     Phase 4
Drug: exenatide once weekly|Drug: metformin|Drug: sitagliptin|Drug: pioglitazone    Phase 3
Drug: intensive insulin group|Drug: Oral AntiDiabetic Drug (glimepiride and metformin)    Phase 4
Drug: LY2189265|Drug: Sulfonylureas (SU)|Drug: Biguanides|Drug:  Thiazolidinedione (TZD)|Drug: alpha-glucosidase inhibitor (a-GI)|Drug:  Glinides    Phase 3
Drug: Insulin glargine new formulation (HOE901)|Drug: Insulin glargine (HOE901)    Phase 3
Drug: placebo|Drug: exenatide|Drug: exenatide    Phase 3
Drug: Vildagliptin (LAF237)|Drug: Voglibose|Drug: Vildagliptin and Voglibose    Phase 4
Drug: pioglitazone|Drug: insulin glargine    Phase 4
Drug: GSK189075 oral tablets|Drug: metformin tablets    Phase 1
Drug: Vildagliptin|Drug: Metformin|Drug: Vildagliptin + Metformin    Phase 3
Drug: Insulin glargine plus insulin analogues    Phase 4
Drug: Glipizide|Drug: Metformin    Phase 4
Drug: vildagliptin|Drug: Metformin Comparator    Phase 3
Drug: Dapagliflozin|Drug: Placebo matching Dapagliflozin    Phase 3
Drug: vildagliptin|Drug: Gliclazide    Phase 3
Drug: GSK1614235|Drug: Sitagliptin|Other: Placebo    Phase 1
Drug: Pioglitazone (Actos)|Drug: Anti-diabetic agent other than pioglitazone or rosiglitazone    Phase 1
Drug: Vildagliptin 100 mg qd|Drug: Metformin 1500 mg daily    Phase 3
Drug: Alogliptin and glimepiride|Drug: Alogliptin and glimepiride|Drug:  Alogliptin and metformin|Drug: Alogliptin and metformin    Phase 2|Phase  3

I have to split each line so that the drugs are separated, each with the phase shown next to it.

Here is an example for line 2:

Drug: MK-3102|Drug: Matching placebo to MK-3102|Drug: Basal medication Phase 3

Expected output:
Code:
MK-3102                                               Phase 3
Matching placebo to MK-3102            Phase 3
Basal medication                                   Phase 3


For the last line, the input and the expected output are:

Drug: Alogliptin and glimepiride|Drug: Alogliptin and glimepiride|Drug: Alogliptin and metformin|Drug: Alogliptin and metformin Phase 2|Phase 3

Code:
Alogliptin and glimepiride         Phase 2|Phase  3
Alogliptin and glimepiride          Phase 2|Phase  3
Alogliptin and metformin           Phase 2|Phase  3
Alogliptin and metformin           Phase 2|Phase  3

So anything between Drug: and the | symbol goes on its own row, with the phase from the end of the line in the second column.

I would rather not have duplicates like the repeated rows above, but if they are there I can manage. Ideally the output would be:
Code:
Alogliptin and glimepiride         Phase 2|Phase  3
Alogliptin and metformin           Phase 2|Phase  3

I can use another program for that later on, though; it is the separation that is a bit difficult.
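
For the de-duplication step mentioned above, one common approach is awk's `seen` array idiom. This is only a sketch: the function name `dedupe_rows` is made up for illustration, and it assumes duplicate rows are byte-for-byte identical (rows that differ only in spacing are treated as different).

```shell
# dedupe_rows: print each distinct input row once, keeping first-seen order.
# (Hypothetical helper name; reads the named files, or stdin if none are given.)
dedupe_rows() {
    # seen[$0]++ evaluates to 0 the first time a row appears, so !seen[$0]++
    # is true only for the first occurrence, and awk prints the row then.
    awk '!seen[$0]++' "$@"
}
```

It would be applied to the already separated output, for example `dedupe_rows separated.txt`, where `separated.txt` is a placeholder file name.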
# 5  
Old 11-05-2012
Slight modification to ctsgnb's code:


Code:
sed "s/^Drug: //;s/Drug: /|/g;s/ *||* */|/g;s/    /@/g" file | awk -F"@" '{n=split($1,d,"|");for(j=1;j<=n;j++) {print d[j]":"p[i],$2}}'

# 6  
Old 11-05-2012
Hi all,

Thanks for the replies.

There still seems to be some error, as the output looks like this:
Code:
Drug: Ramipril: 
Drug: Placebo: 
Drug: Placebo    Phase 3: 
Drug: Etanercept: 
Drug: Placebo    Phase 1: 
Phase 2: 
Drug: 1,25-dihydroxy-vitamin D3 (calcitriol): 
Drug: placebo    Phase 2: 
Drug: Pro insulin peptide: 
Drug: Pro insulin peptide: 
Drug: Saline    Phase 1: 
Phase 2: 
Procedure: Islet transplant: 
Drug: Deoxyspergualin: 
Drug: Antithymocyte globulin: 
Drug: Daclizumab or basiliximab: 
Drug: Sirolimus: 
Drug: Tacrolimus: 
Drug: Etanercept    Phase 2: 
Drug: TAK-329: 
Drug: TAK-329: 
Drug: Insulin: 
Drug: Placebo    Phase 1: 
Drug: Exenatide: 
Drug: Rapid and long acting insulin: 
Drug: long acting insulin + rapid acting + 1.25 mcg Exenatide    Phase 4: 
Drug: Insulin glargine (HOE901): 
Drug: NPH insulin    Phase 3: 
Procedure: Islet transplant: 
Drug: Belatacept: 
Drug: Basiliximab: 
Drug: Mycophenolate Mofetil    Phase 2: 
Drug: Insulin glargine new formulation (HOE901): 
Drug: Insulin glargine (HOE901)    Phase 2: 
Drug: Insulin glargine new formulation (HOE901): 
Drug: Insulin glargine (HOE901)    Phase 3: 
Drug: Insulin glargine new formulation (HOE901): 
Drug: Insulin glargine (HOE901) (Lantus)    Phase 3: 
Drug: insulin detemir: 
Drug: insulin NPH: 
Drug: insulin aspart    Phase 3: 
Procedure: Islet Transplant:

# 7  
Old 11-05-2012
There might be a difference between your actual input file and the sample you posted.

Try this:


Code:
sed "s/^Drug: //;s/Drug: /|/g;s/ *||* */|/g;s/ *Phase/@Phase/" file | awk -F"@" '{n=split($1,d,"|");for(j=1;j<=n;j++) {print d[j],$2}}'
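
A more defensive variant of the pipeline above, as a minimal sketch rather than a definitive answer. It assumes the phase column starts at the first `Phase` that is preceded by spaces or tabs, that entries are separated by `|`, that only `Drug:` entries are wanted (so `Device:`, `Other:` and `Dietary Supplement:` entries are skipped), and that a name repeated on one line should be printed once. The function name `split_drugs` and the tab between the two output columns are choices made for this illustration.

```shell
# split_drugs: read trial lines, print one "drug<TAB>phase" row per drug.
# (Hypothetical helper name; reads the named files, or stdin if none are given.)
split_drugs() {
    awk '
    {
        # Locate the phase column: spaces or tabs followed by "Phase".
        pos = match($0, /[ \t]+Phase/)
        if (pos == 0) next                      # no phase on this line, skip it
        drugs = substr($0, 1, pos - 1)
        phase = substr($0, pos + RLENGTH - 5)   # start at the "P" of "Phase"

        split("", seen)                         # clear the per-line duplicate table
        n = split(drugs, part, "|")
        for (i = 1; i <= n; i++) {
            name = part[i]
            if (name !~ /^[ \t]*Drug:/) continue   # skip Device:, Other:, ...
            sub(/^[ \t]*Drug:[ \t]*/, "", name)    # strip the label
            sub(/[ \t]+$/, "", name)               # strip trailing blanks
            if (name != "" && !(name in seen)) {
                seen[name] = 1
                print name "\t" phase
            }
        }
    }' "$@"
}
```

Usage would be along the lines of `split_drugs file > drugs_by_phase.txt`, where the output file name is a placeholder. Because it looks for whitespace before `Phase` instead of exactly four spaces, it should cope with input where the gap is a tab or a different number of spaces.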
