Remove bracket part entries and separate entries after comma


 
# 1  
Old 11-15-2012

Hi all


This time my input contains 3 columns:

Code:
 ERCC1 (PA155)    Platinum compounds (PA164713176)    Allele A is not associated with response to Platinum compounds in women with Ovarian Neoplasms as compared to allele C .
CES1 (PA107)    methylphenidate (PA450464)    Genotype CT is not associated with response to methylphenidate in children with Attention Deficit Disorder with Hyperactivity as compared to genotype CC .
BCKDK (PA134899581),PRSS53 (PA165450635),VKORC1 (PA133787052)    warfarin (PA451906)    Genotype CT is associated with increased dose of warfarin in people with Thromboembolism as compared to genotype TT .
CFH (PA29261)    bevacizumab (PA130232992)    Genotype GG is associated with decreased response to bevacizumab in people with Macular Degeneration as compared to genotype AA .
ADH1A (PA24570)    cytarabine (PA449177);fludarabine (PA449655);gemtuzumab ozogamicin (PA164749431);idarubicin (PA449961)    Genotypes CT + TT are associated with increased resistance to cytarabine, fludarabine, gemtuzumab ozogamicin and idarubicin in people with Leukemia, Myeloid, Acute as compared to genotype CC .
CYP2C19 (PA124)    clopidogrel (PA449053)    Genotype CT is associated with increased response to clopidogrel in people with Coronary Artery Disease as compared to genotype CC .

The expected output also has 3 columns:

Code:
ERCC1    Platinum compounds   Allele A is not  associated with response to Platinum compounds in women with Ovarian  Neoplasms as compared to allele C .
CES1    methylphenidate    Genotype CT is not  associated with response to methylphenidate in children with Attention  Deficit Disorder with Hyperactivity as compared to genotype CC .
BCKDK   warfarin     Genotype CT is associated with increased dose of  warfarin in people with Thromboembolism as compared to genotype TT .
PRSS53  warfarin     Genotype CT is associated with increased dose of  warfarin in people with Thromboembolism as compared to genotype TT .
VKORC1   warfarin     Genotype CT is associated with increased dose of  warfarin in people with Thromboembolism as compared to genotype TT .
CFH         bevacizumab    Genotype GG is associated  with decreased response to bevacizumab in people with Macular  Degeneration as compared to genotype AA .
ADH1A     cytarabine  Genotypes CT + TT are associated with increased resistance to  cytarabine, fludarabine, gemtuzumab ozogamicin and idarubicin in people  with Leukemia, Myeloid, Acute as compared to genotype CC 
ADH1A  fludarabine  Genotypes CT + TT are associated with increased resistance to  cytarabine, fludarabine, gemtuzumab ozogamicin and idarubicin in people  with Leukemia, Myeloid, Acute as compared to genotype CC 
ADH1A  gemtuzumab ozogamicin  Genotypes CT + TT are associated with increased resistance to  cytarabine, fludarabine, gemtuzumab ozogamicin and idarubicin in people  with Leukemia, Myeloid, Acute as compared to genotype CC 
ADH1A  idarubicin     Genotypes CT + TT are associated with increased resistance to  cytarabine, fludarabine, gemtuzumab ozogamicin and idarubicin in people  with Leukemia, Myeloid, Acute as compared to genotype CC .
CYP2C19    clopidogrel   Genotype CT is associated  with increased response to clopidogrel in people with Coronary Artery  Disease as compared to genotype CC .

First, I have to remove the bracketed parts, i.e. the parentheses () containing the PAxxx IDs. Then, if the first column has more entries after a comma, or the second column has more entries after a semicolon, I want to make a duplicate row for each of those entries.

Last edited by Priyanka Chopra; 11-15-2012 at 07:05 AM..
# 2  
Old 11-15-2012
What happens when there is a comma in the first column AND a semicolon in the second column?
Are the columns TAB-separated?
What have you tried so far?
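The TAB question can be checked directly on the file. A minimal sketch, assuming a POSIX awk; the function name `count_tab_fields` is made up for illustration. It prints the number of TAB-separated fields on each line, so a file with three TAB-separated columns should print 3 for every line:

```shell
# Hypothetical helper: print how many TAB-separated fields each input line has.
# Reads the named files, or standard input when no file is given.
count_tab_fields() {
    awk -F'\t' '{ print NF }' "$@"
}
```

For example, `count_tab_fields infile | sort | uniq -c` summarises the field counts found in the file.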
# 3  
Old 11-15-2012
Hi

Kindly find the attached file, which shows the case of a comma in the first column and a semicolon in the second column.

I have tried to remove the brackets, but I wasn't able to remove the entries inside them, and I have not done the comma and semicolon separation at all.

Code:
awk -v p="[()]" '{
for(i=1;i<=NF;i++)
 if(gsub(p,"",$i))
  if(++a[$i]>1) 
   $i=""
for(i in a)
 delete a[i]
}1' infile
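
For the bracket step alone, here is a minimal sketch, assuming the columns are TAB-separated (this was not confirmed in the thread) and that every ID to drop looks like `(PA` followed by digits. The function name `strip_pa_ids` is made up for illustration. It only touches the first two columns, so brackets in the description such as `(TA)7` are left alone:

```shell
# Hypothetical helper: remove " (PAxxx)" IDs from columns 1 and 2 only.
# Assumes TAB-separated input; column 3 is passed through unchanged.
strip_pa_ids() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        # Delete each "(PA<digits>)" together with any spaces before it.
        for (i = 1; i <= 2 && i <= NF; i++)
            gsub(/ *\(PA[0-9]+\)/, "", $i)
        print
    }' "$@"
}
```

Usage would be `strip_pa_ids infile`.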


Last edited by Scrutinizer; 11-15-2012 at 07:15 AM..
# 4  
Old 11-15-2012
What do you want to do with this:

Code:
PRSS53 (PA165450635),VKORC1 (PA133787052)       warfarin (PA451906)     Allele T is associated with increased dose of warfarin .
PRSS53 (PA165450635),VKORC1 (PA133787052)       warfarin (PA451906)     Genotype AA is associated with increased dose of warfarin .
        warfarin (PA451906)     Allele T is associated with increased dose of warfarin .
        warfarin (PA451906)     Allele T is associated with increased dose of warfarin .
        warfarin (PA451906)     Allele G is associated with increased dose of warfarin .
        warfarin (PA451906)     Allele G is associated with increased dose of warfarin .

and with this:
Code:
UGT1A1 (PA420),UGT1A10 (PA37174),UGT1A3 (PA37178),UGT1A4 (PA37179),UGT1A5 (PA37180),UGT1A6 (PA37181),UGT1A7 (PA37182),UGT1A8 (PA37183),UGT1A9 (PA419)	fluorouracil (PA128406956);irinotecan (PA450085);leucovorin (PA450198)	Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo

# 5  
Old 11-15-2012
Code:
PRSS53 (PA165450635),VKORC1 (PA133787052)       warfarin (PA451906)     Allele T is associated with increased dose of warfarin .
PRSS53 (PA165450635),VKORC1 (PA133787052)       warfarin (PA451906)     Genotype AA is associated with increased dose of warfarin .
        warfarin (PA451906)     Allele T is associated with increased dose of warfarin .
        warfarin (PA451906)     Allele T is associated with increased dose of warfarin .
        warfarin (PA451906)     Allele G is associated with increased dose of warfarin .
        warfarin (PA451906)     Allele G is associated with increased dose of warfarin .

For this one, I want to remove the bracket part and make a duplicate row for each entry after a comma:

Code:
PRSS53   warfarin   Allele T is associated with increased dose of warfarin .

VKORC1    warfarin   Allele T is associated with increased dose of warfarin .


PRSS53   warfarin   Allele T is associated with increased dose of warfarin .

VKORC1    warfarin   Allele T is associated with increased dose of warfarin .

And the next one in the same way; I appreciate you reminding me about this:


Code:
UGT1A1  fluorouracil Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo

UGT1A10 fluorouracil Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo
UGT1A3 fluorouracil  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo
UGT1A4 fluorouracil  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo
UGT1A5 fluorouracil  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo
UGT1A6  fluorouracil  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo
UGT1A7 fluorouracil  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo
UGT1A8 fluorouracil  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo
UGT1A9 fluorouracil  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo


UGT1A1  irinotecan         Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo

UGT1A3 irinotecan Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo

UGT1A10 irinotecan  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo

UGT1A4   irinotecan  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo

UGT1A7     irinotecan  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo

UGT1A5      irinotecan    Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo

UGT1A6          irinotecan     Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo

UGT1A1        leucovorin     Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo	

UGT1A3      leucovorin 	 Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo

UGT1A10      leucovorin 	Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo	
 
UGT1A4  leucovorin 	 Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovorin	

UGT1A7     leucovorin 	 Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo	

UGT1A5        leucovorin  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo	

UGT1A6   leucovorin  Genotype (TA)7/(TA)7 is associated with increased likelihood of Neutropenia when treated with fluorouracil, irinotecan and leucovo

---------- Post updated at 10:37 PM ---------- Previous update was at 08:57 AM ----------

Can anyone please check the question above and the code I tried? I need it urgently.
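
One possible way to get the requested output, as a sketch rather than a tested solution for the real file. It assumes TAB-separated input with exactly three columns, IDs of the form `(PA` plus digits, and that rows with an empty first column are skipped (the thread does not say what should happen to them). The function name `expand_rows` is made up for illustration. It prints one row per gene and drug combination, with the description repeated:

```shell
# Hypothetical helper: strip "(PAxxx)" IDs, then print one row for every
# gene (comma-separated, column 1) and drug (semicolon-separated, column 2).
expand_rows() {
    awk 'BEGIN { FS = OFS = "\t" }
    # Assumption: rows with fewer than 3 columns or no gene are skipped.
    NF < 3 || $1 ~ /^ *$/ { next }
    {
        gsub(/ *\(PA[0-9]+\)/, "", $1)
        gsub(/ *\(PA[0-9]+\)/, "", $2)
        ng = split($1, gene, ",")
        nd = split($2, drug, ";")
        for (d = 1; d <= nd; d++) {
            gsub(/^ +| +$/, "", drug[d])        # trim stray spaces
            for (g = 1; g <= ng; g++) {
                gsub(/^ +| +$/, "", gene[g])
                print gene[g], drug[d], $3
            }
        }
    }' "$@"
}
```

Usage would be `expand_rows infile > outfile`. The drug loop is the outer one so that all genes are listed for the first drug before moving to the next, which is the order used in the expected output above.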