Separate certain entries from a very big file


 
# 1  
Old 05-13-2012

Hello

I need to extract certain entries from a big file that contains many drugs and their descriptions.

I want to extract only the drug name, which is given as

Code:
#BEGIN_DRUGCARD DB00001

(this marks the start of the first drug description; DB00002 and so on follow the same pattern)

and from the description I need to extract
Code:
# Drug_Target_1_Name:
# Drug_Target_1_GenBank_ID_Gene:
# Drug_Target_1_GenBank_ID_Protein:

and the same for targets 2, 3, and so on, if they are also present.

So that in the output file I will get
Code:
#BEGIN_DRUGCARD DB00001    # Drug_Target_1_Name: (whole name as given)
                           # Drug_Target_1_GenBank_ID_Gene:
                           # Drug_Target_1_GenBank_ID_Protein:

And then, under the same #BEGIN_DRUGCARD DB00001, every further target that is listed, each with the GenBank ID of its gene and protein.

Please let me know of any program that can do this, if possible. I have attached a sample file; kindly check it.

Thanks
Mani
# 2  
Old 05-13-2012
Try this:

Code:
awk '/^#BEGIN_/||/^# Drug_Target_[1-9]/' infile
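Note that this filter prints every Drug_Target_N_* field name, and only the matching lines themselves. If each value sits on the line directly after its field name (an assumption about the file layout, since the attached sample is not shown here), a narrower sketch along these lines might be closer to the request. The file name sample_drugcards.txt, the function name extract_targets, and the values in the sample are all made up for illustration.

```shell
# Hypothetical sample in the assumed layout: field name on one line,
# its value on the next. The values are placeholders, not real IDs.
cat > sample_drugcards.txt <<'EOF'
#BEGIN_DRUGCARD DB00001

# Drug_Target_1_GenBank_ID_Gene:
GENE0001

# Drug_Target_1_GenBank_ID_Protein:
PROT0001

# Drug_Target_1_Name:
Example target

# Drug_Target_1_Pathway:
Not Available

#END_DRUGCARD DB00001
EOF

extract_targets() {
    awk '
    # Keep each card header as it is.
    /^#BEGIN_DRUGCARD/ { print; next }
    # For the three wanted fields, read the following line as the value
    # and print field name and value together.
    /^# Drug_Target_[0-9]+_(Name|GenBank_ID_Gene|GenBank_ID_Protein):/ {
        field = $0
        if ((getline value) > 0) print field, value
    }' "$1"
}

extract_targets sample_drugcards.txt
```

On the sample above this prints the card header followed by three lines, one per wanted field, each with its value appended. Using [0-9]+ rather than [1-9] also covers target numbers such as 10 or 20.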

# 3  
Old 05-13-2012
Request to check: how to find the exact entries and put them in the next column

Hello

Thanks for the reply and the help with the script. After running the script mentioned above, I am getting the following result:

awk '/^#BEGIN_/||/^# Drug_Target_[1-9]/' infile

Code:
#BEGIN_DRUGCARD DB00001
# Drug_Target_1_Cellular_Location:
# Drug_Target_1_Chromosome_Location:
# Drug_Target_1_Drug_References:
# Drug_Target_1_Essentiality:
# Drug_Target_1_GenAtlas_ID:
# Drug_Target_1_GenBank_ID_Gene:
# Drug_Target_1_GenBank_ID_Protein:
# Drug_Target_1_GeneCard_ID:
# Drug_Target_1_Gene_Name:
# Drug_Target_1_Gene_Sequence:
# Drug_Target_1_General_Function:
# Drug_Target_1_General_References:
# Drug_Target_1_HGNC_ID:
# Drug_Target_1_HPRD_ID:
# Drug_Target_1_ID:
# Drug_Target_1_Locus:
# Drug_Target_1_Molecular_Weight:
# Drug_Target_1_Name:
# Drug_Target_1_Number_of_Residues:
# Drug_Target_1_PDB_ID:
# Drug_Target_1_Pathway:
# Drug_Target_1_Pfam_Domain_Function:
# Drug_Target_1_Protein_Sequence:
# Drug_Target_1_Reaction:
# Drug_Target_1_Signals:
# Drug_Target_1_Specific_Function:
# Drug_Target_1_SwissProt_ID:
# Drug_Target_1_SwissProt_Name:
# Drug_Target_1_Synonyms:
# Drug_Target_1_Theoretical_pI:
# Drug_Target_1_Transmembrane_Regions:
#BEGIN_DRUGCARD DB00002
# Drug_Target_10_Cellular_Location:
# Drug_Target_10_Chromosome_Location:
# Drug_Target_10_Drug_References:
# Drug_Target_10_Essentiality:
# Drug_Target_10_GenAtlas_ID:
# Drug_Target_10_GenBank_ID_Gene:
# Drug_Target_10_GenBank_ID_Protein:
# Drug_Target_10_GeneCard_ID:
# Drug_Target_10_Gene_Name:
# Drug_Target_10_Gene_Sequence:
# Drug_Target_10_General_Function:
# Drug_Target_10_General_References:
# Drug_Target_10_HGNC_ID:
# Drug_Target_10_HPRD_ID:
# Drug_Target_10_ID:
# Drug_Target_10_Locus:
# Drug_Target_10_Molecular_Weight:
# Drug_Target_10_Name:
# Drug_Target_10_Number_of_Residues:
# Drug_Target_10_PDB_ID:
# Drug_Target_10_Pathway:
# Drug_Target_10_Pfam_Domain_Function:
# Drug_Target_10_Protein_Sequence:
# Drug_Target_10_Reaction:
# Drug_Target_10_Signals:
# Drug_Target_10_Specific_Function:
# Drug_Target_10_SwissProt_ID:
# Drug_Target_10_SwissProt_Name:
# Drug_Target_10_Synonyms:
# Drug_Target_10_Theoretical_pI:
# Drug_Target_10_Transmembrane_Regions:
# Drug_Target_11_Cellular_Location:
# Drug_Target_11_Chromosome_Location:
# Drug_Target_11_Drug_References:
# Drug_Target_11_Essentiality:
# Drug_Target_11_GenAtlas_ID:
# Drug_Target_11_GenBank_ID_Gene:
# Drug_Target_11_GenBank_ID_Protein:
# Drug_Target_11_GeneCard_ID:
# Drug_Target_11_Gene_Name:
# Drug_Target_11_Gene_Sequence:
# Drug_Target_11_General_Function:
# Drug_Target_11_General_References:
# Drug_Target_11_HGNC_ID:
# Drug_Target_11_HPRD_ID:
# Drug_Target_11_ID:
# Drug_Target_11_Locus:


But I want the output to contain the values given after the GenBank gene ID, the GenBank protein ID, and the protein name,

so the output could be


DRUGCARD DB00001 Drug_Target_1_GenBank_ID_Gene:0000 (whatever number)

# Drug_Target_1_GenBank_ID_Protein: (whatever ID)


# Drug_Target_1_Gene_Name: (the name mentioned)

And if I could get these entries in separate columns, it would be very easy to recognise and arrange the whole list of all drug cards.

Please let me know if you have any idea.

Thanks
Mani
# 4  
Old 05-14-2012
How about this then:

Code:
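awk '
# RS="\n#" makes each record one field name plus the value lines after it.
/^#*BEGIN_/ { gsub(/^#*BEGIN_/, "", $0); gsub(/\n/, "", $0); N = $0; A = x }
# Once the card ID has been printed, print the other wanted fields, also for targets 2, 3, 10, ...
A && /^ Drug_Target_[0-9]+_(Gene_Name|GenBank_ID_(Gene|Protein))/ && gsub(/\n/, "", $0) { print "# " $0 }
# The first GenBank_ID field of a card goes on the same line as the card ID.
N && /^ Drug_Target_[0-9]+_GenBank_ID/ { gsub(/\n/, "", $0); print N, $0; N = x; A = 1 }' RS='\n#' infile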
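If separate columns are the goal, one possible approach is to collect the three values per target and print one tab-separated row per target. The following is only a sketch under the assumption that each value is on the line right after its field name; the file name sample_cards.txt, the function name targets_to_tsv, and the sample values are hypothetical.

```shell
# Hypothetical sample in the assumed layout; the values are placeholders.
cat > sample_cards.txt <<'EOF'
#BEGIN_DRUGCARD DB00001

# Drug_Target_1_GenBank_ID_Gene:
GENE0001

# Drug_Target_1_GenBank_ID_Protein:
PROT0001

# Drug_Target_1_Name:
Example target A

# Drug_Target_2_GenBank_ID_Gene:
GENE0002

# Drug_Target_2_GenBank_ID_Protein:
PROT0002

# Drug_Target_2_Name:
Example target B

#END_DRUGCARD DB00001
EOF

targets_to_tsv() {
    awk '
    BEGIN { OFS = "\t" }
    # Print one row per target of the current card, then reset the arrays.
    function emit_card(   i) {
        for (i = 1; i <= max; i++)
            if (i in seen) print drug, i, gene[i], prot[i], name[i]
        split("", seen); split("", gene); split("", prot); split("", name)
        max = 0
    }
    # A new card starts: flush the previous one and remember the new ID.
    /^#BEGIN_DRUGCARD/ { emit_card(); drug = $2; next }
    /^# Drug_Target_[0-9]+_(Name|GenBank_ID_Gene|GenBank_ID_Protein):/ {
        key = $2                              # e.g. Drug_Target_1_Name:
        sub(/^Drug_Target_/, "", key)
        n = key + 0                           # leading digits are the target number
        sub(/^[0-9]+_/, "", key); sub(/:$/, "", key)
        if ((getline value) <= 0) value = ""  # value is assumed to be on the next line
        if (n > max) max = n
        seen[n] = 1
        if (key == "Name") name[n] = value
        else if (key == "GenBank_ID_Gene") gene[n] = value
        else prot[n] = value
        next
    }
    END { emit_card() }' "$1"
}

targets_to_tsv sample_cards.txt
```

For the sample this gives two rows, one for each target, with the columns drug ID, target number, GenBank gene ID, GenBank protein ID, and target name. Storing the values in arrays keyed by target number means the order of the fields inside a card does not matter, and targets come out in numeric order even if the file lists 10 before 2.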
