Separate certain entries from a very big file: Post 302639969 by manigrover on Sunday 13th of May 2012 10:33:03 PM
Request to check: how to find exact entries before that and put in next column

Hello

Thanks for the reply and the help with the scripts. After running the script mentioned above, I am getting the following result:

awk '/^#BEGIN_/||/^# Drug_Target_[1-9]/' infile

Code:
#BEGIN_DRUGCARD DB00001
# Drug_Target_1_Cellular_Location:
# Drug_Target_1_Chromosome_Location:
# Drug_Target_1_Drug_References:
# Drug_Target_1_Essentiality:
# Drug_Target_1_GenAtlas_ID:
# Drug_Target_1_GenBank_ID_Gene:
# Drug_Target_1_GenBank_ID_Protein:
# Drug_Target_1_GeneCard_ID:
# Drug_Target_1_Gene_Name:
# Drug_Target_1_Gene_Sequence:
# Drug_Target_1_General_Function:
# Drug_Target_1_General_References:
# Drug_Target_1_HGNC_ID:
# Drug_Target_1_HPRD_ID:
# Drug_Target_1_ID:
# Drug_Target_1_Locus:
# Drug_Target_1_Molecular_Weight:
# Drug_Target_1_Name:
# Drug_Target_1_Number_of_Residues:
# Drug_Target_1_PDB_ID:
# Drug_Target_1_Pathway:
# Drug_Target_1_Pfam_Domain_Function:
# Drug_Target_1_Protein_Sequence:
# Drug_Target_1_Reaction:
# Drug_Target_1_Signals:
# Drug_Target_1_Specific_Function:
# Drug_Target_1_SwissProt_ID:
# Drug_Target_1_SwissProt_Name:
# Drug_Target_1_Synonyms:
# Drug_Target_1_Theoretical_pI:
# Drug_Target_1_Transmembrane_Regions:
#BEGIN_DRUGCARD DB00002
# Drug_Target_10_Cellular_Location:
# Drug_Target_10_Chromosome_Location:
# Drug_Target_10_Drug_References:
# Drug_Target_10_Essentiality:
# Drug_Target_10_GenAtlas_ID:
# Drug_Target_10_GenBank_ID_Gene:
# Drug_Target_10_GenBank_ID_Protein:
# Drug_Target_10_GeneCard_ID:
# Drug_Target_10_Gene_Name:
# Drug_Target_10_Gene_Sequence:
# Drug_Target_10_General_Function:
# Drug_Target_10_General_References:
# Drug_Target_10_HGNC_ID:
# Drug_Target_10_HPRD_ID:
# Drug_Target_10_ID:
# Drug_Target_10_Locus:
# Drug_Target_10_Molecular_Weight:
# Drug_Target_10_Name:
# Drug_Target_10_Number_of_Residues:
# Drug_Target_10_PDB_ID:
# Drug_Target_10_Pathway:
# Drug_Target_10_Pfam_Domain_Function:
# Drug_Target_10_Protein_Sequence:
# Drug_Target_10_Reaction:
# Drug_Target_10_Signals:
# Drug_Target_10_Specific_Function:
# Drug_Target_10_SwissProt_ID:
# Drug_Target_10_SwissProt_Name:
# Drug_Target_10_Synonyms:
# Drug_Target_10_Theoretical_pI:
# Drug_Target_10_Transmembrane_Regions:
# Drug_Target_11_Cellular_Location:
# Drug_Target_11_Chromosome_Location:
# Drug_Target_11_Drug_References:
# Drug_Target_11_Essentiality:
# Drug_Target_11_GenAtlas_ID:
# Drug_Target_11_GenBank_ID_Gene:
# Drug_Target_11_GenBank_ID_Protein:
# Drug_Target_11_GeneCard_ID:
# Drug_Target_11_Gene_Name:
# Drug_Target_11_Gene_Sequence:
# Drug_Target_11_General_Function:
# Drug_Target_11_General_References:
# Drug_Target_11_HGNC_ID:
# Drug_Target_11_HPRD_ID:
# Drug_Target_11_ID:
# Drug_Target_11_Locus:


But I want the output to contain the entries listed after the GenBank ID, GenBank protein ID, and protein name.

So the output could be:


DRUGCARD DB00001 Drug_Target_1_GenBank_ID_Gene:0000 (whatever number)

# Drug_Target_1_GenBank_ID_Protein: (whatever ID)


# Drug_Target_1_Gene_Name: (the name mentioned)

And if I can get these entries in different columns, then it will be very easy to recognise and arrange the whole list of all drug cards.

Please let me know if you have any idea.

Thanks
Mani
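
For what it is worth, here is a rough sketch of one way the request above could be approached with awk. It rests on an assumption the post does not confirm: that each `# Drug_Target_N_...:` header line is followed by its value on the next non-empty line (the filtered output above shows only the headers, so the values are presumably on separate lines). The function name `extract_targets` is made up for illustration. It prints three tab-separated columns: the drug card ID, the field name, and the value.

```shell
# Sketch only, not a tested solution for the real file.
# ASSUMPTION: the value of a field is the first non-empty line after its
# "# Drug_Target_N_<field>:" header. extract_targets is a hypothetical name.
# Usage (infile as in the command above): extract_targets infile > targets.tsv
extract_targets() {
    awk '
        # Remember which drug card we are in.
        /^#BEGIN_DRUGCARD/ { card = $2; want = ""; next }

        # A wanted header: keep its name (without the colon) and wait for the value.
        /^# Drug_Target_[0-9]+_(GenBank_ID_Gene|GenBank_ID_Protein|Gene_Name):/ {
            want = $2
            sub(/:.*$/, "", want)
            next
        }

        # Any other header cancels a pending field that had no value.
        /^#/ { want = ""; next }

        # First non-empty line after a wanted header is taken as its value.
        want != "" && NF { print card "\t" want "\t" $0; want = "" }
    ' "$1"
}
```

Because the columns are tab-separated, the result can be sorted or loaded into a spreadsheet as is. If the values in the real file sit on the same line as the header, or span several lines, the last rule would need to be adjusted.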
 
