Separate certain entries from a very big file Post: 302639969

Post 302639969 by manigrover on Sunday 13th of May 2012 10:33:03 PM

Request to check: how to find the exact entries and put them in the next column

Hello

Thanks for the reply and the help with the scripts. After running the script mentioned above, I am getting the following result:

awk '/^#BEGIN_/||/^# Drug_Target_[1-9]/' infile

Code:

#BEGIN_DRUGCARD DB00001
# Drug_Target_1_Cellular_Location:
# Drug_Target_1_Chromosome_Location:
# Drug_Target_1_Drug_References:
# Drug_Target_1_Essentiality:
# Drug_Target_1_GenAtlas_ID:
# Drug_Target_1_GenBank_ID_Gene:
# Drug_Target_1_GenBank_ID_Protein:
# Drug_Target_1_GeneCard_ID:
# Drug_Target_1_Gene_Name:
# Drug_Target_1_Gene_Sequence:
# Drug_Target_1_General_Function:
# Drug_Target_1_General_References:
# Drug_Target_1_HGNC_ID:
# Drug_Target_1_HPRD_ID:
# Drug_Target_1_ID:
# Drug_Target_1_Locus:
# Drug_Target_1_Molecular_Weight:
# Drug_Target_1_Name:
# Drug_Target_1_Number_of_Residues:
# Drug_Target_1_PDB_ID:
# Drug_Target_1_Pathway:
# Drug_Target_1_Pfam_Domain_Function:
# Drug_Target_1_Protein_Sequence:
# Drug_Target_1_Reaction:
# Drug_Target_1_Signals:
# Drug_Target_1_Specific_Function:
# Drug_Target_1_SwissProt_ID:
# Drug_Target_1_SwissProt_Name:
# Drug_Target_1_Synonyms:
# Drug_Target_1_Theoretical_pI:
# Drug_Target_1_Transmembrane_Regions:
#BEGIN_DRUGCARD DB00002
# Drug_Target_10_Cellular_Location:
# Drug_Target_10_Chromosome_Location:
# Drug_Target_10_Drug_References:
# Drug_Target_10_Essentiality:
# Drug_Target_10_GenAtlas_ID:
# Drug_Target_10_GenBank_ID_Gene:
# Drug_Target_10_GenBank_ID_Protein:
# Drug_Target_10_GeneCard_ID:
# Drug_Target_10_Gene_Name:
# Drug_Target_10_Gene_Sequence:
# Drug_Target_10_General_Function:
# Drug_Target_10_General_References:
# Drug_Target_10_HGNC_ID:
# Drug_Target_10_HPRD_ID:
# Drug_Target_10_ID:
# Drug_Target_10_Locus:
# Drug_Target_10_Molecular_Weight:
# Drug_Target_10_Name:
# Drug_Target_10_Number_of_Residues:
# Drug_Target_10_PDB_ID:
# Drug_Target_10_Pathway:
# Drug_Target_10_Pfam_Domain_Function:
# Drug_Target_10_Protein_Sequence:
# Drug_Target_10_Reaction:
# Drug_Target_10_Signals:
# Drug_Target_10_Specific_Function:
# Drug_Target_10_SwissProt_ID:
# Drug_Target_10_SwissProt_Name:
# Drug_Target_10_Synonyms:
# Drug_Target_10_Theoretical_pI:
# Drug_Target_10_Transmembrane_Regions:
# Drug_Target_11_Cellular_Location:
# Drug_Target_11_Chromosome_Location:
# Drug_Target_11_Drug_References:
# Drug_Target_11_Essentiality:
# Drug_Target_11_GenAtlas_ID:
# Drug_Target_11_GenBank_ID_Gene:
# Drug_Target_11_GenBank_ID_Protein:
# Drug_Target_11_GeneCard_ID:
# Drug_Target_11_Gene_Name:
# Drug_Target_11_Gene_Sequence:
# Drug_Target_11_General_Function:
# Drug_Target_11_General_References:
# Drug_Target_11_HGNC_ID:
# Drug_Target_11_HPRD_ID:
# Drug_Target_11_ID:
# Drug_Target_11_Locus:

But I want the output to contain the entries given after the GenBank gene ID, the GenBank protein ID, and the protein name,

so the output could be:

DRUGCARD DB00001 Drug_Target_1_GenBank_ID_Gene: 0000 (whatever number)

# Drug_Target_1_GenBank_ID_Protein: (whatever ID)

# Drug_Target_1_Gene_Name: (the name mentioned)

And if I can get these entries in separate columns, then it will be very easy to recognise and arrange the whole list of all drug cards.

Please let me know if you have any idea.

Thanks
Mani
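
One possible way to get the columns described above, as a minimal sketch rather than a tested solution for the real file: it assumes that each "# Drug_Target_N_...:" header is followed by its value on the next non-empty line, and that all headers of one target appear together, as in the output shown. The function name extract_targets and the column order (card ID, target number, GenBank gene ID, GenBank protein ID, gene name) are only illustrative choices.

```shell
# Hypothetical helper: prints one tab-separated row per drug card and target.
# Assumption: the value of a field sits on the first non-empty line after its header.
extract_targets() {
    awk '
    BEGIN { OFS = "\t" }

    # Print the row collected so far (if any) and reset the collected values.
    function flush_row() {
        if (target != "")
            print card, target, gene_id, protein_id, gene_name
        target = gene_id = protein_id = gene_name = ""
    }

    # Start of a new drug card: remember its ID, e.g. DB00001.
    /^#BEGIN_DRUGCARD/ { flush_row(); card = $2; want = ""; next }

    # Any Drug_Target header: split it into target number and field name.
    /^# Drug_Target_[0-9]+_/ {
        want = ""
        name = $2                          # e.g. Drug_Target_1_Gene_Name:
        sub(/^Drug_Target_/, "", name)     # -> 1_Gene_Name:
        n = name
        sub(/_.*/, "", n)                  # -> 1 (target number)
        sub(/^[0-9]+_/, "", name)          # -> Gene_Name:
        sub(/:$/, "", name)                # -> Gene_Name
        if (n != target) { flush_row(); target = n }
        if (name == "GenBank_ID_Gene" || name == "GenBank_ID_Protein" || name == "Gene_Name")
            want = name
        next
    }

    # Every other header line ends the search for a value.
    /^#/ { want = ""; next }

    # First non-empty line after a wanted header is taken as its value.
    want != "" && NF {
        if (want == "GenBank_ID_Gene")         gene_id = $0
        else if (want == "GenBank_ID_Protein") protein_id = $0
        else                                   gene_name = $0
        want = ""
    }

    END { flush_row() }
    ' "$@"
}
```

It could be called as extract_targets infile > targets.tsv, and the resulting file opened in a spreadsheet or sorted with sort. If the values in the real file are laid out differently (for example on the same line as the header), the last rule would need to be adapted.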
