Extract all content that exactly matches a specific word


 
Posted 04-12-2010

Input:
Code:
21      templeta        parent  35718   36554   .       -       .       ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    35718   36554   .       -       .       ID=_52; Parent=parent_cluster_5085.21.11;
21      templeta        location        35840   36073   .       -       .       ID=_5285.location4; Parent=_5285
21      templeta        pattern 35840   36073   .       -       0       ID=_52.cds4; Parent=_5285
21      templeta        location        35718   35778   .       -       .       ID=_5285.location5; Parent=_5285
21      templeta        pattern 35758   35778   .       -       0       ID=_52.cds5; Parent=_5285
21      templeta        length  35718   35757   .       -       .       ID=_52.utr3p1; Parent=_5285

21      templeta        parent  43191   43851   .       +       .       ID=parent_cluster_5086.21.12; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    43191   43851   .       +       .       ID=_5286; Parent=parent_cluster_5086.21.12;
21      templeta        length  43191   43192   .       +       .       ID=_5286.utr5p1; Parent=_5286
21      templeta        location        43191   43851   .       +       .       ID=_5286.location1; Parent=_5286
21      templeta        pattern 43193   43819   .       +       0       ID=_5286.cds1; Parent=_5286; 5_prime_partial=true
21      templeta        length  43820   43851   .       +       .       ID=_5286.utr3p1; Parent=_5286

22      templeta        parent  4204    4962    .       -       .       ID=parent_cluster_5087.22.1; Name=Partial%20parent%20for%20training%20set;
22      templeta        kids    4204    4962    .       -       .       ID=_5287; Parent=parent_cluster_5087.22.1;
22      templeta        length  4876    4962    .       -       .       ID=_5287.utr5p1; Parent=_5287
22      templeta        location        4204    4962    .       -       .       ID=_5287.location1; Parent=_5287
22      templeta        pattern 4204    4875    .       -       0       ID=_5287.cds1; Parent=_5287; 3_prime_partial=true

Desired output:
Code:
21      templeta        parent  35718   36554   .       -       .       ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    35718   36554   .       -       .       ID=_52; Parent=parent_cluster_5085.21.11;
21      templeta        location        35840   36073   .       -       .       ID=_5285.location4; Parent=_5285
21      templeta        pattern 35840   36073   .       -       0       ID=_52.cds4; Parent=_5285
21      templeta        location        35718   35778   .       -       .       ID=_5285.location5; Parent=_5285
21      templeta        pattern 35758   35778   .       -       0       ID=_52.cds5; Parent=_5285
21      templeta        length  35718   35757   .       -       .       ID=_52.utr3p1; Parent=_5285

Awk code that I have tried:
Code:
awk 'BEGIN {RS=""; FS="\n"}  {for (i=1;i<=NF;i++) {if ($i~/ID=_52/) {print $_}}}' input_file

Output I get:
Code:
21      templeta        parent  35718   36554   .       -       .       ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    35718   36554   .       -       .       ID=_52; Parent=parent_cluster_5085.21.11;
21      templeta        location        35840   36073   .       -       .       ID=_5285.location4; Parent=_5285
21      templeta        pattern 35840   36073   .       -       0       ID=_52.cds4; Parent=_5285
21      templeta        location        35718   35778   .       -       .       ID=_5285.location5; Parent=_5285
21      templeta        pattern 35758   35778   .       -       0       ID=_52.cds5; Parent=_5285
21      templeta        length  35718   35757   .       -       .       ID=_52.utr3p1; Parent=_5285

21      templeta        parent  43191   43851   .       +       .       ID=parent_cluster_5086.21.12; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    43191   43851   .       +       .       ID=_5286; Parent=parent_cluster_5086.21.12;
21      templeta        length  43191   43192   .       +       .       ID=_5286.utr5p1; Parent=_5286
21      templeta        location        43191   43851   .       +       .       ID=_5286.location1; Parent=_5286
21      templeta        pattern 43193   43819   .       +       0       ID=_5286.cds1; Parent=_5286; 5_prime_partial=true
21      templeta        length  43820   43851   .       +       .       ID=_5286.utr3p1; Parent=_5286

22      templeta        parent  4204    4962    .       -       .       ID=parent_cluster_5087.22.1; Name=Partial%20parent%20for%20training%20set;
22      templeta        kids    4204    4962    .       -       .       ID=_5287; Parent=parent_cluster_5087.22.1;
22      templeta        length  4876    4962    .       -       .       ID=_5287.utr5p1; Parent=_5287
22      templeta        location        4204    4962    .       -       .       ID=_5287.location1; Parent=_5287
22      templeta        pattern 4204    4875    .       -       0       ID=_5287.cds1; Parent=_5287; 3_prime_partial=true

I want to use awk (or any other programming language) to extract only the block that contains exactly "ID=_52", rather than every block whose ID merely starts with "ID=_52", such as "ID=_5286" or "ID=_5287".
Thanks for any advice.
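
One possible approach, offered as a sketch rather than a definitive answer: the pattern /ID=_52/ is an unanchored substring match, so it also matches "ID=_5286" and "ID=_5287". Comparing whole attributes instead avoids that. The function name extract_block is hypothetical, and the code assumes that blocks are separated by blank lines and that attributes are separated by semicolons or whitespace, as in the sample input.

```shell
# Sketch (hypothetical helper): print every blank-line-separated block
# that contains a line whose ID attribute is exactly "ID=<id>".
extract_block() {
    # $1 = exact ID to look for (e.g. _52), $2 = input file
    awk -v id="$1" '
        BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one block per record, one line per field
        {
            for (i = 1; i <= NF; i++) {
                # Split the line into tokens on whitespace and semicolons
                n = split($i, attr, /[ \t;]+/)
                for (j = 1; j <= n; j++)
                    if (attr[j] == "ID=" id) {
                        print          # print the whole block once
                        next           # then move on to the next block
                    }
            }
        }
    ' "$2"
}
```

Called as `extract_block _52 input_file`, this should print only the first block of the sample input, because "ID=_5286" and "ID=_5287" are different tokens from "ID=_52". If several blocks can match, setting ORS="\n\n" in the BEGIN section would keep a blank line between them.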
 