Post 302412242 by patrick87 on Monday 12th of April 2010 05:49:07 AM
Extract all content that match exactly only specific word

Input:
Code:
21      templeta        parent  35718   36554   .       -       .       ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    35718   36554   .       -       .       ID=_52; Parent=parent_cluster_5085.21.11;
21      templeta        location        35840   36073   .       -       .       ID=_5285.location4; Parent=_5285
21      templeta        pattern 35840   36073   .       -       0       ID=_52.cds4; Parent=_5285
21      templeta        location        35718   35778   .       -       .       ID=_5285.location5; Parent=_5285
21      templeta        pattern 35758   35778   .       -       0       ID=_52.cds5; Parent=_5285
21      templeta        length  35718   35757   .       -       .       ID=_52.utr3p1; Parent=_5285

21      templeta        parent  43191   43851   .       +       .       ID=parent_cluster_5086.21.12; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    43191   43851   .       +       .       ID=_5286; Parent=parent_cluster_5086.21.12;
21      templeta        length  43191   43192   .       +       .       ID=_5286.utr5p1; Parent=_5286
21      templeta        location        43191   43851   .       +       .       ID=_5286.location1; Parent=_5286
21      templeta        pattern 43193   43819   .       +       0       ID=_5286.cds1; Parent=_5286; 5_prime_partial=true
21      templeta        length  43820   43851   .       +       .       ID=_5286.utr3p1; Parent=_5286

22      templeta        parent  4204    4962    .       -       .       ID=parent_cluster_5087.22.1; Name=Partial%20parent%20for%20training%20set;
22      templeta        kids    4204    4962    .       -       .       ID=_5287; Parent=parent_cluster_5087.22.1;
22      templeta        length  4876    4962    .       -       .       ID=_5287.utr5p1; Parent=_5287
22      templeta        location        4204    4962    .       -       .       ID=_5287.location1; Parent=_5287
22      templeta        pattern 4204    4875    .       -       0       ID=_5287.cds1; Parent=_5287; 3_prime_partial=true

Desired output:
Code:
21      templeta        parent  35718   36554   .       -       .       ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    35718   36554   .       -       .       ID=_52; Parent=parent_cluster_5085.21.11;
21      templeta        location        35840   36073   .       -       .       ID=_5285.location4; Parent=_5285
21      templeta        pattern 35840   36073   .       -       0       ID=_52.cds4; Parent=_5285
21      templeta        location        35718   35778   .       -       .       ID=_5285.location5; Parent=_5285
21      templeta        pattern 35758   35778   .       -       0       ID=_52.cds5; Parent=_5285
21      templeta        length  35718   35757   .       -       .       ID=_52.utr3p1; Parent=_5285

Awk code that I have tried:
Code:
awk 'BEGIN {RS=""; FS="\n"}  {for (i=1;i<=NF;i++) {if ($i~/ID=_52/) {print $_}}}' input_file

Output I get:
Code:
21      templeta        parent  35718   36554   .       -       .       ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    35718   36554   .       -       .       ID=_52; Parent=parent_cluster_5085.21.11;
21      templeta        location        35840   36073   .       -       .       ID=_5285.location4; Parent=_5285
21      templeta        pattern 35840   36073   .       -       0       ID=_52.cds4; Parent=_5285
21      templeta        location        35718   35778   .       -       .       ID=_5285.location5; Parent=_5285
21      templeta        pattern 35758   35778   .       -       0       ID=_52.cds5; Parent=_5285
21      templeta        length  35718   35757   .       -       .       ID=_52.utr3p1; Parent=_5285

21      templeta        parent  43191   43851   .       +       .       ID=parent_cluster_5086.21.12; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    43191   43851   .       +       .       ID=_5286; Parent=parent_cluster_5086.21.12;
21      templeta        length  43191   43192   .       +       .       ID=_5286.utr5p1; Parent=_5286
21      templeta        location        43191   43851   .       +       .       ID=_5286.location1; Parent=_5286
21      templeta        pattern 43193   43819   .       +       0       ID=_5286.cds1; Parent=_5286; 5_prime_partial=true
21      templeta        length  43820   43851   .       +       .       ID=_5286.utr3p1; Parent=_5286

22      templeta        parent  4204    4962    .       -       .       ID=parent_cluster_5087.22.1; Name=Partial%20parent%20for%20training%20set;
22      templeta        kids    4204    4962    .       -       .       ID=_5287; Parent=parent_cluster_5087.22.1;
22      templeta        length  4876    4962    .       -       .       ID=_5287.utr5p1; Parent=_5287
22      templeta        location        4204    4962    .       -       .       ID=_5287.location1; Parent=_5287
22      templeta        pattern 4204    4875    .       -       0       ID=_5287.cds1; Parent=_5287; 3_prime_partial=true

I would like to use awk, or any other programming language, to extract only the content that matches the word "ID=_52" exactly, instead of everything that merely starts with "ID=_52", such as "ID=_5286" and "ID=_5287".
Thanks for any advice.
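One possible approach, offered as a sketch rather than a definitive answer: keep awk's paragraph mode (RS="") and test each whole block against a pattern that requires a non-word character, or the end of the block, right after the ID. The function name extract_block and its two arguments are made up for this illustration. It assumes the ID contains no regex metacharacters, and that a block should be kept when any of its lines carries "ID=_52" followed by something like ";" or ".".

```shell
# Hypothetical helper (name and arguments are assumptions for illustration).
# Prints every blank-line-separated block that contains ID=<id> as a whole
# word, i.e. followed by a non-word character or by the end of the block.
extract_block() {
    # $1 = ID to look for (e.g. _52), $2 = input file
    awk -v id="$1" '
        BEGIN {
            RS  = ""        # paragraph mode: one record per block
            ORS = "\n\n"    # keep a blank line after each printed block
            # "ID=_52" must not be followed by a letter, digit or underscore
            pat = "ID=" id "([^0-9A-Za-z_]|$)"
        }
        $0 ~ pat            # default action: print the whole block
    ' "$2"
}
```

Called as extract_block _52 input_file, this matches "ID=_52;" and "ID=_52.cds4" but not "ID=_5286" or "ID=_5287", so only the first block of the sample input should be printed. The original attempt matched all three blocks because /ID=_52/ is an unanchored pattern and "ID=_5286" contains it as a prefix.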
 
