Extract all content that exactly matches a specific word


 
# 1  
04-12-2010

Input:
Code:
21      templeta        parent  35718   36554   .       -       .       ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    35718   36554   .       -       .       ID=_52; Parent=parent_cluster_5085.21.11;
21      templeta        location        35840   36073   .       -       .       ID=_5285.location4; Parent=_5285
21      templeta        pattern 35840   36073   .       -       0       ID=_52.cds4; Parent=_5285
21      templeta        location        35718   35778   .       -       .       ID=_5285.location5; Parent=_5285
21      templeta        pattern 35758   35778   .       -       0       ID=_52.cds5; Parent=_5285
21      templeta        length  35718   35757   .       -       .       ID=_52.utr3p1; Parent=_5285

21      templeta        parent  43191   43851   .       +       .       ID=parent_cluster_5086.21.12; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    43191   43851   .       +       .       ID=_5286; Parent=parent_cluster_5086.21.12;
21      templeta        length  43191   43192   .       +       .       ID=_5286.utr5p1; Parent=_5286
21      templeta        location        43191   43851   .       +       .       ID=_5286.location1; Parent=_5286
21      templeta        pattern 43193   43819   .       +       0       ID=_5286.cds1; Parent=_5286; 5_prime_partial=true
21      templeta        length  43820   43851   .       +       .       ID=_5286.utr3p1; Parent=_5286

22      templeta        parent  4204    4962    .       -       .       ID=parent_cluster_5087.22.1; Name=Partial%20parent%20for%20training%20set;
22      templeta        kids    4204    4962    .       -       .       ID=_5287; Parent=parent_cluster_5087.22.1;
22      templeta        length  4876    4962    .       -       .       ID=_5287.utr5p1; Parent=_5287
22      templeta        location        4204    4962    .       -       .       ID=_5287.location1; Parent=_5287
22      templeta        pattern 4204    4875    .       -       0       ID=_5287.cds1; Parent=_5287; 3_prime_partial=true

Desired output:
Code:
21      templeta        parent  35718   36554   .       -       .       ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    35718   36554   .       -       .       ID=_52; Parent=parent_cluster_5085.21.11;
21      templeta        location        35840   36073   .       -       .       ID=_5285.location4; Parent=_5285
21      templeta        pattern 35840   36073   .       -       0       ID=_52.cds4; Parent=_5285
21      templeta        location        35718   35778   .       -       .       ID=_5285.location5; Parent=_5285
21      templeta        pattern 35758   35778   .       -       0       ID=_52.cds5; Parent=_5285
21      templeta        length  35718   35757   .       -       .       ID=_52.utr3p1; Parent=_5285

Awk code that I have tried:
Code:
awk 'BEGIN {RS=""; FS="\n"}  {for (i=1;i<=NF;i++) {if ($i~/ID=_52/) {print $0}}}' input_file

Output I get:
Code:
21      templeta        parent  35718   36554   .       -       .       ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    35718   36554   .       -       .       ID=_52; Parent=parent_cluster_5085.21.11;
21      templeta        location        35840   36073   .       -       .       ID=_5285.location4; Parent=_5285
21      templeta        pattern 35840   36073   .       -       0       ID=_52.cds4; Parent=_5285
21      templeta        location        35718   35778   .       -       .       ID=_5285.location5; Parent=_5285
21      templeta        pattern 35758   35778   .       -       0       ID=_52.cds5; Parent=_5285
21      templeta        length  35718   35757   .       -       .       ID=_52.utr3p1; Parent=_5285

21      templeta        parent  43191   43851   .       +       .       ID=parent_cluster_5086.21.12; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    43191   43851   .       +       .       ID=_5286; Parent=parent_cluster_5086.21.12;
21      templeta        length  43191   43192   .       +       .       ID=_5286.utr5p1; Parent=_5286
21      templeta        location        43191   43851   .       +       .       ID=_5286.location1; Parent=_5286
21      templeta        pattern 43193   43819   .       +       0       ID=_5286.cds1; Parent=_5286; 5_prime_partial=true
21      templeta        length  43820   43851   .       +       .       ID=_5286.utr3p1; Parent=_5286

22      templeta        parent  4204    4962    .       -       .       ID=parent_cluster_5087.22.1; Name=Partial%20parent%20for%20training%20set;
22      templeta        kids    4204    4962    .       -       .       ID=_5287; Parent=parent_cluster_5087.22.1;
22      templeta        length  4876    4962    .       -       .       ID=_5287.utr5p1; Parent=_5287
22      templeta        location        4204    4962    .       -       .       ID=_5287.location1; Parent=_5287
22      templeta        pattern 4204    4875    .       -       0       ID=_5287.cds1; Parent=_5287; 3_prime_partial=true

My goal is to use awk (or any other programming language) to extract only the content that matches the exact word "ID=_52", instead of all the content that partially matches "ID=_52", such as "ID=_5286" and "ID=_5287".
Thanks for any advice.
# 2  
04-12-2010
Did you try using ^ and $?

Code:
/^ID=_52$/
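In awk, `^` and `$` anchor to the start and end of whatever string the regex is tested against. Tested against the whole line, as `/^ID=_52$/` is, they can never match this data, because every line starts with the sequence name (`21`, `22`). A minimal sketch, assuming the ID is always the 9th whitespace-separated column and is written as `ID=_52;` (with the trailing semicolon), applies the anchors to that field instead. The function name `match_id_field` is hypothetical. Note that this only selects the single `kids` line; exact matching alone does not bring back the rest of the block.

```shell
#!/bin/sh
# Hypothetical helper: print lines whose 9th column is exactly "ID=_52;".
# Assumes whitespace-separated columns, as in the sample data.
match_id_field() {
    # ^ and $ are applied to the 9th field ($9), not to the whole line,
    # so "ID=_52.cds4;" and "ID=_5285.location4;" are rejected.
    awk '$9 ~ /^ID=_52;$/' "$@"
}

# Usage: match_id_field input_file
```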

# 3  
04-12-2010
Quote:
Originally Posted by patrick87
My goal is to use awk (or any other programming language) to extract only the content that matches the exact word "ID=_52", instead of all the content that partially matches "ID=_52", such as "ID=_5286" and "ID=_5287".
Code:
egrep -w ID=_52 file
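`-w` (supported by GNU and BSD grep, but not required by POSIX) only accepts a match when the matched text is neither preceded nor followed by a word-constituent character (letter, digit or underscore). That is why `ID=_52;` and `ID=_52.cds4;` match while `ID=_5285.location4;` is skipped: the `8` after `_52` is a word character. Plain `grep` is enough here, since the pattern uses no extended-regex syntax and recent GNU grep releases warn that `egrep` is obsolescent. A small sketch, with the hypothetical function name `word_match`:

```shell
#!/bin/sh
# Hypothetical helper: keep only lines where "ID=_52" appears as a whole word.
# grep -w is a GNU/BSD extension; it is not part of POSIX grep.
word_match() {
    grep -w 'ID=_52' "$@"
}

# Usage: word_match input_file
# printf '%s\n' 'ID=_52; Parent=a' 'ID=_52.cds4; Parent=_5285' 'ID=_5285.location4; Parent=_5285' | word_match
# prints only the first two lines
```

This filters individual lines, so the block's first (`parent`) line, which does not contain `ID=_52`, is never printed.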

# 4  
04-12-2010
Hi,
Thanks for your reply.
It doesn't seem to work in my case.

---------- Post updated at 05:05 AM ---------- Previous update was at 05:01 AM ----------

Hi,
Thanks for your suggestion.
But the grep command doesn't print the first line of my desired output:
Code:
egrep -w ID=_52 file
21      templeta        kids    35718   36554   .       -       .       ID=_52; Parent=parent_cluster_5085.21.11;
21      templeta        location        35840   36073   .       -       .       ID=_5285.location4; Parent=_5285
21      templeta        pattern 35840   36073   .       -       0       ID=_52.cds4; Parent=_5285
21      templeta        location        35718   35778   .       -       .       ID=_5285.location5; Parent=_5285
21      templeta        pattern 35758   35778   .       -       0       ID=_52.cds5; Parent=_5285
21      templeta        length  35718   35757   .       -       .       ID=_52.utr3p1; Parent=_5285

Desired output:
Code:
21      templeta        parent  35718   36554   .       -       .       ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set;
21      templeta        kids    35718   36554   .       -       .       ID=_52; Parent=parent_cluster_5085.21.11;
21      templeta        location        35840   36073   .       -       .       ID=_5285.location4; Parent=_5285
21      templeta        pattern 35840   36073   .       -       0       ID=_52.cds4; Parent=_5285
21      templeta        location        35718   35778   .       -       .       ID=_5285.location5; Parent=_5285
21      templeta        pattern 35758   35778   .       -       0       ID=_52.cds5; Parent=_5285
21      templeta        length  35718   35757   .       -       .       ID=_52.utr3p1; Parent=_5285

Do you have any other suggestions?
Thanks.
# 5  
04-12-2010
Something like this?
Code:
awk '$9~"^ID=_52[^0-9]"' infile
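The bracket expression `[^0-9]` requires one more character after `_52`, so a 9th column that is exactly `ID=_52` (with no `;` or `.` suffix) would be missed, while `ID=_52a` would still be accepted. A minimal variant, assuming the same column layout, also accepts the end of the field and treats letters and underscores as part of the ID, in the same spirit as `grep -w`. The function `filter_id` and its `id` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print lines whose 9th column starts with "ID=<id>"
# followed by a non-word character or the end of the field.
# Usage: filter_id ID file...
filter_id() {
    id=$1
    shift
    # The dynamic regex is built from the id; this assumes the id contains
    # no regex metacharacters (true for values like "_52").
    awk -v id="$id" '$9 ~ ("^ID=" id "([^A-Za-z0-9_]|$)")' "$@"
}

# Usage: filter_id _52 infile
```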



---------- Post updated at 12:22 ---------- Previous update was at 12:20 ----------

Your first line does not contain ID=_52. Also, ID=_5285.location4 and ID=_5285.location5 do not match this pattern. So it is not clear what you are trying to achieve.
# 6  
04-12-2010
Yup. My first line doesn't contain "ID=_52".
That's why I planned to extract the content based on two conditions:
1. Use the newline as the field separator.
2. Once a line matches the word "ID=_52", extract the whole block, including its first line.
Sorry if I misunderstood you.
Thanks for any advice on improving my awk code to achieve my goal.
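The two conditions above map onto awk's paragraph mode: `RS=""` makes each blank-line-separated block one record, and `FS="\n"` makes each line of the block one field. The original attempt already did this; what it lacked was a whole-word test (`/ID=_52/` also matches inside `ID=_5286`) and a `break`, so that a block is printed only once even when several of its lines match. A minimal sketch, assuming blocks are separated by empty lines and the ID sits in the 9th whitespace-separated column; `extract_block` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print every blank-line-separated block that contains
# a line whose 9th column starts with "ID=<id>" as a whole word.
# Assumes the id contains no regex metacharacters.
# Usage: extract_block ID file...
extract_block() {
    id=$1
    shift
    awk -v id="$id" '
        BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one block per record, one line per field
        {
            for (i = 1; i <= NF; i++) {
                split($i, col, " ")    # " " splits on runs of blanks and tabs
                if (col[9] ~ ("^ID=" id "([^A-Za-z0-9_]|$)")) {
                    if (printed++) print ""   # blank line between matching blocks
                    print                      # the whole block, first line included
                    break                      # print each block at most once
                }
            }
        }' "$@"
}

# Usage: extract_block _52 input_file
```

On the sample input, only the first block has a line whose 9th column is `ID=_52` followed by a non-word character, so only that block (including its `parent` line) is printed.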
# 7  
04-12-2010
Do you mean something like this?
Code:
awk '$9~"ID=parent_cluster"{h=$0;p=0} $9~"ID=_52;"{print h;p=1} p&&NF' infile
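The one-liner above remembers the most recent `ID=parent_cluster` line in `h`, prints it when the block's `ID=_52;` line appears, and then keeps printing non-empty lines (`p&&NF`) until the next parent line resets the flag. Below is the same logic spread out with comments, with the ID passed in through `-v` instead of being hard-coded. The function name `extract_by_kid` is hypothetical. Unlike the original, it anchors the parent test with `^` and compares the ID column with `==` rather than a substring regex. Like the original, it assumes each block starts with a `parent` line and that the matching column is written as `ID=<id>;`.

```shell
#!/bin/sh
# Hypothetical helper: same logic as the one-liner, with the ID as a parameter.
# Usage: extract_by_kid ID file...
extract_by_kid() {
    id=$1
    shift
    awk -v id="$id" '
        # A parent line starts a new block: remember it and stop printing.
        $9 ~ /^ID=parent_cluster/ { h = $0; p = 0 }
        # The exact "ID=<id>;" column: print the saved parent line, start printing.
        $9 == ("ID=" id ";") { print h; p = 1 }
        # While printing, output every non-empty line (NF is 0 on blank lines).
        p && NF' "$@"
}

# Usage: extract_by_kid _52 infile
```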
