Match first pattern first then extract second pattern match


 
# 1  
Old 12-04-2009

My input file:
Code:
<accession>Q91G55</accession>
<name>043L_IIV6</name>
<protein>
<recommendedName>
<location>
<position position="294"/>
</location>
<fullName>Uncharacterized protein 043L</fullName>

<accession>P18556</accession>
<name>1106L_ASFB7</name>
<protein>
<recommendedName>
<fullName>Protein MGF 110-6L</fullName>

<accession>O55734</accession>
<name>120L_IIV6</name>
<fullName>Uncharacterized protein 120L</fullName>
.
.

My desired output file (find the accession first, then extract the fullName that belongs to it):
Code:
<fullName>Uncharacterized protein 043L</fullName>
<fullName>Protein MGF 110-6L</fullName>
<fullName>Uncharacterized protein 120L</fullName>

This is the code I tried, but it is not good, because it also extracts <fullName> lines that belong to other <accession> entries:

Code:
grep -A8 '<accession>' file | grep '<fullName>'

In the original file, each group starts with <accession> and ends with <fullName>, but the lines in between differ from group to group.
Actually, I don't want to extract every <accession> from the long list. I only want specific <accession> entries, and for each selected entry I want to extract all of its <fullName> lines.
Thanks a lot for any suggestions and advice.
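
To make the problem concrete, here is a small sketch of how the fixed-size context window leaks into the next group. It assumes a grep that supports the -A option (GNU and BSD grep do), uses a temporary copy of the sample input, and picks P18556 as an arbitrary example of a selected accession.

```shell
# Recreate the sample input from this post in a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/input.txt" <<'EOF'
<accession>Q91G55</accession>
<name>043L_IIV6</name>
<protein>
<recommendedName>
<location>
<position position="294"/>
</location>
<fullName>Uncharacterized protein 043L</fullName>

<accession>P18556</accession>
<name>1106L_ASFB7</name>
<protein>
<recommendedName>
<fullName>Protein MGF 110-6L</fullName>

<accession>O55734</accession>
<name>120L_IIV6</name>
<fullName>Uncharacterized protein 120L</fullName>
EOF

# -A8 always takes 8 lines after the match, regardless of where the group ends.
# The P18556 group is only 5 lines long, so the window runs into the O55734 group.
leaked=$(grep -A8 '<accession>P18556' "$tmp/input.txt" | grep '<fullName>')
printf '%s\n' "$leaked"

rm -rf "$tmp"
```

With this sample, the result contains the fullName of P18556 and also the fullName of the following O55734 group, which is the unwanted extra match.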

Last edited by patrick87; 12-05-2009 at 08:43 PM.. Reason: more code tags
# 2  
Old 12-04-2009
I don't understand what you need. From the description I'd say that your expected output is not correct.

Maybe you are looking for something like this:
Code:
awk '/^<accession/ {print} /^<fullName/ {print}' infile
<accession>Q91G55</accession>
<fullName>Uncharacterized protein 043L</fullName>
<accession>P18556</accession>
<fullName>Protein MGF 110-6L</fullName>
<accession>O55734</accession>
<fullName>Uncharacterized protein 120L</fullName>

If you only want the output you have shown in your example, use a simple grep to get the lines that contain "fullName".

An exact example of the output you need would help. The description is confusing, sorry.
# 3  
Old 12-04-2009
Why don't you grep only for <fullName>:
Code:
grep '<fullName>' file

# 4  
Old 12-05-2009
Hi zaxxon,
sorry that my question confused you all.
Actually, I don't want to select every <accession> from the long list. I only want specific <accession> entries, and from each selected entry I want to extract all of its <fullName> lines. I hope my problem is clearer now.

---------- Post updated at 07:42 PM ---------- Previous update was at 07:41 PM ----------

Hi Franklin52,
sorry that my question was confusing.
The same applies to your suggestion: I don't want the <fullName> of every <accession> in the list. I only want specific <accession> entries, and from each selected entry I want to extract all of its <fullName> lines. I hope my problem is clearer now.
# 5  
Old 12-06-2009
patrick87, your sample input file confused us, so I have to guess.

I changed your input file: in the second group there is no accession line. Please confirm whether this is your desired output.

Code:
$ cat input.txt
<accession>Q91G55</accession>
<name>043L_IIV6</name>
<protein>
<recommendedName>
<location>
<position position="294"/>
</location>
<fullName>Uncharacterized protein 043L</fullName>

<name>1106L_ASFB7</name>
<protein>
<recommendedName>
<fullName>Protein MGF 110-6L</fullName>

<accession>O55734</accession>
<name>120L_IIV6</name>
<fullName>Uncharacterized protein 120L</fullName>

$  awk 'BEGIN {RS=""; FS="\n"} /<accession>/ {for (i=1;i<=NF;i++) {if ($i~/<fullName>/) {print $i}}}' input.txt
<fullName>Uncharacterized protein 043L</fullName>
<fullName>Uncharacterized protein 120L</fullName>
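
The one-liner above relies on awk's paragraph mode: an empty RS makes every blank-line-separated group one record, and FS="\n" makes every line of that group one field. The question also asked for only specific accessions, so here is a possible extension as a rough sketch. The file name wanted.txt (one accession ID per line) and the variable names are assumptions for illustration, not something from the original post.

```shell
# Sample data modeled on the original input (blank line between groups).
tmp=$(mktemp -d)
cat > "$tmp/input.txt" <<'EOF'
<accession>Q91G55</accession>
<name>043L_IIV6</name>
<protein>
<recommendedName>
<location>
<position position="294"/>
</location>
<fullName>Uncharacterized protein 043L</fullName>

<accession>P18556</accession>
<name>1106L_ASFB7</name>
<protein>
<recommendedName>
<fullName>Protein MGF 110-6L</fullName>

<accession>O55734</accession>
<name>120L_IIV6</name>
<fullName>Uncharacterized protein 120L</fullName>
EOF

# Hypothetical list of wanted accession IDs, one per line.
printf '%s\n' Q91G55 O55734 > "$tmp/wanted.txt"

selected=$(awk -v wanted="$tmp/wanted.txt" '
    BEGIN {
        # Load the wanted IDs first, while RS still has its default value.
        while ((getline line < wanted) > 0) want[line]
        close(wanted)
        # Paragraph mode: one record per group, one field per line.
        RS = ""; FS = "\n"
    }
    {
        keep = 0
        for (i = 1; i <= NF; i++) {
            id = $i
            if (id ~ /^<accession>/) {
                gsub(/<\/?accession>/, "", id)   # strip the tags, keep the ID
                if (id in want) keep = 1
            }
        }
        if (keep)
            for (i = 1; i <= NF; i++)
                if ($i ~ /<fullName>/) print $i
    }
' "$tmp/input.txt")
printf '%s\n' "$selected"

rm -rf "$tmp"
```

With the two IDs listed in wanted.txt, only the fullName lines of the Q91G55 and O55734 groups are printed; the P18556 group is skipped.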

# 6  
Old 12-06-2009
Thanks a lot, rdcwayx.
You are right!
Although the code takes some time on huge data, it works perfectly.
Thanks a lot for sharing.
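
On the speed remark: a line-by-line variant with a flag avoids splitting every group into fields. This is only a sketch, assuming groups are separated by blank lines as in the sample input; it has not been timed here, so whether it is actually faster depends on the awk implementation and the data. The variable name in_group is made up for illustration.

```shell
# Sample data: the second group has no <accession> line.
tmp=$(mktemp -d)
cat > "$tmp/input.txt" <<'EOF'
<accession>Q91G55</accession>
<name>043L_IIV6</name>
<protein>
<recommendedName>
<location>
<position position="294"/>
</location>
<fullName>Uncharacterized protein 043L</fullName>

<name>1106L_ASFB7</name>
<protein>
<recommendedName>
<fullName>Protein MGF 110-6L</fullName>

<accession>O55734</accession>
<name>120L_IIV6</name>
<fullName>Uncharacterized protein 120L</fullName>
EOF

streamed=$(awk '
    /^<accession>/         { in_group = 1 }   # a group with an accession starts
    /^[ \t]*$/             { in_group = 0 }   # a blank line ends the current group
    in_group && /<fullName>/ { print }        # print only inside such a group
' "$tmp/input.txt")
printf '%s\n' "$streamed"

rm -rf "$tmp"
```

On this sample it prints the fullName lines of the first and third groups only, matching the output of the paragraph-mode command.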