05-20-2014
Quote:
Originally Posted by Corona688
Quick and simple way:
Code :
awk '$1 == "Iteration_query-def" { print $2 }' RS="<" FS=">" iteration.xml
Sorry, I don't think I explained it clearly.
I have an XML file A like this:
Code :
<Iteration>
<Iteration_iter-num>3</Iteration_iter-num>
<Iteration_query-ID>lcl|3_0</Iteration_query-ID>
<Iteration_query-def>G383C4U01EQA0A length=197</Iteration_query-def>
<Iteration_query-len>197</Iteration_query-len>
<Iteration_stat>
<Statistics>
<Statistics_db-num>31601460</Statistics_db-num>
<Statistics_db-len>10937649309</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
<Statistics_kappa>0.041</Statistics_kappa>
<Statistics_lambda>0.267</Statistics_lambda>
<Statistics_entropy>0.14</Statistics_entropy>
</Statistics>
</Iteration_stat>
<Iteration_message>No hits found</Iteration_message>
</Iteration>
<Iteration>
<Iteration_iter-num>4</Iteration_iter-num>
<Iteration_query-ID>lcl|4_0</Iteration_query-ID>
<Iteration_query-def>G383C4U01AUSDH length=64</Iteration_query-def>
<Iteration_query-len>64</Iteration_query-len>
<Iteration_stat>
<Statistics>
<Statistics_db-num>31601460</Statistics_db-num>
<Statistics_db-len>10937649309</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
<Statistics_kappa>0.041</Statistics_kappa>
<Statistics_lambda>0.267</Statistics_lambda>
<Statistics_entropy>0.14</Statistics_entropy>
</Statistics>
</Iteration_stat>
<Iteration_message>No hits found</Iteration_message>
</Iteration>
<Iteration>
<Iteration_iter-num>5</Iteration_iter-num>
<Iteration_query-ID>lcl|5_0</Iteration_query-ID>
<Iteration_query-def>G383C4U01DPLAS length=224</Iteration_query-def>
<Iteration_query-len>224</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|460414860|ref|XP_004252780.1|</Hit_id>
<Hit_def>PREDICTED: exocyst complex component SEC3A-like [Solanum lycopersicum]</Hit_def>
<Hit_accession>XP_004252780</Hit_accession>
<Hit_len>888</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>60.077</Hsp_bit-score>
<Hsp_score>144</Hsp_score>
<Hsp_evalue>1.95683e-09</Hsp_evalue>
<Hsp_query-from>61</Hsp_query-from>
<Hsp_query-to>222</Hsp_query-to>
<Hsp_hit-from>30</Hsp_hit-from>
<Hsp_hit-to>84</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_identity>36</Hsp_identity>
<Hsp_positive>37</Hsp_positive>
<Hsp_gaps>2</Hsp_gaps>
<Hsp_align-len>56</Hsp_align-len>
<Hsp_qseq>IRVAKSRGIWESTAN--RSPNAKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
<Hsp_hseq>IRVAKSRGIWAKTGKLGRSHTAKPRVIAISTKAKGQRT-KAFLHVLKYSTGGVLEP</Hsp_hseq>
<Hsp_midline>IRVAKSRGIW T RS AKPR +AISTKAK K F KYSTGGVLEP</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>2</Hit_num>
<Hit_id>gi|225458426|ref|XP_002283704.1|</Hit_id>
<Hit_def>PREDICTED: exocyst complex component SEC3A isoform 1 [Vitis vinifera] >gi|302142418|emb|CBI19621.3| unnamed protein product [Vitis vinifera]</Hit_def>
<Hit_accession>XP_002283704</Hit_accession>
<Hit_len>886</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>56.6102</Hsp_bit-score>
<Hsp_score>135</Hsp_score>
<Hsp_evalue>3.26752e-08</Hsp_evalue>
<Hsp_query-from>61</Hsp_query-from>
<Hsp_query-to>222</Hsp_query-to>
<Hsp_hit-from>30</Hsp_hit-from>
<Hsp_hit-to>83</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_identity>34</Hsp_identity>
<Hsp_positive>37</Hsp_positive>
<Hsp_gaps>1</Hsp_gaps>
<Hsp_align-len>55</Hsp_align-len>
<Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
<Hsp_hseq>IRVAKSRGIWGKSGKLGRNMAKPRVLALSTKAKAQRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
<Hsp_midline>IRVAKSRGIW + N AKPR +A+STKAKA K F KYSTGGVLEP</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>3</Hit_num>
<Hit_id>gi|359492097|ref|XP_003634363.1|</Hit_id>
<Hit_def>PREDICTED: exocyst complex component SEC3A isoform 2 [Vitis vinifera]</Hit_def>
<Hit_accession>XP_003634363</Hit_accession>
<Hit_len>887</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>56.6102</Hsp_bit-score>
<Hsp_score>135</Hsp_score>
<Hsp_evalue>3.26763e-08</Hsp_evalue>
<Hsp_query-from>61</Hsp_query-from>
<Hsp_query-to>222</Hsp_query-to>
<Hsp_hit-from>30</Hsp_hit-from>
<Hsp_hit-to>83</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_identity>34</Hsp_identity>
<Hsp_positive>37</Hsp_positive>
<Hsp_gaps>1</Hsp_gaps>
<Hsp_align-len>55</Hsp_align-len>
<Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
<Hsp_hseq>IRVAKSRGIWGKSGKLGRNMAKPRVLALSTKAKAQRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
<Hsp_midline>IRVAKSRGIW + N AKPR +A+STKAKA K F KYSTGGVLEP</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>4</Hit_num>
<Hit_id>gi|255538520|ref|XP_002510325.1|</Hit_id>
<Hit_def>exocyst complex component sec3, putative [Ricinus communis] >gi|223551026|gb|EEF52512.1| exocyst complex component sec3, putative [Ricinus communis]</Hit_def>
<Hit_accession>XP_002510325</Hit_accession>
<Hit_len>889</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>53.9138</Hsp_bit-score>
<Hsp_score>128</Hsp_score>
<Hsp_evalue>2.91784e-07</Hsp_evalue>
<Hsp_query-from>61</Hsp_query-from>
<Hsp_query-to>222</Hsp_query-to>
<Hsp_hit-from>30</Hsp_hit-from>
<Hsp_hit-to>83</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_identity>32</Hsp_identity>
<Hsp_positive>36</Hsp_positive>
<Hsp_gaps>1</Hsp_gaps>
<Hsp_align-len>55</Hsp_align-len>
<Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
<Hsp_hseq>IRVAKSRGIWGKSGKLGRQMAKPRVLALSTKSKGTRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
<Hsp_midline>IRVAKSRGIW + AKPR +A+STK+K T K F KYSTGGVLEP</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>5</Hit_num>
<Hit_id>gi|449460129|ref|XP_004147798.1|</Hit_id>
<Hit_def>PREDICTED: exocyst complex component SEC3A-like [Cucumis sativus]</Hit_def>
<Hit_accession>XP_004147798</Hit_accession>
<Hit_len>883</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>52.7582</Hsp_bit-score>
<Hsp_score>125</Hsp_score>
<Hsp_evalue>7.46528e-07</Hsp_evalue>
<Hsp_query-from>61</Hsp_query-from>
<Hsp_query-to>222</Hsp_query-to>
<Hsp_hit-from>30</Hsp_hit-from>
<Hsp_hit-to>84</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_identity>32</Hsp_identity>
<Hsp_positive>35</Hsp_positive>
<Hsp_gaps>2</Hsp_gaps>
<Hsp_align-len>56</Hsp_align-len>
<Hsp_qseq>IRVAKSRGIWESTA--NRSPNAKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
<Hsp_hseq>IRVAKSRGIWGKSGMLGRQQMAKPRVLALSTKEKGPRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
<Hsp_midline>IRVAKSRGIW + R AKPR +A+STK K K F KYSTGGVLEP</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
</Iteration_hits>
<Iteration_stat>
<Statistics>
<Statistics_db-num>31601460</Statistics_db-num>
<Statistics_db-len>10937649309</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
....
and a file B containing the names of interest:
Code :
G383C4U01AUSDH
G383C4U01DPLAS
..
I want to get a file C like this:
Code :
<Iteration>
<Iteration_iter-num>4</Iteration_iter-num>
<Iteration_query-ID>lcl|4_0</Iteration_query-ID>
<Iteration_query-def>G383C4U01AUSDH length=64</Iteration_query-def>
<Iteration_query-len>64</Iteration_query-len>
<Iteration_stat>
<Statistics>
<Statistics_db-num>31601460</Statistics_db-num>
<Statistics_db-len>10937649309</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
<Statistics_kappa>0.041</Statistics_kappa>
<Statistics_lambda>0.267</Statistics_lambda>
<Statistics_entropy>0.14</Statistics_entropy>
</Statistics>
</Iteration_stat>
<Iteration_message>No hits found</Iteration_message>
</Iteration>
<Iteration>
<Iteration_iter-num>5</Iteration_iter-num>
<Iteration_query-ID>lcl|5_0</Iteration_query-ID>
<Iteration_query-def>G383C4U01DPLAS length=224</Iteration_query-def>
<Iteration_query-len>224</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|460414860|ref|XP_004252780.1|</Hit_id>
<Hit_def>PREDICTED: exocyst complex component SEC3A-like [Solanum lycopersicum]</Hit_def>
<Hit_accession>XP_004252780</Hit_accession>
<Hit_len>888</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>60.077</Hsp_bit-score>
<Hsp_score>144</Hsp_score>
<Hsp_evalue>1.95683e-09</Hsp_evalue>
<Hsp_query-from>61</Hsp_query-from>
<Hsp_query-to>222</Hsp_query-to>
<Hsp_hit-from>30</Hsp_hit-from>
<Hsp_hit-to>84</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_identity>36</Hsp_identity>
<Hsp_positive>37</Hsp_positive>
<Hsp_gaps>2</Hsp_gaps>
<Hsp_align-len>56</Hsp_align-len>
<Hsp_qseq>IRVAKSRGIWESTAN--RSPNAKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
<Hsp_hseq>IRVAKSRGIWAKTGKLGRSHTAKPRVIAISTKAKGQRT-KAFLHVLKYSTGGVLEP</Hsp_hseq>
<Hsp_midline>IRVAKSRGIW T RS AKPR +AISTKAK K F KYSTGGVLEP</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>2</Hit_num>
<Hit_id>gi|225458426|ref|XP_002283704.1|</Hit_id>
<Hit_def>PREDICTED: exocyst complex component SEC3A isoform 1 [Vitis vinifera] >gi|302142418|emb|CBI19621.3| unnamed protein product [Vitis vinifera]</Hit_def>
<Hit_accession>XP_002283704</Hit_accession>
<Hit_len>886</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>56.6102</Hsp_bit-score>
<Hsp_score>135</Hsp_score>
<Hsp_evalue>3.26752e-08</Hsp_evalue>
<Hsp_query-from>61</Hsp_query-from>
<Hsp_query-to>222</Hsp_query-to>
<Hsp_hit-from>30</Hsp_hit-from>
<Hsp_hit-to>83</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_identity>34</Hsp_identity>
<Hsp_positive>37</Hsp_positive>
<Hsp_gaps>1</Hsp_gaps>
<Hsp_align-len>55</Hsp_align-len>
<Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
<Hsp_hseq>IRVAKSRGIWGKSGKLGRNMAKPRVLALSTKAKAQRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
<Hsp_midline>IRVAKSRGIW + N AKPR +A+STKAKA K F KYSTGGVLEP</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>3</Hit_num>
<Hit_id>gi|359492097|ref|XP_003634363.1|</Hit_id>
<Hit_def>PREDICTED: exocyst complex component SEC3A isoform 2 [Vitis vinifera]</Hit_def>
<Hit_accession>XP_003634363</Hit_accession>
<Hit_len>887</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>56.6102</Hsp_bit-score>
<Hsp_score>135</Hsp_score>
<Hsp_evalue>3.26763e-08</Hsp_evalue>
<Hsp_query-from>61</Hsp_query-from>
<Hsp_query-to>222</Hsp_query-to>
<Hsp_hit-from>30</Hsp_hit-from>
<Hsp_hit-to>83</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_identity>34</Hsp_identity>
<Hsp_positive>37</Hsp_positive>
<Hsp_gaps>1</Hsp_gaps>
<Hsp_align-len>55</Hsp_align-len>
<Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
<Hsp_hseq>IRVAKSRGIWGKSGKLGRNMAKPRVLALSTKAKAQRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
<Hsp_midline>IRVAKSRGIW + N AKPR +A+STKAKA K F KYSTGGVLEP</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>4</Hit_num>
<Hit_id>gi|255538520|ref|XP_002510325.1|</Hit_id>
<Hit_def>exocyst complex component sec3, putative [Ricinus communis] >gi|223551026|gb|EEF52512.1| exocyst complex component sec3, putative [Ricinus communis]</Hit_def>
<Hit_accession>XP_002510325</Hit_accession>
<Hit_len>889</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>53.9138</Hsp_bit-score>
<Hsp_score>128</Hsp_score>
<Hsp_evalue>2.91784e-07</Hsp_evalue>
<Hsp_query-from>61</Hsp_query-from>
<Hsp_query-to>222</Hsp_query-to>
<Hsp_hit-from>30</Hsp_hit-from>
<Hsp_hit-to>83</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_identity>32</Hsp_identity>
<Hsp_positive>36</Hsp_positive>
<Hsp_gaps>1</Hsp_gaps>
<Hsp_align-len>55</Hsp_align-len>
<Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
<Hsp_hseq>IRVAKSRGIWGKSGKLGRQMAKPRVLALSTKSKGTRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
<Hsp_midline>IRVAKSRGIW + AKPR +A+STK+K T K F KYSTGGVLEP</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>5</Hit_num>
<Hit_id>gi|449460129|ref|XP_004147798.1|</Hit_id>
<Hit_def>PREDICTED: exocyst complex component SEC3A-like [Cucumis sativus]</Hit_def>
<Hit_accession>XP_004147798</Hit_accession>
<Hit_len>883</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>52.7582</Hsp_bit-score>
<Hsp_score>125</Hsp_score>
<Hsp_evalue>7.46528e-07</Hsp_evalue>
<Hsp_query-from>61</Hsp_query-from>
<Hsp_query-to>222</Hsp_query-to>
<Hsp_hit-from>30</Hsp_hit-from>
<Hsp_hit-to>84</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_identity>32</Hsp_identity>
<Hsp_positive>35</Hsp_positive>
<Hsp_gaps>2</Hsp_gaps>
<Hsp_align-len>56</Hsp_align-len>
<Hsp_qseq>IRVAKSRGIWESTA--NRSPNAKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
<Hsp_hseq>IRVAKSRGIWGKSGMLGRQQMAKPRVLALSTKEKGPRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
<Hsp_midline>IRVAKSRGIW + R AKPR +A+STK K K F KYSTGGVLEP</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
</Iteration_hits>
<Iteration_stat>
<Statistics>
<Statistics_db-num>31601460</Statistics_db-num>
<Statistics_db-len>10937649309</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
...
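For reference, here is a rough sketch of one way the extraction described above might be done with awk. It is only an illustration under some assumptions: each XML element sits on its own line (as in the sample), the name of interest is the first word inside <Iteration_query-def>, and the file names A.xml, B.txt and C.xml as well as the function name filter_iterations are made up for this example. The demo input is a cut-down version of the sample above.

```shell
#!/bin/sh
# Print every <Iteration>...</Iteration> block of file A whose query-def
# starts with a name listed in file B. Assumes one XML element per line.
filter_iterations() {
    # $1 = names file (B), $2 = BLAST XML file (A)
    awk '
        # First file: remember each name (assumes the names file is not empty).
        NR == FNR { if ($1 != "") want[$1]; next }

        # Start of a block: reset the buffer and the keep flag.
        /<Iteration>/ { buf = ""; keep = 0; inside = 1 }

        inside {
            buf = buf $0 "\n"
            if ($0 ~ /<Iteration_query-def>/) {
                id = $0
                sub(/.*<Iteration_query-def>/, "", id)  # drop the opening tag
                sub(/[ <].*/, "", id)                   # keep the first word only
                if (id in want) keep = 1
            }
        }

        # End of a block: print it only if its name was wanted.
        /<\/Iteration>/ { if (inside && keep) printf "%s", buf; inside = 0 }
    ' "$1" "$2"
}

# Small demo with hypothetical file names in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' G383C4U01AUSDH G383C4U01DPLAS > "$tmp/B.txt"
cat > "$tmp/A.xml" <<'EOF'
<Iteration>
<Iteration_iter-num>3</Iteration_iter-num>
<Iteration_query-def>G383C4U01EQA0A length=197</Iteration_query-def>
<Iteration_message>No hits found</Iteration_message>
</Iteration>
<Iteration>
<Iteration_iter-num>4</Iteration_iter-num>
<Iteration_query-def>G383C4U01AUSDH length=64</Iteration_query-def>
<Iteration_message>No hits found</Iteration_message>
</Iteration>
EOF
filter_iterations "$tmp/B.txt" "$tmp/A.xml" > "$tmp/C.xml"
cat "$tmp/C.xml"
```

Note that this writes only the matching <Iteration> blocks, so anything outside those blocks in file A would have to be added back separately if a complete XML document is needed.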