How to extract information from a file?


 
# 1  
Old 05-20-2014

Hi, I have a file like this:
Code:
<Iteration>
      <Iteration_iter-num>3</Iteration_iter-num>
      <Iteration_query-ID>lcl|3_0</Iteration_query-ID>
      <Iteration_query-def>G383C4U01EQA0A length=197</Iteration_query-def>
      <Iteration_query-len>197</Iteration_query-len>
      <Iteration_stat>
        <Statistics>
          <Statistics_db-num>31601460</Statistics_db-num>
          <Statistics_db-len>10937649309</Statistics_db-len>
          <Statistics_hsp-len>0</Statistics_hsp-len>
          <Statistics_eff-space>0</Statistics_eff-space>
          <Statistics_kappa>0.041</Statistics_kappa>
          <Statistics_lambda>0.267</Statistics_lambda>
          <Statistics_entropy>0.14</Statistics_entropy>
        </Statistics>
      </Iteration_stat>
      <Iteration_message>No hits found</Iteration_message>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>4</Iteration_iter-num>
      <Iteration_query-ID>lcl|4_0</Iteration_query-ID>
      <Iteration_query-def>G383C4U01AUSDH length=64</Iteration_query-def>
      <Iteration_query-len>64</Iteration_query-len>
      <Iteration_stat>
        <Statistics>
          <Statistics_db-num>31601460</Statistics_db-num>
          <Statistics_db-len>10937649309</Statistics_db-len>
          <Statistics_hsp-len>0</Statistics_hsp-len>
          <Statistics_eff-space>0</Statistics_eff-space>
          <Statistics_kappa>0.041</Statistics_kappa>
          <Statistics_lambda>0.267</Statistics_lambda>
          <Statistics_entropy>0.14</Statistics_entropy>
        </Statistics>
      </Iteration_stat>
      <Iteration_message>No hits found</Iteration_message>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>5</Iteration_iter-num>
      <Iteration_query-ID>lcl|5_0</Iteration_query-ID>
      <Iteration_query-def>G383C4U01DPLAS length=224</Iteration_query-def>
      <Iteration_query-len>224</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gi|460414860|ref|XP_004252780.1|</Hit_id>
          <Hit_def>PREDICTED: exocyst complex component SEC3A-like [Solanum lycopersicum]</Hit_def>
          <Hit_accession>XP_004252780</Hit_accession>
          <Hit_len>888</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>60.077</Hsp_bit-score>
              <Hsp_score>144</Hsp_score>
              <Hsp_evalue>1.95683e-09</Hsp_evalue>
              <Hsp_query-from>61</Hsp_query-from>
              <Hsp_query-to>222</Hsp_query-to>
              <Hsp_hit-from>30</Hsp_hit-from>
              <Hsp_hit-to>84</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_identity>36</Hsp_identity>
              <Hsp_positive>37</Hsp_positive>
              <Hsp_gaps>2</Hsp_gaps>
              <Hsp_align-len>56</Hsp_align-len>
              <Hsp_qseq>IRVAKSRGIWESTAN--RSPNAKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
              <Hsp_hseq>IRVAKSRGIWAKTGKLGRSHTAKPRVIAISTKAKGQRT-KAFLHVLKYSTGGVLEP</Hsp_hseq>
              <Hsp_midline>IRVAKSRGIW  T    RS  AKPR +AISTKAK     K F    KYSTGGVLEP</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_id>gi|225458426|ref|XP_002283704.1|</Hit_id>
          <Hit_def>PREDICTED: exocyst complex component SEC3A isoform 1 [Vitis vinifera] &gt;gi|302142418|emb|CBI19621.3| unnamed protein product [Vitis vinifera]</Hit_def>
          <Hit_accession>XP_002283704</Hit_accession>
          <Hit_len>886</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>56.6102</Hsp_bit-score>
              <Hsp_score>135</Hsp_score>
              <Hsp_evalue>3.26752e-08</Hsp_evalue>
              <Hsp_query-from>61</Hsp_query-from>
              <Hsp_query-to>222</Hsp_query-to>
              <Hsp_hit-from>30</Hsp_hit-from>
              <Hsp_hit-to>83</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_identity>34</Hsp_identity>
              <Hsp_positive>37</Hsp_positive>
              <Hsp_gaps>1</Hsp_gaps>
              <Hsp_align-len>55</Hsp_align-len>
              <Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
              <Hsp_hseq>IRVAKSRGIWGKSGKLGRNMAKPRVLALSTKAKAQRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
              <Hsp_midline>IRVAKSRGIW  +     N AKPR +A+STKAKA    K F    KYSTGGVLEP</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>3</Hit_num>
          <Hit_id>gi|359492097|ref|XP_003634363.1|</Hit_id>
          <Hit_def>PREDICTED: exocyst complex component SEC3A isoform 2 [Vitis vinifera]</Hit_def>
          <Hit_accession>XP_003634363</Hit_accession>
          <Hit_len>887</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>56.6102</Hsp_bit-score>
              <Hsp_score>135</Hsp_score>
              <Hsp_evalue>3.26763e-08</Hsp_evalue>
              <Hsp_query-from>61</Hsp_query-from>
              <Hsp_query-to>222</Hsp_query-to>
              <Hsp_hit-from>30</Hsp_hit-from>
              <Hsp_hit-to>83</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_identity>34</Hsp_identity>
              <Hsp_positive>37</Hsp_positive>
              <Hsp_gaps>1</Hsp_gaps>
              <Hsp_align-len>55</Hsp_align-len>
              <Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
              <Hsp_hseq>IRVAKSRGIWGKSGKLGRNMAKPRVLALSTKAKAQRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
              <Hsp_midline>IRVAKSRGIW  +     N AKPR +A+STKAKA    K F    KYSTGGVLEP</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>4</Hit_num>
          <Hit_id>gi|255538520|ref|XP_002510325.1|</Hit_id>
          <Hit_def>exocyst complex component sec3, putative [Ricinus communis] &gt;gi|223551026|gb|EEF52512.1| exocyst complex component sec3, putative [Ricinus communis]</Hit_def>
          <Hit_accession>XP_002510325</Hit_accession>
          <Hit_len>889</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>53.9138</Hsp_bit-score>
              <Hsp_score>128</Hsp_score>
              <Hsp_evalue>2.91784e-07</Hsp_evalue>
              <Hsp_query-from>61</Hsp_query-from>
              <Hsp_query-to>222</Hsp_query-to>
              <Hsp_hit-from>30</Hsp_hit-from>
              <Hsp_hit-to>83</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_identity>32</Hsp_identity>
              <Hsp_positive>36</Hsp_positive>
              <Hsp_gaps>1</Hsp_gaps>
              <Hsp_align-len>55</Hsp_align-len>
              <Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
              <Hsp_hseq>IRVAKSRGIWGKSGKLGRQMAKPRVLALSTKSKGTRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
              <Hsp_midline>IRVAKSRGIW  +       AKPR +A+STK+K T   K F    KYSTGGVLEP</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>5</Hit_num>
          <Hit_id>gi|449460129|ref|XP_004147798.1|</Hit_id>
          <Hit_def>PREDICTED: exocyst complex component SEC3A-like [Cucumis sativus]</Hit_def>
          <Hit_accession>XP_004147798</Hit_accession>
          <Hit_len>883</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>52.7582</Hsp_bit-score>
              <Hsp_score>125</Hsp_score>
              <Hsp_evalue>7.46528e-07</Hsp_evalue>
              <Hsp_query-from>61</Hsp_query-from>
              <Hsp_query-to>222</Hsp_query-to>
              <Hsp_hit-from>30</Hsp_hit-from>
              <Hsp_hit-to>84</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_identity>32</Hsp_identity>
              <Hsp_positive>35</Hsp_positive>
              <Hsp_gaps>2</Hsp_gaps>
              <Hsp_align-len>56</Hsp_align-len>
              <Hsp_qseq>IRVAKSRGIWESTA--NRSPNAKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
              <Hsp_hseq>IRVAKSRGIWGKSGMLGRQQMAKPRVLALSTKEKGPRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
              <Hsp_midline>IRVAKSRGIW  +    R   AKPR +A+STK K     K F    KYSTGGVLEP</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
      <Iteration_stat>
        <Statistics>
          <Statistics_db-num>31601460</Statistics_db-num>
          <Statistics_db-len>10937649309</Statistics_db-len>
          <Statistics_hsp-len>0</Statistics_hsp-len>
          <Statistics_eff-space>0</Statistics_eff-space>

Every query starts with <Iteration> and ends with </Iteration>.


I want to extract only the blocks for certain queries, whose names are listed in a second file, B. For example:
Code:
G383C4U01AUSDH 
G383C4U01DPLAS
..


How can I do this?

Thanks.
# 2  
Old 05-20-2014
Quick and simple way:
Code:
awk '$1 == "Iteration_query-def" { print $2 }' RS="<" FS=">" iteration.xml
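
For anyone wondering what the RS/FS trick does: RS="<" starts a new record at every tag, and FS=">" then leaves the tag name in $1 and the text after it in $2. Here is a minimal sketch on a made-up two-line sample. The variable names sample and fields are only for illustration, and RS and FS are passed with -v because the input comes from a pipe rather than a file:

```shell
# Hypothetical two-tag sample; only the tag names come from the thread.
sample='<Iteration_query-def>G383C4U01AUSDH length=64</Iteration_query-def>
<Iteration_query-len>64</Iteration_query-len>'

# Show "tag name | text" for every record whose text is not just whitespace
# (closing tags and the empty record before the first "<" are skipped).
fields=$(printf '%s\n' "$sample" |
  awk -v RS='<' -v FS='>' '$2 ~ /[^ \t\n]/ { print $1 " | " $2 }')

printf '%s\n' "$fields"
# Iteration_query-def | G383C4U01AUSDH length=64
# Iteration_query-len | 64
```

So the one-liner simply prints $2 for the records whose $1 is Iteration_query-def.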

# 3  
Old 05-20-2014
Quote:
Originally Posted by Corona688
Quick and simple way:
Code:
awk '$1 == "Iteration_query-def" { print $2 }' RS="<" FS=">" iteration.xml

Actually, both files are very large, and the queries listed in file B are not contiguous in file A.
# 4  
Old 05-20-2014
In what way does my solution not work for you?

In what way does your data differ from what you posted?
# 5  
Old 05-20-2014
If you mean that you only want to extract information from between <Iteration> tags:

Code:
awk '$1 == "Iteration" { P=1 } ; P && ($1 == "Iteration_query-def") { print $2 } ; $1 == "/Iteration" { P=0 }' RS="<" FS=">" iteration.xml
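
One thing to watch with flag-based filters like this: a regex such as /^Iteration/ also matches records like Iteration_query-def, while an exact test on $1 reacts only to the <Iteration> and </Iteration> tags themselves. A rough sketch of the difference, on a made-up sample with one query-def outside any block (sample, by_regex and by_exact are illustrative names):

```shell
# Hypothetical sample: one query-def outside an <Iteration> block, one inside.
sample='<Iteration_query-def>OUTSIDE length=1</Iteration_query-def>
<Iteration>
  <Iteration_query-def>INSIDE length=2</Iteration_query-def>
</Iteration>'

# Regex version: the query-def record itself starts with "Iteration",
# so it switches the flag on and the outside entry is printed too.
by_regex=$(printf '%s\n' "$sample" |
  awk -v RS='<' -v FS='>' '
    /^Iteration/ { P = 1 }
    P && $1 == "Iteration_query-def" { print $2 }
    /^\/Iteration/ { P = 0 }')

# Exact version: only the <Iteration> and </Iteration> tags toggle the flag.
by_exact=$(printf '%s\n' "$sample" |
  awk -v RS='<' -v FS='>' '
    $1 == "Iteration" { P = 1 }
    P && $1 == "Iteration_query-def" { print $2 }
    $1 == "/Iteration" { P = 0 }')

printf 'regex:\n%s\nexact:\n%s\n' "$by_regex" "$by_exact"
```

With the regex version both entries come out; with the exact comparison only the one inside the block does.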

# 6  
Old 05-20-2014
Quote:
Originally Posted by Corona688
Quick and simple way:
Code:
awk '$1 == "Iteration_query-def" { print $2 }' RS="<" FS=">" iteration.xml

Sorry, I don't think I explained it clearly.

I have an XML file A (same content as the sample in my first post)

and a file B containing the names of interest:

Code:
G383C4U01AUSDH 
G383C4U01DPLAS
..

I want to get a file C like this:
Code:
   <Iteration>
      <Iteration_iter-num>4</Iteration_iter-num>
      <Iteration_query-ID>lcl|4_0</Iteration_query-ID>
      <Iteration_query-def>G383C4U01AUSDH length=64</Iteration_query-def>
      <Iteration_query-len>64</Iteration_query-len>
      <Iteration_stat>
        <Statistics>
          <Statistics_db-num>31601460</Statistics_db-num>
          <Statistics_db-len>10937649309</Statistics_db-len>
          <Statistics_hsp-len>0</Statistics_hsp-len>
          <Statistics_eff-space>0</Statistics_eff-space>
          <Statistics_kappa>0.041</Statistics_kappa>
          <Statistics_lambda>0.267</Statistics_lambda>
          <Statistics_entropy>0.14</Statistics_entropy>
        </Statistics>
      </Iteration_stat>
      <Iteration_message>No hits found</Iteration_message>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>5</Iteration_iter-num>
      <Iteration_query-ID>lcl|5_0</Iteration_query-ID>
      <Iteration_query-def>G383C4U01DPLAS length=224</Iteration_query-def>
      <Iteration_query-len>224</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gi|460414860|ref|XP_004252780.1|</Hit_id>
          <Hit_def>PREDICTED: exocyst complex component SEC3A-like [Solanum lycopersicum]</Hit_def>
          <Hit_accession>XP_004252780</Hit_accession>
          <Hit_len>888</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>60.077</Hsp_bit-score>
              <Hsp_score>144</Hsp_score>
              <Hsp_evalue>1.95683e-09</Hsp_evalue>
              <Hsp_query-from>61</Hsp_query-from>
              <Hsp_query-to>222</Hsp_query-to>
              <Hsp_hit-from>30</Hsp_hit-from>
              <Hsp_hit-to>84</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_identity>36</Hsp_identity>
              <Hsp_positive>37</Hsp_positive>
              <Hsp_gaps>2</Hsp_gaps>
              <Hsp_align-len>56</Hsp_align-len>
              <Hsp_qseq>IRVAKSRGIWESTAN--RSPNAKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
              <Hsp_hseq>IRVAKSRGIWAKTGKLGRSHTAKPRVIAISTKAKGQRT-KAFLHVLKYSTGGVLEP</Hsp_hseq>
              <Hsp_midline>IRVAKSRGIW  T    RS  AKPR +AISTKAK     K F    KYSTGGVLEP</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_id>gi|225458426|ref|XP_002283704.1|</Hit_id>
          <Hit_def>PREDICTED: exocyst complex component SEC3A isoform 1 [Vitis vinifera] &gt;gi|302142418|emb|CBI19621.3| unnamed protein product [Vitis vinifera]</Hit_def>
          <Hit_accession>XP_002283704</Hit_accession>
          <Hit_len>886</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>56.6102</Hsp_bit-score>
              <Hsp_score>135</Hsp_score>
              <Hsp_evalue>3.26752e-08</Hsp_evalue>
              <Hsp_query-from>61</Hsp_query-from>
              <Hsp_query-to>222</Hsp_query-to>
              <Hsp_hit-from>30</Hsp_hit-from>
              <Hsp_hit-to>83</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_identity>34</Hsp_identity>
              <Hsp_positive>37</Hsp_positive>
              <Hsp_gaps>1</Hsp_gaps>
              <Hsp_align-len>55</Hsp_align-len>
              <Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
              <Hsp_hseq>IRVAKSRGIWGKSGKLGRNMAKPRVLALSTKAKAQRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
              <Hsp_midline>IRVAKSRGIW  +     N AKPR +A+STKAKA    K F    KYSTGGVLEP</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>3</Hit_num>
          <Hit_id>gi|359492097|ref|XP_003634363.1|</Hit_id>
          <Hit_def>PREDICTED: exocyst complex component SEC3A isoform 2 [Vitis vinifera]</Hit_def>
          <Hit_accession>XP_003634363</Hit_accession>
          <Hit_len>887</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>56.6102</Hsp_bit-score>
              <Hsp_score>135</Hsp_score>
              <Hsp_evalue>3.26763e-08</Hsp_evalue>
              <Hsp_query-from>61</Hsp_query-from>
              <Hsp_query-to>222</Hsp_query-to>
              <Hsp_hit-from>30</Hsp_hit-from>
              <Hsp_hit-to>83</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_identity>34</Hsp_identity>
              <Hsp_positive>37</Hsp_positive>
              <Hsp_gaps>1</Hsp_gaps>
              <Hsp_align-len>55</Hsp_align-len>
              <Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
              <Hsp_hseq>IRVAKSRGIWGKSGKLGRNMAKPRVLALSTKAKAQRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
              <Hsp_midline>IRVAKSRGIW  +     N AKPR +A+STKAKA    K F    KYSTGGVLEP</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>4</Hit_num>
          <Hit_id>gi|255538520|ref|XP_002510325.1|</Hit_id>
          <Hit_def>exocyst complex component sec3, putative [Ricinus communis] &gt;gi|223551026|gb|EEF52512.1| exocyst complex component sec3, putative [Ricinus communis]</Hit_def>
          <Hit_accession>XP_002510325</Hit_accession>
          <Hit_len>889</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>53.9138</Hsp_bit-score>
              <Hsp_score>128</Hsp_score>
              <Hsp_evalue>2.91784e-07</Hsp_evalue>
              <Hsp_query-from>61</Hsp_query-from>
              <Hsp_query-to>222</Hsp_query-to>
              <Hsp_hit-from>30</Hsp_hit-from>
              <Hsp_hit-to>83</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_identity>32</Hsp_identity>
              <Hsp_positive>36</Hsp_positive>
              <Hsp_gaps>1</Hsp_gaps>
              <Hsp_align-len>55</Hsp_align-len>
              <Hsp_qseq>IRVAKSRGIWESTANRSPN-AKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
              <Hsp_hseq>IRVAKSRGIWGKSGKLGRQMAKPRVLALSTKSKGTRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
              <Hsp_midline>IRVAKSRGIW  +       AKPR +A+STK+K T   K F    KYSTGGVLEP</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>5</Hit_num>
          <Hit_id>gi|449460129|ref|XP_004147798.1|</Hit_id>
          <Hit_def>PREDICTED: exocyst complex component SEC3A-like [Cucumis sativus]</Hit_def>
          <Hit_accession>XP_004147798</Hit_accession>
          <Hit_len>883</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>52.7582</Hsp_bit-score>
              <Hsp_score>125</Hsp_score>
              <Hsp_evalue>7.46528e-07</Hsp_evalue>
              <Hsp_query-from>61</Hsp_query-from>
              <Hsp_query-to>222</Hsp_query-to>
              <Hsp_hit-from>30</Hsp_hit-from>
              <Hsp_hit-to>84</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_identity>32</Hsp_identity>
              <Hsp_positive>35</Hsp_positive>
              <Hsp_gaps>2</Hsp_gaps>
              <Hsp_align-len>56</Hsp_align-len>
              <Hsp_qseq>IRVAKSRGIWESTA--NRSPNAKPRFVAISTKAKATTN*KHFSES*KYSTGGVLEP</Hsp_qseq>
              <Hsp_hseq>IRVAKSRGIWGKSGMLGRQQMAKPRVLALSTKEKGPRT-KAFLRVLKYSTGGVLEP</Hsp_hseq>
              <Hsp_midline>IRVAKSRGIW  +    R   AKPR +A+STK K     K F    KYSTGGVLEP</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
      <Iteration_stat>
        <Statistics>
          <Statistics_db-num>31601460</Statistics_db-num>
          <Statistics_db-len>10937649309</Statistics_db-len>
          <Statistics_hsp-len>0</Statistics_hsp-len>
          <Statistics_eff-space>0</Statistics_eff-space>
...

# 7  
Old 05-20-2014
Something more like this, then:

Code:
$ cat iteration.awk

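# Load the names of interest from the B file; $1 drops trailing blanks.
BEGIN { while ((getline < bfile) > 0) D[$1] = 1; close(bfile); RS = "<"; FS = ">" }

# RS="<" makes every tag a record; FS=">" puts the tag name in $1 and its text in $2.
# The query name is the first word of the Iteration_query-def text.
$1 == "Iteration_query-def" { split($2, Q, " "); if (Q[1] in D) M = 1 }
$1 == "Iteration" { P = 1 }                            # a block starts: begin collecting
P { R = R "<" $0 }                                     # put back the "<" that RS consumed
$1 == "/Iteration" { if (M) print R; M = P = R = "" }  # block ends: print on match, reset
END { if (M) print R }                                 # input ended inside a matching block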

$ awk -v bfile="b" -f iteration.awk a.xml
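
A quick end-to-end check, as a sketch only: it builds a heavily shortened, made-up a.xml and b in a scratch directory, writes a copy of the script (this copy uses the `in` operator for the lookup), and redirects the result to c.xml. The file names a.xml, b and c.xml and the sample contents are assumptions for the demo, not the real data.

```shell
# Work in a scratch directory so no existing files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

# a.xml: hypothetical, heavily shortened stand-in for file A.
cat > a.xml <<'EOF'
<Iteration>
  <Iteration_query-def>G383C4U01EQA0A length=197</Iteration_query-def>
  <Iteration_message>No hits found</Iteration_message>
</Iteration>
<Iteration>
  <Iteration_query-def>G383C4U01AUSDH length=64</Iteration_query-def>
  <Iteration_message>No hits found</Iteration_message>
</Iteration>
EOF

# b: names of interest, one per line.
printf '%s\n' G383C4U01AUSDH G383C4U01DPLAS > b

# Copy of the script; the quoted 'EOF' keeps the shell from expanding $1, $2, $0.
cat > iteration.awk <<'EOF'
BEGIN { while ((getline < bfile) > 0) D[$1] = 1; close(bfile); RS = "<"; FS = ">" }
$1 == "Iteration_query-def" { split($2, Q, " "); if (Q[1] in D) M = 1 }
$1 == "Iteration" { P = 1 }
P { R = R "<" $0 }
$1 == "/Iteration" { if (M) print R; M = P = R = "" }
END { if (M) print R }
EOF

# Only the G383C4U01AUSDH block should end up in c.xml.
awk -v bfile="b" -f iteration.awk a.xml > c.xml
cat c.xml
```

File A is read once as a stream and only the names from B are held in memory, so this should cope with large inputs; names in B that never occur in A (G383C4U01DPLAS here) are simply ignored.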
