How to delete lines like this??


 
# 1  
Old 04-18-2014

Hi, I have a very large file like this:

Code:
 <Iteration>
      <Iteration_iter-num>1073</Iteration_iter-num>
      <Iteration_query-ID>lcl|1073_0</Iteration_query-ID>
      <Iteration_query-def>contig01073  length=1  numreads=7  gene=isogroup00087  status=icl_thresh</Iteration_query-def>
      <Iteration_query-len>1</Iteration_query-len>
      <Iteration_stat>
        <Statistics>
          <Statistics_db-num>31601460</Statistics_db-num>
          <Statistics_db-len>10937649309</Statistics_db-len>
          <Statistics_hsp-len>0</Statistics_hsp-len>
          <Statistics_eff-space>0</Statistics_eff-space>
          <Statistics_kappa>0.041</Statistics_kappa>
          <Statistics_lambda>0.267</Statistics_lambda>
          <Statistics_entropy>0.14</Statistics_entropy>
        </Statistics>
      </Iteration_stat>
      <Iteration_message>No hits found</Iteration_message>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>1074</Iteration_iter-num>
      <Iteration_query-ID>lcl|1074_0</Iteration_query-ID>
      <Iteration_query-def>contig01074  length=737  numreads=30  gene=isogroup00088  status=isotig</Iteration_query-def>
      <Iteration_query-len>737</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gi|222630026|gb|EEE62158.1|</Hit_id>
          <Hit_def>hypothetical protein OsJ_16945 [Oryza sativa Japonica Group]</Hit_def>
          <Hit_accession>EEE62158</Hit_accession>
          <Hit_len>89</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>69.707</Hsp_bit-score>
              <Hsp_score>169</Hsp_score>
              <Hsp_evalue>1.38738e-12</Hsp_evalue>
              <Hsp_query-from>275</Hsp_query-from>
              <Hsp_query-to>457</Hsp_query-to>
              <Hsp_hit-from>30</Hsp_hit-from>
              <Hsp_hit-to>89</Hsp_hit-to>
              <Hsp_query-frame>2</Hsp_query-frame>
              <Hsp_identity>32</Hsp_identity>
              <Hsp_positive>40</Hsp_positive>
              <Hsp_align-len>61</Hsp_align-len>
              <Hsp_qseq>TPSADAVGPCAACTILHRRCTDKCYLASYFPQGVEPHNFTVVDSLFGLSNVVELLQQNSNS</Hsp_qseq>
              <Hsp_hseq>TTTTVVLSPCAACKILRRRCVDRCVLAPYFPP-TEPHKFTTAHRVFGASNIIKLLQASSYS</Hsp_hseq>
              <Hsp_midline>T +   + PCAAC IL RRC D+C LA YFP   EPH FT    +FG SN+++LLQ +S S</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_id>gi|413950127|gb|AFW82776.1|</Hit_id>
          <Hit_def>putative LOB domain-containing family protein [Zea mays]</Hit_def>
          <Hit_accession>AFW82776</Hit_accession>
          <Hit_len>212</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>70.4774</Hsp_bit-score>
              <Hsp_score>171</Hsp_score>
              <Hsp_evalue>1.04484e-11</Hsp_evalue>
              <Hsp_query-from>278</Hsp_query-from>
              <Hsp_query-to>445</Hsp_query-to>
              <Hsp_hit-from>29</Hsp_hit-from>
              <Hsp_hit-to>83</Hsp_hit-to>
              <Hsp_query-frame>2</Hsp_query-frame>
              <Hsp_identity>31</Hsp_identity>
              <Hsp_positive>38</Hsp_positive>
              <Hsp_align-len>56</Hsp_align-len>
              <Hsp_qseq>PSADAVGPCAACTILHRRCTDKCYLASYFPQGVEPHNFTVVDSLFGLSNVVELLQQ</Hsp_qseq>
              <Hsp_hseq>PPAPALSPCAACKILRRRCVDRCVLAPYFPP-TEPHKFATAHRVFGASNIIKLLQE</Hsp_hseq>
              <Hsp_midline>P A A+ PCAAC IL RRC D+C LA YFP   EPH F     +FG SN+++LLQ+</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>3</Hit_num>
          <Hit_id>gi|413942115|gb|AFW74764.1|</Hit_id>
          <Hit_def>putative LOB domain-containing family protein [Zea mays]</Hit_def>
          <Hit_accession>AFW74764</Hit_accession>
          <Hit_len>221</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>68.9366</Hsp_bit-score>
              <Hsp_score>167</Hsp_score>
              <Hsp_evalue>4.36373e-11</Hsp_evalue>
              <Hsp_query-from>278</Hsp_query-from>
              <Hsp_query-to>445</Hsp_query-to>
              <Hsp_hit-from>34</Hsp_hit-from>

If the length is <100, I want to delete all the information for that particular contig, from <Iteration> to </Iteration>.

How can I achieve this?

Thanks a lot
# 2  
Old 04-19-2014
Code:
awk -F"</Iteration>" -v RS="" '{for(i=1;i<=NF;i++)if(match($i,"length="))if(substr($i,RSTART+7,3)>100)printf("%s\n</Iteration>",$i)}' test.txt

# 3  
Old 04-19-2014
To accommodate the length 100, you might want to make the test >=100.
# 4  
Old 04-22-2014
Hi itkamaraj,
Thanks for your response, but it did not work. It seems that only the blocks with length=100, 100x, or 100xx were deleted.

Quote:
Originally Posted by itkamaraj
Code:
awk -F"</Iteration>" -v RS="" '{for(i=1;i<=NF;i++)if(match($i,"length="))if(substr($i,RSTART+7,3)>100)printf("%s\n</Iteration>",$i)}' test.txt

# 5  
Old 04-22-2014
Replace 'file' with your input filename.
Code:
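# Pass 1 puts a blank line before each <Iteration> so every block is one paragraph.
# Pass 2 reads paragraphs (RS=) and prints those whose length= value is >= 100.
awk 'NR > 1 && /<Iteration>/ {print ""} 1' file | \
awk '{n = split($0, a, "[ =]")
  for (i = 1; i < n; i++)
    if (a[i] == "length" && a[i+1] + 0 >= 100)
      {print; next}}' RS=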


Last edited by SriniShoo; 04-22-2014 at 02:35 PM.. Reason: correction
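The two-pass pipeline above appears to print only paragraphs that contain a qualifying length= value, so any lines before the first <Iteration> would be dropped. As a rough alternative, here is a single-pass sketch that buffers each block and lets everything outside a block pass through. The sample data, the <root> wrapper element and the variable names (inblock, buf, keep) are made up for illustration, and it assumes the length= value always sits on the <Iteration_query-def> line, as in the sample posted in this thread.

Code:
```shell
# Hypothetical miniature input; <root> stands in for whatever wraps the real blocks.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<root>
    <Iteration>
      <Iteration_query-def>contig01073  length=1  numreads=7</Iteration_query-def>
      <Iteration_message>No hits found</Iteration_message>
    </Iteration>
    <Iteration>
      <Iteration_query-def>contig01074  length=737  numreads=30</Iteration_query-def>
    </Iteration>
</root>
EOF

filtered=$(awk '
  /<Iteration>/ { inblock = 1; buf = ""; keep = 0 }    # a new block starts: reset the buffer
  inblock {
    buf = buf $0 "\n"                                  # collect every line of the block
    # take the digits after "length=" and compare them as a number (+ 0)
    if (/<Iteration_query-def>/ && match($0, /length=[0-9]+/))
      keep = (substr($0, RSTART + 7, RLENGTH - 7) + 0 >= 100)
    if (/<\/Iteration>/) {                             # block ends: print it only if it qualified
      if (keep) printf "%s", buf
      inblock = 0
    }
    next
  }
  { print }                                            # lines outside any block pass through
' "$tmp")
rm -f "$tmp"
printf '%s\n' "$filtered"
```

With this sample, the contig01073 block (length=1) is dropped, while the contig01074 block and the two <root> lines are kept. Only one block is held in memory at a time, which may matter for a very large file.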
# 6  
Old 04-22-2014
Quote:
Originally Posted by the_simpsons
Hi itkamaraj,
Thanks for your response, but it did not work. It seems that only the blocks with length=100, 100x, or 100xx were deleted.
Replace the 3 by a 9!
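A likely reason the 3-character version misbehaves: substr() returns a string, and awk compares a string with the constant 100 character by character, not as a number. Widening the slice to 9 characters avoids cutting 1000 down to 100, but the comparison still seems to be a string comparison, so a length such as 99 would pass the >100 test. Adding 0 forces a numeric comparison. The sketch below only demonstrates this on two made-up input lines; the variable names (s3, s9, a, b, c) are illustrative.

Code:
```shell
# Hypothetical one-line samples; only the text after "length=" matters here.
result=$(printf '%s\n' 'length=99  numreads=7' 'length=1000  numreads=3' |
awk 'match($0, "length=") {
  s3 = substr($0, RSTART + 7, 3)   # 3-character slice, e.g. "99 " or "100" (a string)
  s9 = substr($0, RSTART + 7, 9)   # 9-character slice, still a string
  a = (s3 > 100)                   # string comparison
  b = (s9 > 100)                   # string comparison
  c = (s9 + 0 > 100)               # + 0 converts the leading digits to a number
  print a, b, c
}')
printf '%s\n' "$result"
```

This prints "1 1 0" for length=99 (both string tests wrongly accept it, the numeric test rejects it) and "0 1 1" for length=1000 (the 3-character slice wrongly rejects it). In the original one-liner the equivalent change would be to test substr($i,RSTART+7,9)+0 >= 100.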