Delete a chunk of text if it contains certain strings


 
# 1  
02-01-2012

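Using awk, how can I delete a chunk of text if it contains certain strings? In the example below, I want to delete each reference chunk, i.e. everything from <reference attribute = "value"> to </reference> inclusive, whose "Group ID" value is 7, 96, 103 or 1005. The same numbers may also appear in other sections (e.g. 103 under subtitle2), and those should not trigger a deletion.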

Code:
<reference attribute = "value">
 <title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>value</p></section>
 <section><title>subtitle2</title><p>1003</p></section>
 <section><title>subtitle3</title><p>value</p></section>
 <section><title>Group ID</title><p>7</p></section>
 <p>text</p>
 <p>text</p>
</refbody></reference>
<reference attribute = "value">
 <title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>value</p></section>
 <section><title>subtitle2</title><p>value</p></section>
 <section><title>Group ID</title><p>1005</p></section>
 <section><title>subtitle4</title><p>value</p></section>
</refbody></reference>
<reference attribute = "value">
 <title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>value</p></section>
 <section><title>subtitle2</title><p>103</p></section>
 <section><title>subtitle3</title><p>value</p></section>
 <section><title>Group ID</title><p>999</p></section>
 <section><title>subtitle5</title><p>value</p></section>
 <p>text</p>
 <p>text</p>
 <section><title>subtitle6</title><p>value</p></section>
</refbody></reference>
<reference attribute = "value">
<title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>1005</p></section>
 <section><title>Group ID</title><p>501</p></section>
 <section><title>subtitle3</title><p>value</p></section>
</refbody></reference>

# 2  
02-01-2012
Hi pioavi,

Why awk? In my opinion it's not the right tool for the job; I would use XPath, XQuery or XSLT instead. Still, here is a script (it probably needs GNU awk, since it uses a multi-character RS and IGNORECASE). Test it:
Code:
$ cat infile
<reference attribute = "value">
 <title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>value</p></section>
 <section><title>subtitle2</title><p>1003</p></section>
 <section><title>subtitle3</title><p>value</p></section>
 <section><title>Group ID</title><p>7</p></section>
 <p>text</p>
 <p>text</p>
</refbody></reference>
<reference attribute = "value">
 <title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>value</p></section>
 <section><title>subtitle2</title><p>value</p></section>
 <section><title>Group ID</title><p>1005</p></section>
 <section><title>subtitle4</title><p>value</p></section>
</refbody></reference>
<reference attribute = "value">
 <title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>value</p></section>
 <section><title>subtitle2</title><p>103</p></section>
 <section><title>subtitle3</title><p>value</p></section>
 <section><title>Group ID</title><p>999</p></section>
 <section><title>subtitle5</title><p>value</p></section>
 <p>text</p>
 <p>text</p>
 <section><title>subtitle6</title><p>value</p></section>
</refbody></reference>
<reference attribute = "value">
<title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>1005</p></section>
 <section><title>Group ID</title><p>501</p></section>
 <section><title>subtitle3</title><p>value</p></section>
</refbody></reference>
$ cat script.awk
BEGIN {
        IGNORECASE = 1          # gawk only: match "Group ID" case-insensitively
        RS = "</reference>"     # split the input at each end tag (the tag itself is consumed)
}

# Skip records that are not a chunk (e.g. the lone newline after the last
# </reference>) and chunks whose Group ID is 7, 96, 103 or 1005
/<reference/ && $0 !~ /<title>Group ID<\/title><p>(7|96|103|1005)<\/p>/ {
        sub( /^[[:space:]]+/, "" )   # drop the newline left over from the previous end tag
        print $0 RS
}
$ awk -f script.awk infile
<reference attribute = "value">
 <title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>value</p></section>
 <section><title>subtitle2</title><p>103</p></section>
 <section><title>subtitle3</title><p>value</p></section>
 <section><title>Group ID</title><p>999</p></section>
 <section><title>subtitle5</title><p>value</p></section>
 <p>text</p>
 <p>text</p>
 <section><title>subtitle6</title><p>value</p></section>
</refbody></reference>
<reference attribute = "value">
<title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>1005</p></section>
 <section><title>Group ID</title><p>501</p></section>
 <section><title>subtitle3</title><p>value</p></section>
</refbody></reference>

Regards,
Birei
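The script above depends on how awk splits records when RS is a multi-character string. gawk and mawk treat it as a regex, while POSIX leaves a multi-character RS unspecified. A quick way to see the records is to print each one between brackets. This is a small sketch; the helper name show_records is hypothetical, and it assumes gawk or mawk.

```shell
# show_records (hypothetical helper): print every record awk reads when
# RS = "</reference>", wrapped in [ ] so leading/trailing newlines are visible.
# Assumes an awk that treats a multi-character RS as a regex (gawk, mawk).
show_records() {
    awk 'BEGIN { RS = "</reference>" }
         { printf "record %d: [%s]\n", NR, $0 }'
}

printf '<reference>A</reference>\n<reference>B</reference>\n' | show_records
# Expected output with gawk or mawk:
# record 1: [<reference>A]
# record 2: [
# <reference>B]
# record 3: [
# ]
```

Each record after the first starts with the newline that followed the previous </reference>, and the newline at the very end of the file becomes a final record of its own. That is why each record has to be trimmed before printing, and why a record that contains no <reference must be skipped; otherwise it comes out as a lone </reference> line.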
# 3  
02-01-2012
Code:
$ nawk -F'</reference>' 'BEGIN{RS=""}{for(i=1;i<=NF;i++){if($i~/<reference/ && $i!~/<title>Group ID<\/title><p>(7|96|103|1005)<\/p>/){sub(/^\n/,"",$i);printf("%s</reference>\n",$i)}}}' test.txt
<reference attribute = "value">
 <title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>value</p></section>
 <section><title>subtitle2</title><p>103</p></section>
 <section><title>subtitle3</title><p>value</p></section>
 <section><title>Group ID</title><p>999</p></section>
 <section><title>subtitle5</title><p>value</p></section>
 <p>text</p>
 <p>text</p>
 <section><title>subtitle6</title><p>value</p></section>
</refbody></reference>
<reference attribute = "value">
<title>title</title>
 <refbody>
 <section><title>subtitle1</title><p>1005</p></section>
 <section><title>Group ID</title><p>501</p></section>
 <section><title>subtitle3</title><p>value</p></section>
</refbody></reference>

# 4  
02-01-2012
Thank you, birei, for the script and the pointer. Thank you too, itkamaraj.
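Both answers hard-code the ID list inside a regex and rely on splitting the whole file into records at </reference>. A line-oriented alternative, sketched below, should run with any POSIX awk (including nawk): it buffers each chunk, extracts the Group ID value and compares it exactly, and takes the IDs to delete as a -v variable. The function name drop_refs is hypothetical, and the sketch assumes, as in the sample, that a <reference ...> start tag never shares a line with the previous chunk's </reference>.

```shell
# drop_refs IDS (hypothetical helper): copy stdin to stdout, deleting every
# <reference ...> ... </reference> chunk whose "Group ID" value is in the
# space-separated list IDS. Lines outside any chunk are passed through.
drop_refs() {
    awk -v ids="$1" '
    BEGIN {
        n = split(ids, list, " ")
        for (i = 1; i <= n; i++) unwanted[list[i]] = 1
    }
    /<reference[ >]/ { inref = 1; buf = ""; drop = 0 }   # start of a chunk
    inref {
        buf = buf $0 "\n"                  # hold the chunk until its end tag
        if ($0 ~ /<title>Group ID<\/title><p>/) {
            id = $0                        # keep only the text between <p> and </p>
            sub(/.*<title>Group ID<\/title><p>/, "", id)
            sub(/<\/p>.*/, "", id)
            if (id in unwanted) drop = 1
        }
        if ($0 ~ /<\/reference>/) {        # end of chunk: print it unless flagged
            if (!drop) printf "%s", buf
            inref = 0
        }
        next
    }
    { print }
    '
}
```

Usage: drop_refs "7 96 103 1005" < infile. Because the extracted value is compared exactly against the list, a 103 or 1005 in another section (as in the third and fourth sample chunks) does not trigger a deletion, and a value such as 70 is not mistaken for 7.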