How can I delete the content between all the occurrences of two strings using sed or awk?


 
# 1  
Old 08-17-2011

Hi. I have to delete the content between all the occurrences of the xml tags in a single file.

For example:

* The tags <script>.....................</script> occur more than once in the same file.

* The file follows normal tagging rules: every start tag is followed by an end tag, and there are never two consecutive opening tags.

* But the tags are not necessarily on separate lines.

I used the script below, which deleted just the first occurrence in the file.
Code:
sed -e "s/ <script>*?<\/script>//g" $INF > $OUTF

Please help me with this. Since I have to process a huge amount of data, a more efficient method would be better.

If there is any other way apart from sed and awk, that would also be welcome.
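
A note on why the attempt above is unlikely to match, offered as a hedged sketch rather than a full answer: in sed's basic regular expressions `>*` means "zero or more `>`" rather than "any text", `?` is an ordinary character (there is no non-greedy `*?`), and the pattern also requires a space before `<script>`. sed also works on one line at a time, so a pair split across lines is never seen as a whole. When every pair sits on a single line and the enclosed text contains no `<`, a negated bracket expression may be enough. The file name `sample.txt` and its contents are made up for illustration.

```shell
# Hypothetical sample: two tag pairs on the first line, one on the second.
printf '%s\n' 'a<script>x</script>b<script>y</script>c' 'd<script>z</script>e' > sample.txt

# [^<]* cannot run past the next "<", so each pair is removed on its own.
# Limitation: this fails if the enclosed text contains "<" or spans lines.
sed 's/<script>[^<]*<\/script>//g' sample.txt
```

For the sample above this prints `abc` and `de`. It does not cover tags that open on one line and close on another.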

Last edited by Franklin52; 08-17-2011 at 10:38 AM.. Reason: Please use code tags for data and code samples, thank you
# 2  
Old 08-17-2011
Something like this?
Code:
awk '/<script>/{p=1} /<\/script>/{p=0; next}!p' file

# 3  
Old 08-17-2011
Looks like it's deleting from the beginning of the line where the <script> and </script> are located.

test.file
Code:
this is <script> the first line in the file1
and the second </script> line
and the third line in the file

When I execute the script
Code:
awk '/<script>/{p=1} /<\/script>/{p=0; next}!p' test.file

I get
Code:
and the third line in the file

I am assuming what satheeshkumar wants is

Code:
this is  
line
and the third line in the file

Thanks.
# 4  
Old 08-17-2011
Quote:
Originally Posted by Franklin52
Something like this?
Code:
awk '/<script>/{p=1} /<\/script>/{p=0; next}!p' file

Thanks Frank. It removes the content for every occurrence of the <script> and </script> tags, but it removes the complete line on which those tags are present.

Could you please help me remove only the content between those tags instead of everything on the line?

Example:

Below is the output that I get when I execute your command.

input file content:

client side<script>java script</script>java scripting is.......
server side<scipt>classic asp</script>ASP is a microsoft technology.......

satheesh here

output:

satheesh here

But I want the output to be:

client side java scripting is.......
server side ASP is a microsoft technology.......

---------- Post updated at 09:28 AM ---------- Previous update was at 09:24 AM ----------

Quote:
Originally Posted by jville
Looks like it's deleting from the beginning of the line where the <script> and </script> are located.

You are correct, jville. That's exactly what I need.
# 5  
Old 08-17-2011
Try:
Code:
awk '
/<script>/ && /<\/script>/{
  sub("<script>.*</script>",x)
  if($0){print}
  next
}
/<script>/{p=1} /<\/script>/{
  p=0; next
}
!p' file

# 6  
Old 08-17-2011
Frank, it works fine for the first line. But on the next line of input it makes the same error: it removes the complete line instead of just removing the content between the <script> and </script> tags.

This time I get the output as

client side java scripting is.......

satheesh here

instead of

client side java scripting is.......
server side ASP is a microsoft technology.......
satheesh here

The second line of input has been deleted as a whole. It seems that the conditional deletion works only for the first occurrence.

Thanks
Satheesh
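
One possible explanation for the report above, assuming the sample input in post #4 is verbatim: its second line opens with `<scipt>` rather than `<script>`. That line then fails the combined test in the first rule, matches only the closing-tag rule, and is dropped by `next`. The sketch below reproduces this with a copy of that input; `in.txt` and `out.txt` are made-up file names.

```shell
# Copy of the sample input from post #4, including the "<scipt>" spelling.
cat > in.txt <<'EOF'
client side<script>java script</script>java scripting is.......
server side<scipt>classic asp</script>ASP is a microsoft technology.......

satheesh here
EOF

# The script from post #5, unchanged.
awk '
/<script>/ && /<\/script>/{
  sub("<script>.*</script>",x)
  if($0){print}
  next
}
/<script>/{p=1} /<\/script>/{
  p=0; next
}
!p' in.txt > out.txt

cat out.txt
```

With this input the "server side" line is missing from the output, as reported. If the opening tag on that line is spelled `<script>`, the first rule applies and the line is kept, so it may be worth checking the data for misspelled tags before changing the script.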
# 7  
Old 08-17-2011
Code:
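awk '
# Opening and closing tag on the same line: cut out the tagged span.
/<script>/ && /<\/script>/{
  sub("<script>.*</script>",x)   # x is unset, so it acts as an empty string
  if($0){print}
  next
}
# Only an opening tag: keep the text before it, then start skipping.
/<script>/{
  sub("<script>.*",x)
  if($0){print}
  p=1
}
# Only a closing tag: keep the text after it, then stop skipping.
/<\/script>/{
  sub(".*</script>",x)
  if($0){print}
  p=0
  next
}
# Print non-empty lines that are outside a script block.
$0 && !p' file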
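
Because `.*` is greedy, the script above removes everything from the first `<script>` to the last `</script>` on a line, so text between two pairs on the same line is lost. A possible alternative, sketched here under the assumption stated in post #1 that tags are never nested, walks each line with `index()` and keeps a flag across lines. The variable names (`inside`, `out`, `line`), the extra third input line, and the output file name `cleaned.file` are made up for illustration.

```shell
# Sample from post #3, plus one made-up line with two pairs on it.
cat > test.file <<'EOF'
this is <script> the first line in the file1
and the second </script> line
a<script>x</script>b<script>y</script>c
and the third line in the file
EOF

awk '
{
  out = ""          # text of this line that lies outside any tag pair
  line = $0
  while (line != "") {
    if (inside) {
      # Skipping: look for the closing tag in what is left of the line.
      i = index(line, "</script>")
      if (i == 0) break               # no closing tag here, drop the rest
      line = substr(line, i + 9)      # 9 = length("</script>")
      inside = 0
    } else {
      # Copying: keep everything up to the next opening tag.
      i = index(line, "<script>")
      if (i == 0) { out = out line; break }
      out = out substr(line, 1, i - 1)
      line = substr(line, i + 8)      # 8 = length("<script>")
      inside = 1
    }
  }
  if (out != "") print out            # drop lines left completely empty
}' test.file > cleaned.file

cat cleaned.file
```

The sketch uses fixed-string searches instead of regular expressions and reads the file once, which may help with large inputs. It keeps the text before and after each pair, including any surrounding spaces.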
