sed - delete content inside tags multiline


 
# 1  
Old 10-26-2014

I need to remove part of the content below: every
==Image Gallery== heading followed by <gallery>, together with everything up to and including </gallery>.

Code:
test SED1

==Image Gallery==
<gallery>
Image:car1.jpg| Car 1<sup>1</sup>
Imagem: car2.jpg| Car2<sup>2</sup>
</gallery> test SED2

==Image Gallery==<gallery> Image:car3.jpg | car3<sup>1</sup> </gallery>test SED3
test SED4
test SED5 ==Image Gallery== <gallery>
Image: plane1.jpg | plane1 <sup> 1
</sup> Image:plane2.jpg | plane2 <sup>2</sup>
</gallery> teste SED6
test SED7

With this command:

Code:
sed -e '/==Image Gallery==.*<gallery>/ { :k s/<gallery.*[^gallery>]*\/gallery>//g; /</ {N; bk } }' file

I got this:

Code:
test SED1

==Image Gallery==
<gallery>
Image:car1.jpg| Car 1<sup>1</sup>
Imagem: car2.jpg| Car2<sup>2</sup>
</gallery> test SED2

==Image Gallery==test SED3
test SED4
test SED5 ==Image Gallery==  teste SED6
testSED7

The result should be:

Code:
test SED1

test SED2

test SED3
test SED4
test SED5  teste SED6
test SED7

What am I missing?
Moderator's Comments:
Sample code belongs in CODE tags, not B tags.

Last edited by dperboni; 10-26-2014 at 04:31 AM.. Reason: Add CODE and ICODE tags; change B tags to CODE tags.
# 2  
Old 10-26-2014
If perl is an option:
Code:
perl -0777 -pe 's/==Image Gallery==.?<gallery>.*?<\/gallery>\s?//gs' file

# 3  
Old 10-26-2014
Or using mawk or GNU awk:
Code:
awk 'NR%2' RS='==Image Gallery==|</gallery>' ORS= file

---
One problem with your sed attempt is this: [^gallery>] . A negated bracket expression applies to single characters, not to strings. The construct used here effectively means: a single character that is not g, a, l, e, r, y or > .

So you cannot force lazy matching this way.
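
To see the point about the bracket expression in isolation, here is a minimal sketch. The input string `big <tag>` and the variable name `class_demo` are made up for illustration and are not part of the original problem. Every character that is not one of g, a, l, e, r, y or > is replaced by an underscore:

```shell
# [^gallery>] matches ONE character that is not g, a, l, e, r, y or '>'.
# It does not mean "anything that is not the string gallery>".
class_demo=$(printf 'big <tag>\n' | sed 's/[^gallery>]/_/g')
printf '%s\n' "$class_demo"
```

The letters g and a and the closing > survive, while b, i, t, the space and < are replaced, which shows that the set is applied character by character.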

Last edited by Scrutinizer; 10-26-2014 at 05:55 AM..
# 4  
Old 10-26-2014
How about
Code:
sed -n 'H;g;s#==*Image.*</gallery>##g;h; $p' file

Output:
Code:
test SED1

 test SED2

test SED3
test SED4
test SED5  teste SED6
test SED7

# 5  
Old 10-26-2014
That works fine for the sample, but the match is greedy rather than lazy: if the pattern appeared twice on the same line, the text between the two occurrences would be deleted as well.

An alternative would be to use an arbitrary replacement character (for example §), something like:
Code:
sed '1h;1!H;$!d;g;s#</gallery>#§#g;s#==Image Gallery==[^§]*§##g' file
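
The difference between the greedy and the placeholder approach can be sketched as follows. The sample text and the variable names (`sample`, `sep`, `greedy`, `bounded`) are hypothetical, and instead of § this sketch assumes the control character \001 never occurs in the input, which avoids depending on a UTF-8 locale:

```shell
# Hypothetical sample: two galleries on one line, one spanning two lines.
sample='A ==Image Gallery==<gallery>x</gallery> B ==Image Gallery==<gallery>y</gallery> C
D ==Image Gallery== <gallery>
z</gallery> E'

# Greedy: with the whole file in the pattern space, .* runs from the
# first heading to the LAST </gallery>, swallowing B, C and D as well.
greedy=$(printf '%s\n' "$sample" |
  sed -e '1h;1!H;$!d;g' -e 's#==Image Gallery==.*</gallery>##g')

# Placeholder: turn each </gallery> into a single character, then delete
# from the heading up to the first occurrence of that character.
sep=$(printf '\001')   # assumption: this byte does not appear in the data
bounded=$(printf '%s\n' "$sample" |
  sed -e '1h;1!H;$!d;g' \
      -e "s#</gallery>#${sep}#g" \
      -e "s#==Image Gallery==[^${sep}]*${sep}##g")

printf 'greedy:\n%s\n' "$greedy"
printf 'bounded:\n%s\n' "$bounded"
```

One limitation to keep in mind: a </gallery> that is not preceded by a ==Image Gallery== heading stays in the output as the placeholder character, so it would need to be converted back in a final substitution.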

--
In jethrow's perl suggestion, lazy matching is achieved with the non-greedy quantifier .*? : the ? after * makes the match stop at the first possible </gallery> instead of the last one.

Last edited by Scrutinizer; 10-26-2014 at 08:35 AM..
# 6  
Old 11-07-2014
Several of these solutions work, but the test file covers only a snippet of what needs to be checked. The real input is a SQL dump of an entire MediaWiki table: a SQL INSERT with many words and many sections like the ones in the file above, plus '\n' characters in the middle of the file, which complicates things further.
But I managed to solve the initial problem using sed and perl:
Code:
sed 's/==.\{0,2\}Gallery.\{0,12\}==//g' "$arqOriginal" > "$arqSed1"
perl -0777 -pe 's/<gallery>.*?<\/gallery>\s?//gs' "$arqSed1" > "$arqSed2"
