Multiline html tag parse shell script


 
# 1  
Old 10-24-2019

Hello,

I want to extract the contents of a multi-line HTML tag.

For example:
Code:
<html>
  <body>
    <p>some other text</p>
    <div>
      <p class="margin-bottom-0">
        text1
        <br>
        text2
        <br>
        <br>
        text3
      </p>
    </div>
  </body>
</html>

and I want the output to be:
Code:
text1
text2
text3


I tried a combination of grep and sed, and also awk, but I couldn't work out the right expression.

Thanks!
# 2  
Old 10-24-2019
Show your grep, sed, and awk attempts.
# 3  
Old 10-24-2019
I only have my most recent attempt, with awk; I don't remember what I tried with sed.

Code:
echo $siteSource | awk 'f{ if (/<\/p>/){printf "%s", buf; f=0; buf=""} else buf = buf $0 ORS}; /<p class="margin-bottom-0">/{f=1}'

which I thought would print at least:
Code:
text1
<br>
text2
<br>
<br>
text3


That would be half a solution, since it would still show the <br> tags, but it doesn't work at all.
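One likely cause, assuming siteSource holds the page with its newlines intact: in POSIX sh and bash, an unquoted $siteSource is word-split, so echo joins everything onto a single line. awk then sees one record that contains both the opening and the closing tag, the flag is only set after that record has been tested, and nothing is printed. The sketch below uses a hypothetical shortened sample and a hypothetical helper name (extract_block); the awk program itself is the one from the attempt above, and only the way the variable is passed changes.

```shell
# Hypothetical sample standing in for the real page source.
siteSource='<div>
  <p class="margin-bottom-0">
    text1
    <br>
    text2
  </p>
</div>'

# Unquoted on purpose: word splitting collapses the page into one line,
# so awk counts a single record.
echo $siteSource | awk 'END { print NR }'

# Quoted: the newlines survive and awk sees one record per line.
# extract_block is a hypothetical helper name, not part of the thread.
extract_block() {
  printf '%s\n' "$1" |
    awk 'f { if (/<\/p>/) { printf "%s", buf; f = 0; buf = "" } else buf = buf $0 ORS }
         /<p class="margin-bottom-0">/ { f = 1 }'
}

extract_block "$siteSource"
```

With the quoted form, the lines between the two tags are printed with their original indentation, <br> lines included.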
# 4  
Old 10-24-2019
Please be aware that there are better-suited, tailored tools for analysing and handling HTML data. How far would

Code:
$ sed -n '1h; 1!H; ${x; s/ *<[^>]*>\n* *//g; p;}' file
some other texttext1
text2
text3

(as a starter) get you?
# 5  
Old 10-24-2019
That's not good enough: in my real page there is much more than "some other text"; I just simplified it for the example.

I want to get at least what lies between <p class="margin-bottom-0"> and </p>,
so that the output would be:
Code:
text1
<br>
text2
<br>
<br>
text3

I know that there are better tools, but I started out with a simple shell script that grew over time,
and I have everything else that I need. This is the last remaining item that I could not parse.

Thanks.
# 6  
Old 10-24-2019
Another acceptable solution would be to get the next 5 lines after finding <p class="margin-bottom-0">;
I can process that result afterwards.

I tried this, but it did not work:
Code:
echo $siteSource | grep -A 5 -Eoi '<p class="margin-bottom-0">[^>]+<'
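
A possible reason this fails, assuming siteSource still contains newlines: grep matches one line at a time, so the `[^>]+<` part cannot reach past the end of the line that holds the opening tag, and -o would print only the matched fragment anyway. The unquoted $siteSource also collapses the page onto a single line, which leaves -A with nothing to follow. The sketch below uses a hypothetical sample and a hypothetical helper name (lines_after_tag). Note that the example in post #1 has six lines between the tags, so a count of 5 would stop short of text3.

```shell
# Hypothetical sample standing in for the real page source.
siteSource='<div>
  <p class="margin-bottom-0">
    text1
    <br>
    text2
    <br>
    <br>
    text3
  </p>
</div>'

# Print the line holding the opening tag plus the next $2 lines.
# -F treats the pattern as a fixed string; -A is the context option
# already used in the attempt above.
lines_after_tag() {
  printf '%s\n' "$1" | grep -F -A "$2" '<p class="margin-bottom-0">'
}

lines_after_tag "$siteSource" 6
```

The fixed line count is fragile: if the paragraph grows or shrinks, the output is cut short or runs past the closing tag, which is why a range bounded by </p> is usually the safer choice.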

# 7  
Old 10-24-2019
Quote:
Originally Posted by SorcRR
That's not good enough: in my real page there is much more than "some other text"; I just simplified it for the example.

I want to get at least what lies between <p class="margin-bottom-0"> and </p>,
so that the output would be:
Code:
text1
<br>
text2
<br>
<br>
text3

I know that there are better tools, but I started out with a simple shell script that grew over time,
and I have everything else that I need. This is the last remaining item that I could not parse.

Thanks.

Code:
$ sed -n '/<p class="margin-bottom-0">/,/<\/p>/p' myFile
      <p class="margin-bottom-0">
        text1
        <br>
        text2
        <br>
        <br>
        text3
      </p>
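
To get from this range to the output asked for in post #1, a second sed pass can strip the tags, trim the indentation, and drop the lines that end up empty. This is a minimal sketch and not a general HTML parser: it assumes the opening and closing tags sit on their own lines, that no tag is split across lines, and that no other <p> is nested inside the block. The function name extract_text is hypothetical.

```shell
# extract_text FILE: print only the text lines found between
# <p class="margin-bottom-0"> and the next </p>.
# Hypothetical helper; relies on the layout assumptions stated above.
extract_text() {
  sed -n '/<p class="margin-bottom-0">/,/<\/p>/p' "$1" |
    sed -e 's/<[^>]*>//g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d'
}

# Usage, with the file name from the command above:
#   extract_text myFile
```

The first sed selects the block exactly as shown above; the second removes anything of the form <...>, trims leading and trailing whitespace, and deletes the lines that are left empty, such as the former <br> lines.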
