Multiline html tag parse shell script Post: 303040265

Post 303040265 by SorcRR on Friday 25th of October 2019 05:42:14 PM

RudiC, thanks, that works just great when the HTML code is in a file, but I store the HTML code in a variable, not a file:

Code:

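# Print only the text inside the multi-line <p class="margin-bottom-0"> block.
text=$(sed -n '/<p class="margin-bottom-0">/,/<\/p/ {
            # skip paragraphs that open and close on the same line
            /<p.*\/p>/b
            # strip tags together with the spaces around them
            s/ *<[^>]*> *//g
            # drop lines that are now empty
            /^$/d
            p
            }' htmlfile)

# Unquoted $text is word-split, so the extracted lines are appended as one line.
echo $text >> results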

This is my final solution:

Code:

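# -L follows redirects; curl's progress meter and error messages are discarded.
siteSource=$(curl -L --connect-timeout 14 "$urls" 2> /dev/null)

# Feed the variable to sed on stdin instead of naming a file.
text=$(printf "%s" "$siteSource" | sed -n '/<p class="margin-bottom-0">/,/<\/p/ {
            /<p.*\/p>/b
            s/ *<[^>]*> *//g
            /^$/d
            p
            }')

echo $text >> results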
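
For anyone who wants to try this without fetching a page, here is a minimal sketch that runs the same sed script on a made-up HTML snippet held in a variable. The sample markup and the `joined` variable are my own assumptions for illustration, not part of the real site. It also shows the difference between printing `"$text"` quoted (line breaks and indentation kept) and unquoted (everything joined into one line, which is what ends up in the results file).

```shell
#!/bin/sh
# Hypothetical sample page; in the real script this comes from curl.
siteSource='<html>
<body>
<p class="intro">skip me</p>
<p class="margin-bottom-0">
   First line<br>
   Second line
</p>
</body>
</html>'

# Same sed script as above, reading the variable from stdin.
text=$(printf '%s\n' "$siteSource" | sed -n '/<p class="margin-bottom-0">/,/<\/p/ {
    /<p.*\/p>/b
    s/ *<[^>]*> *//g
    /^$/d
    p
}')

# Quoted: the two extracted lines are printed as they are.
printf '%s\n' "$text"

# Unquoted: the shell splits $text into words and echo joins them with spaces.
joined=$(echo $text)
printf '%s\n' "$joined"
```

Note that the unquoted form is also subject to filename expansion, so a stray `*` in the page text could expand to file names; quoting avoids that at the cost of keeping the line breaks.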

Oh, and I also had to get rid of the semicolons, because with them I got this error: sed: 1: "/<p/,/<\/p/ {/<p.*\/p>/ ...": unexpected EOF (pending }'s)
Putting each command on its own line instead of separating the commands with semicolons fixes this error.
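
My guess at the cause (an assumption, I have not checked the sed source): some sed implementations read the label after `b` up to the end of the line, so a following `;` and the closing `}` are swallowed as part of the label and the block never gets closed. If a one-line command is still wanted, a sketch of a workaround is to pass each command as its own `-e` argument, since sed joins the `-e` fragments with newlines. The `html` sample below is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input standing in for the downloaded page.
html='<p class="margin-bottom-0">
Hello<br>
world
</p>'

# Each -e fragment is treated as if it were on its own line of the script,
# so no semicolons are needed after "b" or before "}".
text=$(printf '%s\n' "$html" | sed -n \
    -e '/<p class="margin-bottom-0">/,/<\/p/ {' \
    -e '/<p.*\/p>/b' \
    -e 's/ *<[^>]*> *//g' \
    -e '/^$/d' \
    -e 'p' \
    -e '}')

printf '%s\n' "$text"
```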

Thanks everyone for the help.

--- Post updated at 09:42 PM ---

stomp, I like your solution too, it looks very clean. Unfortunately xmlstarlet is very picky:
in my real-life problem it's not just the <br> tags that would need to be transformed to make the page compliant, and it would be overkill to check and transform the whole HTML page just for xmlstarlet.
But I'm glad that you showed me this, I might use it somewhere else.

Thanks!

Last edited by SorcRR; 10-25-2019 at 07:11 PM..
