extracting Line between HTML tag
Post 302603785 by Scrutinizer on Friday 2nd of March 2012 03:12:10 AM
Quote:
Originally Posted by newlook2011
First, thanks to huaihaizi3 and agama [..] By the way, could you explain the code? I have been reading man awk but could not find appropriate answers.
-F\>          Use > as a field separator.
/^tag>/       If a record starts with "tag" followed by >, then
{print $2}    print the second field of the record. Since the field separator is set to >, $1 will be the tag and $2 will be the content.
RS=\<         Use < as record separator instead of a newline.
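
To see the pieces working together, here is a minimal sketch. The full command is reassembled from the four parts explained above, so treat its exact form as an assumption. The sample file and its contents are made up for illustration.

```shell
# Hypothetical sample input; the contents are illustrative only.
infile=$(mktemp)
printf '%s\n' '<html><tag>hello world</tag><other>skip</other></html>' > "$infile"

# Assumed full command, put together from the parts explained above:
#   RS=\<      splits the input into records at every <
#   -F\>       splits each record into fields at every >
#   /^tag>/    selects the record "tag>hello world"
#   print $2   prints what follows the >, i.e. the content
result=$(awk -F\> '/^tag>/{print $2}' RS=\< "$infile")
echo "$result"

rm -f "$infile"
```

The closing tag ends up in its own record, "/tag>", which does not start with "tag", so it is skipped.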

---------- Post updated at 09:12 ---------- Previous update was at 09:06 ----------

They can be slightly improved still:

Varying tag (the tag name is passed in the variable t and compared to $1):
Code:
awk '$1==t{print $2}' RS=\< FS=\> t="tag" infile

Removing newlines from the content:
Code:
awk '$1==t{gsub(ORS,x);print $2}' RS=\< FS=\> t="tag" infile
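
A quick way to compare the two variants is to run them on content that spans two lines. This is only a sketch: the sample file is hypothetical, and the variable names with_newlines and joined are introduced here for illustration.

```shell
# Hypothetical sample input: the tag content spans two lines.
infile=$(mktemp)
printf '%s\n' '<tag>first' 'second</tag>' > "$infile"

# Without gsub, the newline inside the content is kept.
with_newlines=$(awk '$1==t{print $2}' RS=\< FS=\> t="tag" "$infile")

# gsub(ORS,x) replaces every newline in the record with the empty string
# (x is an unset variable), and awk then re-splits the fields before $2
# is printed.
joined=$(awk '$1==t{gsub(ORS,x);print $2}' RS=\< FS=\> t="tag" "$infile")

printf '%s\n' "$with_newlines"
printf '%s\n' "$joined"

rm -f "$infile"
```

Note that the lines are joined with nothing between them, so "first" and "second" come out as "firstsecond". Also, because $1==t is an exact comparison, a tag that carries attributes (for example <tag class="x">) will not match.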

 
