extracting Line between HTML tag


 
# 1  
Old 03-01-2012
[solved] extracting Line between HTML tag

Hi everyone,
I want to extract the string that sits between a particular pair of HTML tags, e.g.
Quote:
<tag>I_want_extract_this_line.com</tag>
I tried grep, cut, and awk but could not work out the exact syntax for this.

PS: Sorry about my bad English.

Last edited by newlook2011; 03-02-2012 at 12:08 AM.. Reason: solved
# 2  
Old 03-01-2012
Have a go with:

Code:
sed -n 's/.*<tag>//; T; s/<\/tag>.*//; T; p' input-file >output-file

This assumes the opening and closing tags are on the same line.
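
The `T` command is a GNU extension, so the command above fails on other sed implementations. A minimal sketch of a variant that sticks to POSIX features follows; the sample input is made up for illustration, and like the original it assumes the tag pair sits on a single line.

```shell
# Made-up sample input: one line with the tag pair, one without.
input='junk <tag>I_want_extract_this_line.com</tag> more junk
a line without the tag'

# Capture the text between the tags in \(...\) and print only the lines
# where the substitution succeeded (-n suppresses everything else).
result=$(printf '%s\n' "$input" | sed -n 's/.*<tag>\(.*\)<\/tag>.*/\1/p')
printf '%s\n' "$result"
```

Only the text between the tags is printed; the second line is dropped because the substitution does not match it.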
# 3  
Old 03-01-2012
Or even with newlines:
Code:
awk -F\> '/^tag>/{print $2}' RS=\< infile

and if you also want to remove those newlines from the output:
Code:
awk -F\> '/^tag>/{gsub(ORS,x);print $2}' RS=\< infile

With varying tag:
Code:
awk -F\> '$0~"^"t">" {gsub(ORS,x);print $2}' RS=\< t="tag" infile
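
If the tag name changes often, the last command could be wrapped in a small shell function. This is only a sketch: `extract_tag` is a hypothetical name, the sample input is made up, and `-v` is used instead of the trailing assignments so that the function also works when reading standard input.

```shell
# extract_tag is a hypothetical helper wrapping the awk command above.
# Usage: extract_tag TAGNAME [FILE...]   (reads stdin when no file is given)
extract_tag() {
    t=$1
    shift
    awk -F'>' -v RS='<' -v t="$t" '$0 ~ "^" t ">" { gsub(ORS, ""); print $2 }' "$@"
}

# Made-up sample: the <b> element is split across two lines.
sample='<a>first.example</a>
<b>second
.example</b>'

out_a=$(printf '%s\n' "$sample" | extract_tag a)
out_b=$(printf '%s\n' "$sample" | extract_tag b)
printf '%s\n%s\n' "$out_a" "$out_b"
```

Note that the tag name is used as part of a regular expression and that attributes are not handled, so this only suits plain tags such as `<tag>`.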

@agama note: T is GNU sed only

Last edited by Scrutinizer; 03-02-2012 at 12:12 AM..
# 4  
Old 03-02-2012
Hi, I got:
Quote:
invalid command code T
I checked man sed but could not find
Quote:
T
Am I missing something?
# 5  
Old 03-02-2012
Code:
sed 's/<[^>]*>/ /g'

or
Code:
grep -Po '(?<=>)[^<]+(?=<)'
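
A quick check of what the sed command does with the line from the first post (a sketch only; the variable names are made up). It removes every tag on the line, not just `<tag>`, and because each tag is replaced by a space, the result keeps a space on either side.

```shell
line='<tag>I_want_extract_this_line.com</tag>'

# Each tag becomes one space, so the text keeps a space on either side.
spaced=$(printf '%s\n' "$line" | sed 's/<[^>]*>/ /g')

# Replacing each tag with nothing leaves only the text.
bare=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')

# Brackets make the surrounding spaces visible.
printf '[%s]\n[%s]\n' "$spaced" "$bare"
```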

# 6  
Old 03-02-2012
First, thanks to huaihaizi3 and agama for the quick responses.

Quote:
Originally Posted by Scrutinizer
Or even with newlines:
Code:
awk -F\> '/tag>/ && !/^\//{print $2}' RS=\< infile

and if you also want to eliminate them:
Code:
awk -F\> '/tag>/ && !/^\//{gsub(ORS,x);print $2}' RS=\< infile

@agama note: T is GNU sed only
It worked! I had been trying to solve this for 2 hours, and you did it in 10 minutes.

By the way, could you explain the code? I have been reading man awk but could not find the answers there.
# 7  
Old 03-02-2012
Note: I edited the code in my post...
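
As for how the awk version works, here is a rough walkthrough rather than an authoritative explanation. `RS=\<` makes `<` the record separator, so every record begins with a tag name. `-F\>` splits each record at `>`, so `$1` is the tag name and `$2` is the text that follows it. `/^tag>/` selects the records that start with the opening tag, and `gsub(ORS,x)` deletes newlines, because `ORS` defaults to a newline and `x` is an unset (empty) variable. The sample input below is made up, and `-v RS='<'` stands in for the trailing `RS=\<` assignment.

```shell
# Made-up input with the tag content split over two lines.
input='<tag>I_want_extract
_this_line.com</tag>'

# Show the non-empty records as awk sees them, with newlines removed.
records=$(printf '%s\n' "$input" | awk -F'>' -v RS='<' '
    NF {
        gsub(ORS, "")                      # drop newlines inside the record
        printf "$1=[%s] $2=[%s]\n", $1, $2
    }')
printf '%s\n' "$records"

# The extraction itself: keep only records that start with "tag>".
result=$(printf '%s\n' "$input" | awk -F'>' -v RS='<' '/^tag>/ { gsub(ORS, ""); print $2 }')
printf '%s\n' "$result"
```

The first command lists two records, one with `$1` set to `tag` and one with `$1` set to `/tag`. The pattern `/^tag>/` rejects the second, so only the text after the opening tag is printed.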