html tags


 
# 1  
Old 11-28-2007
html tags

Hi, I'm new to the forum, so hi everyone, hope you are all well.

I am attempting to write a bash script at the moment. It's a scraper/grabber that uses wget to download web pages related to the user's query. That part is no problem. Once I have the page, I need to strip all the (to me) useless data out of the HTML source, e.g.:

Quote:

<html>
test test test
<tag>test ttest </tag>
<new>
this is the data i want to grab between the new tags
</new>
</html>

As you can see from the above, the data I need to grab is between the new tags. These are always in the source, whatever the user's query. Can anyone help or point me in the right direction? Any help would be greatly appreciated. Thanks for listening, dunryc
# 2  
Old 11-28-2007
# 3  
Old 11-28-2007
Quote:
Originally Posted by dunryc
the data I need to grab is between the new tags. These are always in the source, whatever the user's query.
There are two different cases to consider: the starting and ending tags are either on the same line or on different lines:

Code:
Example

<new>This is the text to catch</new>

<new>
This is some text
to catch</new>

Both can be matched by simple regular expressions. For each regexp I give sample input; the matched portion is the text between the <new> tags:

Code:
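# Case 1: both tags on the same line (reads stdin, or append a file name)
sed -n 's/.*<new>\(.*\)<\/new>.*/\1/p'

# Sample input; prints "text to match":
blabla <new>text to match</new> blabla

# Case 2: tags on different lines
sed -n '/<new>/,/<\/new>/ {
               s/.*<new>//
               s/<\/new>.*//
               /^$/d
               p
               }'

# Sample input; prints "text", "to" and "match" on separate lines:
blabla <new>text
to
match</new> blabla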

bakunin
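
The two commands above can be combined into one sed script, so a page does not have to be checked for which case applies. What follows is only a sketch under some assumptions: the function name extract_new is made up for illustration, the tags are written exactly as <new> and </new> (no attributes, lower case), and there is at most one pair per line. The same-line case is handled first and then dropped with d, because a sed range that opens and closes on the same line would otherwise keep running until the next </new>.

```shell
#!/bin/sh
# extract_new (hypothetical helper name): print the text found between
# <new> and </new> on standard input, using only sed.
extract_new() {
    sed -n '
        # Case 1: both tags on the same line. Print the captured text,
        # then delete the line so the range below never sees it.
        /<new>.*<\/new>/ {
            s/.*<new>\(.*\)<\/new>.*/\1/p
            d
        }
        # Case 2: tags on different lines. Strip the tags and anything
        # outside them, skip lines left empty, print the rest.
        /<new>/,/<\/new>/ {
            s/.*<new>//
            s/<\/new>.*//
            /^$/d
            p
        }
    '
}
```

Assuming the page URL is held in a variable such as $url (a placeholder, not something from this thread), it could be used as: wget -q -O - "$url" | extract_new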
# 4  
Old 11-29-2007
Thanks for the pointers, guys. I did have a look at XMLStarlet to grab the data and it works great, but I wanted to use tools that would be present in most distros. The commands that bakunin posted work great. Once again, thanks for the help.