html tags
Hi, I'm new to the forum, so hi everyone, hope you are all well.
I am attempting to write a bash script. At the moment it is a scraper/grabber that uses wget to download web pages related to the user's query. That part is no problem. Once I have the page, I need to strip all the useless (to me) data out of the HTML source, i.e.:
Quote:
As you can see from the above, the data I need to grab is between the <new> tags. These are always in the source, whatever the user's query. Can anyone help or point me in the right direction? Any help would be greatly appreciated. Thanks for listening, dunryc
Have you considered XMLStarlet, the command-line XML toolkit?
Quote:
Code:
Example
<new>This is the text to catch</new>
<new> This is some text to catch</new>

When both tags are on the same line:
Code:
sed -n 's/.*<new>\(.*\)<\/new>.*/\1/p'

Sample input:
Code:
blabla <new>text to match</new> blabla

When the tagged text spans several lines:
Code:
sed -n '/<new>/,/<\/new>/ {
    s/.*<new>//
    s/<\/new>.*//
    /^$/d
    p
}'

Sample input:
Code:
blabla <new>text
to
match</new> blabla
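
To see what the two commands print, here is a minimal sketch that feeds each one the sample input shown with it. The function names extract_inline and extract_block are made up for illustration; the sed programs themselves are the ones posted above.

```shell
#!/bin/sh
# Hypothetical wrapper names; only the sed programs come from the post.

# Case 1: opening and closing tag sit on the same line.
extract_inline() {
    sed -n 's/.*<new>\(.*\)<\/new>.*/\1/p'
}

# Case 2: the tagged text is spread over several lines.
extract_block() {
    sed -n '/<new>/,/<\/new>/ {
        s/.*<new>//
        s/<\/new>.*//
        /^$/d
        p
    }'
}

# Prints: text to match
printf '%s\n' 'blabla <new>text to match</new> blabla' | extract_inline

# Prints three lines: text, to, match
printf '%s\n' 'blabla <new>text' 'to' 'match</new> blabla' | extract_block
```

Two caveats worth keeping in mind. Because .* is greedy, the first command returns only the last <new>...</new> pair when a line holds more than one. The second command expects the closing tag on a later line than the opening tag: sed starts looking for the end of a range on the line after the one that opened it, so a pair that opens and closes on the same line keeps the range open until the next </new> or the end of the input.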
Thanks for the pointers, guys. I did have a look at XMLStarlet to grab the data and it works great, but I wanted to use tools that would be present in most distros. The commands that bakunin posted work great. Once again, thanks for the help.
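
For anyone landing here later, this is one way the pieces might be wired together using only wget and sed. It is a sketch under assumptions: the function names extract_new and fetch_new and the example URL are hypothetical, and it assumes each <new>...</new> pair sits on a single line of the downloaded page.

```shell
#!/bin/sh
# Sketch only: function names and the URL below are illustrative assumptions.

# Read HTML on stdin and print the text between <new> and </new>
# (single-line case, using the sed command from earlier in the thread).
extract_new() {
    sed -n 's/.*<new>\(.*\)<\/new>.*/\1/p'
}

# Download a page and pass it straight to the extractor.
# -q keeps wget quiet, -O - writes the page to stdout instead of a file.
fetch_new() {
    wget -q -O - "$1" | extract_new
}

# Local demonstration that needs no network access.
printf '%s\n' '<p>ignored</p>' '<td><new>42 results</new></td>' | extract_new

# Real use would look like this (URL is a placeholder):
# fetch_new 'http://example.com/search?q=something'
```

If the tagged text can span several lines, the range-based sed command from the thread can be swapped into extract_new instead.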