How to supplement HTML tags with SED
I am cleaning up HTML with sed. With the regexp

<a name="[A-Za-z0-9 ?_.]+"></a><h[123]>[ ]*<span class="mw-headline" >[A-Za-z0-9 ?_.]+</span></h[123]>

I can find the tags I need. But when I place the regexp in a sed command, sed fails. So I started building up from a smaller command. This is where I am now:

sed -r -e s/"<a name=\"/replacement/ <in >out

This works. But when I enter:

sed -r -e s/"<a name=\"[A-Za-z0-9 ?_.]+"/replacement/ <in >out

it fails with:

sed: can't read <in: Invalid argument
sed: can't read >out: Invalid argument

But the file "in" really is there. How can I get the regexp into the sed command? I have tried escaping and not escaping characters, but sed does not seem to accept it.
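
One likely cause of the "can't read <in" error is shell quoting: when the double quotes in the command do not pair up as intended, the shell hands `<in` and `>out` to sed as file names instead of treating them as redirections. A minimal sketch of the smaller command, assuming a POSIX shell and a sed that supports `-E` (GNU sed also accepts `-r`), wraps the whole script in single quotes so nothing inside needs escaping. The sample input line is hypothetical.

```shell
# Hypothetical one-line sample; the file names "in" and "out" follow the post.
printf '%s\n' '<a name="Introduction"></a><h1>' > in

# Single quotes stop the shell from interpreting ", <, > and \ inside the
# sed script, so the redirections after the closing quote work as usual.
sed -E 's/<a name="[A-Za-z0-9 ?_.]+"/replacement/' < in > out

cat out
```

With the script fully quoted, the regexp can be grown step by step without changing how the shell parses the rest of the command line.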
From a tag like this:

<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>

I'd like to make:

<a name="Introduction"></a><h1><span class="mw-headline" id="Introduction" >Introduction</span></h1>

Therefore I do the following replacement.

Match:

<a name="([A-Za-z0-9 ?_.]+)"></a><h([123])>[^mw]*mw-headline" >([A-Za-z0-9 ?_.]+)</span></h[123]>

And replace it with:

<a name="\1"></a><h\2><span class="mw-headline" id="\1" >\3</span></h\2>

This works when using a find-and-replace editor that accepts regex. But I can't seem to fit it into one sed command.
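
The match and replacement above can be sketched as a single sed command. This is only one way to write it, assuming a POSIX shell and a sed with extended regular expressions (`-E`; GNU sed also accepts `-r`). The pattern and replacement are copied from the post; the `#` delimiter is a choice made here so that the `/` characters in the closing tags need no escaping, and the sample input line is the one quoted in the post.

```shell
# Sample input taken from the post; file names "in" and "out" follow the thread.
printf '%s\n' '<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>' > in

# The whole script is single-quoted so the shell leaves ", <, > and \ alone.
# '#' is used as the s-command delimiter because the pattern contains '/'.
# \1 = anchor name, \2 = heading level, \3 = headline text.
sed -E 's#<a name="([A-Za-z0-9 ?_.]+)"></a><h([123])>[^mw]*mw-headline" >([A-Za-z0-9 ?_.]+)</span></h[123]>#<a name="\1"></a><h\2><span class="mw-headline" id="\1" >\3</span></h\2>#' in > out

cat out
```

Note that `[^mw]*` only matches here because the text between `<h1>` and `mw-headline` happens to contain no `m` or `w`; spelling out `<span class="` would be stricter.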