Shell Programming and Scripting: Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts here.

| Thread | Thread Starter | Forum | Replies | Last Post |
| --- | --- | --- | --- | --- |
| Remove html tags with bash | dejavu88 | Shell Programming and Scripting | 4 | 05-22-2008 10:58 AM |
| html tags | dunryc | Shell Programming and Scripting | 3 | 11-29-2007 03:14 PM |
| How to remove only html tags inside a file? | btech_raju | Linux | 2 | 11-23-2007 09:25 AM |
| Automated replacement of HTML Tags | nem_kirk | SUN Solaris | 1 | 11-16-2005 10:24 PM |
| unsing sed to strip html tags - help | zap | Shell Programming and Scripting | 3 | 04-18-2004 01:03 AM |

#1
How to supplement HTML tags with SED
I am cleaning up HTML with sed. With the regexp

    <a name="[A-Za-z0-9 ?_.]+"></a><h[123]>[ ]*<span class="mw-headline" >[A-Za-z0-9 ?_.]+</span></h[123]>

I can find the tags I need. But when I put it in a sed command, sed fails. So I started building up from a smaller command. This is where I am now:

    sed -r -e s/"<a name=\"/replacement/ <in >out

This works. But when I enter:

    sed -r -e s/"<a name=\"[A-Za-z0-9 ?_.]+"/replacement/ <in >out

it fails with:

    sed: can't read <in: Invalid argument
    sed: can't read >out: Invalid argument

But the file `in` really is there. How can I get the regexp into the sed command? I have tried escaping and not escaping characters, but sed does not seem to accept it.
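
The error messages show that sed received `<in` and `>out` as file names, which usually means the redirections ended up inside a quoted string instead of being handled by the shell. A minimal sketch for a POSIX shell such as bash: wrap the whole sed script in single quotes, or keep it in a script file loaded with `-f`, so the shell passes it to sed as one argument and still treats `<in` and `>out` as redirections. The file names `in`, `out`, `out2` and `fix.sed` are placeholders. GNU sed accepts `-E` as well as the `-r` used above. Quoting rules differ between shells; if the command is run from Windows `cmd.exe` rather than a Unix shell, the script-file form avoids the quoting question entirely.

```shell
#!/bin/sh
# Sample input taken from the thread; "in" is a placeholder file name.
printf '%s\n' '<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>' > in

# Option 1: single quotes keep the whole script as ONE argument, so the
# double quotes, brackets, '?' and '+' reach sed untouched, while the
# shell still treats <in and >out as redirections.
sed -E 's/<a name="[A-Za-z0-9 ?_.]+"/replacement/' <in >out

# Option 2: keep the script in a file so no shell quoting is involved.
# The quoted 'EOF' stops the shell from expanding anything in the here-doc.
cat > fix.sed <<'EOF'
s/<a name="[A-Za-z0-9 ?_.]+"/replacement/
EOF
sed -E -f fix.sed <in >out2

# Both files should now contain the same substituted line.
cat out out2
```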
#2
Can you provide the output you want?
Regards
#3
From a tag like this:

    <a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>

I'd like to make:

    <a name="Introduction"></a><h1><span class="mw-headline" id="Introduction" >Introduction</span></h1>

To do that, I use the following replacement. Match:

    <a name="([A-Za-z0-9 ?_.]+)"></a><h([123])>[^mw]*mw-headline" >([A-Za-z0-9 ?_.]+)</span></h[123]>

and replace it with:

    <a name="\1"></a><h\2><span class="mw-headline" id="\1" >\3</span></h\2>

This works in a find-and-replace editor that supports regular expressions, but I can't manage to fit it into a single sed command.
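
The match/replace pair from this post can run as one sed command. A minimal sketch, assuming a sed that supports `-E` (extended regexes, the same thing the question's `-r` enables in GNU sed): the regex and replacement are copied from the post, and the only change is the `s` delimiter, switched from `/` to `#`, because the pattern itself contains slashes (`</a>`, `</span>`, `</h1>`) that would otherwise end the `s` command early. The function name `add_heading_id` is hypothetical.

```shell
#!/bin/sh
# add_heading_id (hypothetical name): copy the anchor name into an id
# attribute on the mw-headline span. Pattern and replacement are the ones
# from the post; '#' replaces '/' as the s delimiter because the pattern
# contains slashes. The trailing g handles several headings on one line.
add_heading_id() {
  sed -E 's#<a name="([A-Za-z0-9 ?_.]+)"></a><h([123])>[^mw]*mw-headline" >([A-Za-z0-9 ?_.]+)</span></h[123]>#<a name="\1"></a><h\2><span class="mw-headline" id="\1" >\3</span></h\2>#g'
}

# Example: prints the heading with id="Introduction" added
printf '%s\n' '<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>' | add_heading_id
```

To process a whole file, run `add_heading_id <in >out`. Like the original regex, this assumes each heading sits on a single line, since sed works line by line.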
#4
Something like:

    echo '<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>' | sed 's/\(.*"\)\(.*\)/\1 id="Introduction" \2/'
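
The reply above hard-codes `id="Introduction"`, and its greedy `\(.*"\)` captures up to the last double quote on the line. A sketch of the same idea (capture the text before the insertion point, then add the attribute after it) that instead takes the id from the anchor name, so it works for every heading. It uses plain basic regular expressions, so no `-r`/`-E` is needed, and it assumes the tags appear exactly as in the thread's example, with no extra whitespace or attributes between them. `insert_id` is a hypothetical function name.

```shell
#!/bin/sh
# insert_id (hypothetical name): add id="<anchor name>" to the mw-headline
# span, leaving the rest of the line untouched. Basic regex syntax:
#   outer \( ... \)  group 1 = everything from <a name=" to mw-headline"
#   inner \([^"]*\)  group 2 = the anchor name itself
# '#' is the s delimiter so the slashes in </a> need no escaping.
insert_id() {
  sed 's#\(<a name="\([^"]*\)"></a><h[123]><span class="mw-headline"\) >#\1 id="\2" >#g'
}

# Example: prints the heading with id="Introduction" added
printf '%s\n' '<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>' | insert_id
```

Because only the opening part of the heading is matched, the heading text itself may contain characters outside `[A-Za-z0-9 ?_.]`.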