The UNIX and Linux Forums  


#1  04-25-2008  DocBrewer
How to supplement HTML tags with SED

I am cleaning up HTML with sed. With the regexp

<a name="[A-Za-z0-9 ?_.]+"></a><h[123]>[ ]*<span class="mw-headline" >[A-Za-z0-9 ?_.]+</span></h[123]>

I can find the tags I need. But when I place them in a sed command, sed fails. So I started building up from a smaller command. This is where I am now:

sed -r -e s/"<a name=\"/replacement/ <in >out

This works. But when I enter:

sed -r -e s/"<a name=\"[A-Za-z0-9 ?_.]+"/replacement/ <in >out

it fails with:

sed: can't read <in: Invalid argument
sed: can't read >out: Invalid argument

But the file "in" really exists. How can I get the regexp into the sed command? I have tried both escaping and not escaping the characters, but sed does not seem to accept either.
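
The error messages suggest that <in and >out reached sed as file names instead of being handled as redirections, which is what happens when they end up inside a quoted string. In a command interpreter that does not treat \" as an escaped quote, the second command has three double quotes, so the last one may open a quoted region that swallows the redirections; the thread does not confirm which shell was in use. A minimal sketch, assuming a POSIX shell and GNU sed: put the whole script in single quotes so the double quotes in the pattern need no escaping. The sample input line and the scratch directory are assumptions for illustration.

```shell
# Assumptions: POSIX shell, GNU sed (-r = extended regular expressions).
# The sample line and the scratch directory are illustrative only.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '<a name="Introduction"></a><h1>' >in

# Single quotes protect the whole sed script, so the shell handles
# <in and >out as redirections instead of passing them to sed.
sed -r -e 's/<a name="[A-Za-z0-9 ?_.]+"/replacement/' <in >out

result=$(cat out)
printf '%s\n' "$result"
```

With single quotes around the script, nothing inside the pattern has to be escaped for the shell, which keeps the regexp identical to the one tested in an editor.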
#2  04-25-2008  Franklin52 (Moderator)
Can you provide the output you desire?

Regards
#3  04-25-2008  DocBrewer
From a tag like this:

<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>

I'd like to make:

<a name="Introduction"></a><h1><span class="mw-headline" id="Introduction" >Introduction</span></h1>

Therefore I do the following replacement:

Match:
<a name="([A-Za-z0-9 ?_.]+)"></a><h([123])>[^mw]*mw-headline" >([A-Za-z0-9 ?_.]+)</span></h[123]>

And replace it with:

<a name="\1"></a><h\2><span class="mw-headline" id="\1" >\3</span></h\2>

This works when using a find and replace editor which accepts regex. But I can't seem to fit it in one sed command.
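
The match and replacement above can be fitted into one sed command once the delimiter stops colliding with the slashes in </a> and </span>. A sketch, assuming GNU sed and a POSIX shell; the # delimiter and the variable names are choices made here for illustration, not something from the post.

```shell
# Assumptions: POSIX shell, GNU sed (-r = extended regular expressions).
# "#" replaces "/" as the s-command delimiter because the pattern and
# the replacement both contain "/" (in </a>, </span>, </h1>).
line='<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>'

converted=$(printf '%s\n' "$line" | sed -r -e \
  's#<a name="([A-Za-z0-9 ?_.]+)"></a><h([123])>[^mw]*mw-headline" >([A-Za-z0-9 ?_.]+)</span></h[123]>#<a name="\1"></a><h\2><span class="mw-headline" id="\1" >\3</span></h\2>#')

# \1 = anchor name, \2 = heading level, \3 = heading text
printf '%s\n' "$converted"
```

Because the script is in single quotes, the regexp and the replacement are passed to sed exactly as written in the post.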
#4  04-25-2008  Franklin52 (Moderator)
Something like:


Code:
echo '<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>'|
sed 's/\(.*"\)\(.*\)/\1 id="Introduction" \2/'

Regards
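
The command above hard-codes the id value. A variant that copies the anchor name instead, as a sketch only: it uses basic regular expressions (no -r), and it assumes one heading per line with mw-headline" as the last double-quoted attribute on that line. The variable names are illustrative.

```shell
# Assumptions: POSIX shell and sed; one heading per line; the closing
# quote of class="mw-headline" is the last double quote on the line.
line='<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>'

# \1 = everything up to and including the last double quote
# \2 = the anchor name (nested inside \1)
# \3 = the rest of the line
fixed=$(printf '%s\n' "$line" |
  sed 's/\(<a name="\([^"]*\)".*"\)\(.*\)/\1 id="\2"\3/')

printf '%s\n' "$fixed"
```

The greedy .*" is what ties this to the "last double quote" assumption; a line with further quoted attributes after the span would need a stricter pattern.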
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.
