How to replace multiple "&nbsp;" entries within a <td> tag with a single entry using sed?

#1  thomasraj87, 01-09-2017

I have the input file like this.

Input file: 12.txt

1) Each line contains one or more <tr> tags.
2) Each <tr> tag may contain one or more <td> tags.
3) A few <td> tags look like "<td> &nbsp; </td>"; a few contain more than one "&nbsp;" entry.


Code:
<tr> some td tags </tr>
<tr> some td tags </tr><tr>some td tags</tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; &nbsp; </td> <td> &nbsp; </td></tr><tr>some td tags</tr>

Expected Output file:
I want to collapse multiple "&nbsp;" entries within a <td> into a single "&nbsp;" entry, so that each such cell reads <td> &nbsp; </td>, as shown below.

Code:
<tr> some td tags </tr><tr>some td tags</tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td></tr><tr>some td tags</tr>

I tried the following sed commands but did not get the expected output. Please help.

Code:
sed -e 's/<td> &nbsp; &nbsp; /<td> &nbsp; /g' 12.txt
sed -e 's/<td> &nbsp; /<td> &nbsp; /g' 12.txt

#2  RudiC (Moderator), 01-09-2017
From info sed:
Quote:
the REPLACEMENT can contain unescaped '&' characters which reference the whole matched portion of the pattern space
So use the escaped \& sequence there. To match repeated occurrences, group the pattern and apply the * repetition character:

Code:
sed 's/<td> \(&nbsp; \)*/<td> \&nbsp; /g' file
<tr> some td tags </tr>
<tr> some td tags </tr><tr>some td tags</tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td></tr><tr>some td tags</tr>

#3  Corona688 (Forum Staff), 01-09-2017
The -e option is redundant here. If your sed supports extended regular expressions:


Code:
$ echo "&nbsp; &nbsp; &nbsp; a &nbsp; &nbsp; &nbsp;" | sed -r 's/(&nbsp; *)+/\&nbsp;/g'
&nbsp;a &nbsp;

$

The ( ) brackets group a whole section; the + that follows means "one or more repeats of this expression".

The & has to be escaped in the replacement as \&. Otherwise & has the special meaning "the entire matched expression", which would end up adding MORE &nbsp; entries.
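
To illustrate the point about &, here is a small sketch. The sample cell follows the input format from post #1, and the variable names (input, unescaped, escaped) are only for illustration:

```shell
# A cell with two "&nbsp;" entries, in the format used in post #1.
input='<td> &nbsp; &nbsp; </td>'

# Unescaped & in the replacement expands to the whole matched text,
# so the replacement becomes "&nbsp; &nbsp;" followed by "nbsp;".
unescaped=$(printf '%s\n' "$input" | sed 's/&nbsp; &nbsp;/&nbsp;/')

# Escaped \& is a literal ampersand, giving a single "&nbsp;".
escaped=$(printf '%s\n' "$input" | sed 's/&nbsp; &nbsp;/\&nbsp;/')

printf '%s\n' "$unescaped" "$escaped"
```

The first result still carries both entries plus a stray "nbsp;", which is consistent with the original attempts not producing the expected output.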

Question though: If specific numbers of non-breaking spaces aren't meant to be there, are any non-breaking spaces meant to be there? Why not replace them entirely with non-breaking spaces?
#4  RudiC (Moderator), 01-09-2017
Corona688 is right - use the + repetition indicator instead of the * (which indicates zero or more repetitions).
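
A minimal sketch of why the difference matters, assuming a hypothetical cell "<td> some text</td>" that has a space after the tag but no "&nbsp;" (no such cell appears in the sample input). The "one or more" form is written here as the basic-regex interval \{1,\}, which should work without extended regexps; the variable names are illustrative only:

```shell
# Hypothetical cell: space after <td>, but no "&nbsp;" entry.
cell='<td> some text</td>'

# With *, zero repetitions also match, so "<td> " alone is replaced
# and an unwanted "&nbsp; " is inserted.
star=$(printf '%s\n' "$cell" | sed 's/<td> \(&nbsp; \)*/<td> \&nbsp; /g')

# With \{1,\} (one or more), at least one "&nbsp; " is required,
# so this cell is left unchanged.
plus=$(printf '%s\n' "$cell" | sed 's/<td> \(&nbsp; \)\{1,\}/<td> \&nbsp; /g')

# Cells that do contain repeated entries are still collapsed.
multi=$(printf '%s\n' '<td> &nbsp; &nbsp; &nbsp; </td>' |
    sed 's/<td> \(&nbsp; \)\{1,\}/<td> \&nbsp; /g')

printf '%s\n' "$star" "$plus" "$multi"
```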
#5  Don Cragun (Administrator), 01-09-2017
You could also try the following. It copies the 1st line of your sample input file unchanged to the output, but otherwise seems to produce the output you said you want (I don't understand from your description why the 1st line should not be copied unchanged):

Code:
sed ':x
s/&nbsp; &nbsp;/\&nbsp;/g
tx' 12.txt
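
The loop above can be sketched on a single cell to show why the branch is needed. This is an illustration only: the same script is written in one line with separate -e expressions, and the variable names are made up for the example. Each pass with the g flag merges non-overlapping pairs, and "t" jumps back to the label x as long as a substitution was made:

```shell
# A cell with five "&nbsp;" entries, as in the sample input.
cell='<td> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </td>'

# One pass without the loop: pairs are merged once, leaving three entries.
single=$(printf '%s\n' "$cell" | sed 's/&nbsp; &nbsp;/\&nbsp;/g')

# With the :x ... tx loop, the substitution repeats until nothing changes.
looped=$(printf '%s\n' "$cell" |
    sed -e ':x' -e 's/&nbsp; &nbsp;/\&nbsp;/g' -e 'tx')

printf '%s\n' "$single" "$looped"
```

Note that this pattern is not tied to <td>: it collapses adjacent "&nbsp; &nbsp;" pairs wherever they occur on a line.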
