How to replace multiple "&nbsp;" entries within a <td> tag with a single entry using sed?


 
# 1  
Old 01-09-2017

I have an input file like this.

Input file: 12.txt

1) There may be one or more <tr> tags on the same line.
2) Each <tr> tag may contain one or more <td> tags.
3) A few <td> tags look like "<td> &nbsp; </td>"; a few contain more than one "&nbsp;" entry.

Code:
<tr> some td tags </tr>
<tr> some td tags </tr><tr>some td tags</tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; &nbsp; </td> <td> &nbsp; </td></tr><tr>some td tags</tr>

Expected output file:
I want to remove the extra "&nbsp;" entries wherever more than one exists within a <td>, leaving only one "&nbsp;" entry, like <td> &nbsp; </td>, as shown below.
Code:
<tr> some td tags </tr><tr>some td tags</tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td></tr><tr>some td tags</tr>

I tried the following sed commands but did not get the expected output. Please help with this.
Code:
sed -e 's/<td> &nbsp; &nbsp; /<td> &nbsp; /g' 12.txt
sed -e 's/<td> &nbsp; /<td> &nbsp; /g' 12.txt

# 2  
Old 01-09-2017
info sed:
Quote:
the REPLACEMENT can contain unescaped '&' characters which reference the whole matched portion of the pattern space
So use the escaped \& sequence there. And, to match a repeated pattern, use a grouped regex with the * character:
Code:
sed 's/<td> \(&nbsp; \)*/<td> \&nbsp; /g' file
<tr> some td tags </tr>
<tr> some td tags </tr><tr>some td tags</tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td></tr><tr>some td tags</tr>

This User Gave Thanks to RudiC For This Post:
# 3  
Old 01-09-2017
-e is redundant here. If your sed supports extended regexps:

Code:
$ echo "&nbsp; &nbsp; &nbsp; a &nbsp; &nbsp; &nbsp;" | sed -r 's/(&nbsp; *)+/\&nbsp;/g'
&nbsp;a &nbsp;

$

The ( ) brackets group a whole section, after which is a + for "one or more repeats of this expression".

The & has to be escaped in the replacement as \&; otherwise & has the special meaning "the entire matched text", which would end up adding MORE &nbsp; entries.
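
To make the effect of the escaping concrete, here is a minimal sketch, assuming a POSIX shell and any standard sed. The variable names `unescaped` and `escaped` are only illustrative and do not come from the thread.

```shell
# Unescaped & in the replacement stands for the whole matched text,
# so the matched "&nbsp; &nbsp;" is pasted back in front of "nbsp;".
unescaped=$(printf '%s\n' '<td> &nbsp; &nbsp; </td>' | sed 's/&nbsp; &nbsp;/&nbsp;/')

# Escaped \& is a literal ampersand, so the pair collapses to one entity.
escaped=$(printf '%s\n' '<td> &nbsp; &nbsp; </td>' | sed 's/&nbsp; &nbsp;/\&nbsp;/')

printf '%s\n' "$unescaped" "$escaped"
# <td> &nbsp; &nbsp;nbsp; </td>
# <td> &nbsp; </td>
```

The first result is longer than the input, which is the "adding MORE" effect described above.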

Question though: If specific numbers of non-breaking spaces aren't meant to be there, are any non-breaking spaces meant to be there? Why not replace them entirely with non-breaking spaces?
This User Gave Thanks to Corona688 For This Post:
# 4  
Old 01-09-2017
Corona688 is right - use the + repetition indicator instead of the * (which indicates zero or more repetitions).
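
For a sed without extended regexps, the same "one or more" idea can probably be written with the basic-regex interval \{1,\}. This is only a sketch, assuming a POSIX shell and sed; the sample lines and variable names are made up for illustration.

```shell
# With *, "<td> " followed by zero entities still matches,
# so an entity is inserted into a cell that never had one.
star=$(printf '%s\n' '<td> text</td>' | sed 's/<td> \(&nbsp; \)*/<td> \&nbsp; /g')

# With \{1,\} (the basic-regex spelling of +), that cell is left alone.
plus=$(printf '%s\n' '<td> text</td>' | sed 's/<td> \(&nbsp; \)\{1,\}/<td> \&nbsp; /g')

# Runs of entities are still collapsed to a single one per <td>.
line='<tr><td> &nbsp; &nbsp; &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr>'
collapsed=$(printf '%s\n' "$line" | sed 's/<td> \(&nbsp; \)\{1,\}/<td> \&nbsp; /g')

printf '%s\n' "$star" "$plus" "$collapsed"
# <td> &nbsp; text</td>
# <td> text</td>
# <tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr>
```

The sample input in post #1 never has a space directly after <td> in a text cell, which is why the * version happens to give the right output there.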
This User Gave Thanks to RudiC For This Post:
# 5  
Old 01-09-2017
You could also try the following. Apart from copying the 1st line of your sample input file unchanged to the output, it seems to produce the output you said you want (and I don't understand from your description why the 1st line should not be copied unchanged):
Code:
sed ':x
s/&nbsp; &nbsp;/\&nbsp;/g
tx' 12.txt
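
Why the loop matters can be seen on a cell with five entities. This is a sketch, assuming a POSIX shell and a sed that accepts the label and branch as separate -e expressions; the variable names are illustrative only.

```shell
cell='<td> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </td>'

# One pass with g replaces non-overlapping pairs only: 5 entities become 3.
once=$(printf '%s\n' "$cell" | sed 's/&nbsp; &nbsp;/\&nbsp;/g')

# The t command branches back to label x whenever the substitution
# changed something, so passes repeat until no pair is left.
looped=$(printf '%s\n' "$cell" | sed -e ':x' -e 's/&nbsp; &nbsp;/\&nbsp;/g' -e 'tx')

printf '%s\n' "$once" "$looped"
# <td> &nbsp; &nbsp; &nbsp; </td>
# <td> &nbsp; </td>
```

Note that this pattern is not tied to <td>, so it would also collapse pairs of entities anywhere else on the line.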

This User Gave Thanks to Don Cragun For This Post: