SED: Can't Repeat Search Character in SED Output


 
# 1  
Old 09-14-2009

I'm not sure if the problem I'm seeing is an artifact of sed or simply a beginner's mistake. Here's the problem: I want to add a zero-width space following each underscore between XML tags. For example, if I had the following xml:

<MY_BIG_TAG>This_is_a_test</MY_BIG_TAG>

It should look like this after I run my sed script:

<MY_BIG_TAG>This_​is_​a_​test</MY_BIG_TAG>

To accomplish this, I found an example from the web and modified it for my purposes. Unfortunately, the example does not allow me to search for the underscore and then re-use it in the output. The script works fine if I just want to search on the underscore character and replace it with a different character (in this case, the zero-width space); however, as soon as I try to search for the underscore and replace it with that same underscore followed by a zero-width space, sed stalls and never completes.

Here's the script:

#!/usr/bin/sh

sed 's/>[^>]*<\//\n&/g # Isolate the strings of interest by inserting newline characters.
:loop
s/\(\n>[^>]*\)_\([^>]*<\/\)/\1_\​\2/ #This is supposed to replace "_" with "_​" but it does not; instead, it stalls sed.
t loop
s/\n//g' 1.xml #This removes the newline characters

This doesn't look like a problem with the underscore character itself, since I have the same problem no matter what character I search on: I'm unable to find a character and replace it with the same character followed by something new.

This seems like such a basic sed feature that I'm inclined to think I'm doing something wrong.

Any ideas?

Thanks,

Rob
# 2  
Old 09-14-2009
I don't see any difference between the given input and desired output.

To keep the forums high quality for all users, please take the time to format your posts correctly.

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)

Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut and paste: edit any odd characters and make sure all links are working properly.

Thank You.

The UNIX and Linux Forums
# 3  
Old 09-14-2009
Code correction

I made a mistake when I posted my code example. I neglected to show that I wanted to replace all underscores with that same underscore followed by a zero-width space. Here is the corrected example (I added "&#8203;" in the sed loop):

Code:
#!/usr/bin/sh

sed 's/>[^>]*<\//\n&/g # Isolate the strings of interest by inserting newline characters.
:loop
s/\(\n>[^>]*\)_\([^>]*<\/\)/\1_\&#8203;\2/ # Meant to replace "_" with "_&#8203;", but it stalls sed instead.
t loop
s/\n//g' 1.xml # Remove the newline characters

---------- Post updated at 03:09 PM ---------- Previous update was at 02:44 PM ----------

The forum wordprocessing tool has been removing my example of the zero-width space. When I try to type an ampersand followed by the pound sign and "8203;" it is rendered as a blank space by the forum editor.

Perhaps the moderator could help with the required escape characters in this editor?

In the meantime, I'll provide an example using different characters that aren't problematic for the text editor. Here's an example where I want to replace all underscores with that same underscore followed by the letter "Q":

Code:
#!/usr/bin/sh

sed 's/>[^>]*<\//\n&/g # Isolate the strings of interest by inserting newline characters.
:loop
s/\(\n>[^>]*\)_\([^>]*<\/\)/\1_Q\2/ # Meant to replace "_" with "_Q", but it stalls sed instead.
t loop
s/\n//g' 1.xml # Remove the newline characters

Here is the code for the source file (1.xml)

Code:
<MY_BIG_TAG>This_is_a_test</MY_BIG_TAG>

My desired result with this revised example using "Q" is the following:

Code:
<MY_BIG_TAG>This_Qis_Qa_Qtest</MY_BIG_TAG>

The code above works fine as long as I don't repeat the underscore in the output; in other words, if I replace the underscore with only the letter "Q" sed is able to complete.

Is there a way I can have sed repeat the underscore followed by the letter "Q"?

Thanks,

Rob
# 4  
Old 09-15-2009
To replace "_" with "_Q" with sed:

Code:
sed 's/_/_Q/g' file
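
Worth noting for anyone trying this one-liner: it rewrites every underscore on the line, including those inside the tag names, which is not what the original poster asked for. A quick check on the sample line from the thread (the variable names are only for illustration):

```shell
# Sample line from the thread.
line='<MY_BIG_TAG>This_is_a_test</MY_BIG_TAG>'

# A plain global substitution touches the tag names as well as the text.
global_result=$(printf '%s\n' "$line" | sed 's/_/_Q/g')
printf '%s\n' "$global_result"
# prints: <MY_QBIG_QTAG>This_Qis_Qa_Qtest</MY_QBIG_QTAG>
```

So it is fine when the tags contain no underscores, but otherwise the substitution has to be restricted to the text between the tags.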

# 5  
Old 09-15-2009
Hope this works for you... Slight modification from your solution:
Input:
Code:
<MY_BIG_TAG>This_is_a_test</MY_BIG_TAG>

Code:
sed '
s/\(<[^>]*>\)\([^>]*\)\(<[^>]*>\)/\1\n\2\3/g 
:loop
s/\n\([^<_]*\)_/\1_Q\n/g 
/\n[^<_]*_/b loop
s/\n//g' a

Output:
<MY_BIG_TAG>This_Qis_Qa_Qtest</MY_BIG_TAG>

Explanation :

1. s/\(<[^>]*>\)\([^>]*\)\(<[^>]*>\)/\1\n\2\3/g
This inserts a newline marker after the opening tag, giving <MY_BIG_TAG>\nThis_is_a_test</MY_BIG_TAG>
2. :loop starts the loop.
3. The text from the \n up to the next underscore is matched, that underscore is replaced with _Q, and the \n marker is moved to just after the Q.
4. If an underscore still follows the marker before the next <, branch back to the loop.
5. Finally, delete the \n marker that was inserted in step 1.
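
To watch the steps above happen, here is a rough trace that runs the same substitutions one pass at a time from the shell. It is only a sketch: the visible character "@" stands in for the newline marker (an assumption that the data never contains "@"), and the variable names stage, trace and final are made up for the illustration.

```shell
# Step 1: put the marker after the opening tag ('@' instead of a newline).
stage=$(printf '%s\n' '<MY_BIG_TAG>This_is_a_test</MY_BIG_TAG>' |
  sed 's/\(<[^>]*>\)\([^>]*\)\(<[^>]*>\)/\1@\2\3/g')
trace=$stage

# Steps 2-4: while an underscore follows the marker before the next '<',
# replace it with _Q and move the marker to just after the Q.
while printf '%s\n' "$stage" | grep -q '@[^<_]*_'; do
  stage=$(printf '%s\n' "$stage" | sed 's/@\([^<_]*\)_/\1_Q@/')
  trace="$trace
$stage"
done

# Step 5: drop the marker.
final=$(printf '%s\n' "$stage" | sed 's/@//g')
printf '%s\n' "$trace" "$final"
```

Each pass moves the marker past the underscore it has just handled, so that underscore is never looked at again and the loop ends once the marker has reached the last word before the closing tag.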
# 6  
Old 09-15-2009
The code you suggested worked very well! It replaces only the underscores between the tags, which is what I wanted.

Code:
sed '
s/\(<[^>]*>\)\([^>]*\)\(<[^>]*>\)/\1\n\2\3/g 
:loop
s/\n\([^<_]*\)_/\1_Q\n/g 
/\n[^<_]*_/b loop
s/\n//g' a
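
For the original goal (a zero-width space entity instead of "Q"), the script above can probably be adapted as sketched below. Two assumptions: GNU sed is in use (for \n in the replacement), and the entity is wanted as the literal text &#8203; in the output. The ampersand has to be written as \& because a bare & in a sed replacement stands for the whole matched text.

```shell
zwsp_result=$(printf '%s\n' '<MY_BIG_TAG>This_is_a_test</MY_BIG_TAG>' | sed '
s/\(<[^>]*>\)\([^>]*\)\(<[^>]*>\)/\1\n\2\3/g
:loop
s/\n\([^<_]*\)_/\1_\&#8203;\n/
/\n[^<_]*_/b loop
s/\n//g')
printf '%s\n' "$zwsp_result"
# prints: <MY_BIG_TAG>This_&#8203;is_&#8203;a_&#8203;test</MY_BIG_TAG>
```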

What I'd like to know is: what are the key differences in your script that enable sed to reuse the underscore, whereas in mine sed hangs completely if I try to use "\1_Q" (but works if I just use "\1Q")?

Any ideas?

Last edited by rhetoric101; 09-15-2009 at 04:12 PM.. Reason: Forgot closing parenthesis
# 7  
Old 09-15-2009
Quote:
Originally Posted by rhetoric101
sed completely hangs if I try to use the "\1_Q" (but works if I just use "\1Q").
You basically search for "_" and replace it with "_Q". In the next round of the loop you find what you have just replaced and replace again thereby establishing an infinite loop.

You would have to search for "_<not followed by a Q>" (in regex "_[^Q]", which requires one more character after the underscore) to avoid that loop.

I hope this helps.
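
One way to apply this idea to the original script is sketched below, assuming GNU sed (for \n in the replacement). The only change is that the second group must start with a character that is neither "Q" nor ">", so an underscore that has already been given its "Q" no longer matches and "t loop" eventually stops branching.

```shell
fixed=$(printf '%s\n' '<MY_BIG_TAG>This_is_a_test</MY_BIG_TAG>' | sed '
s/>[^>]*<\//\n&/g
:loop
s/\(\n>[^>]*\)_\([^Q>][^>]*<\/\)/\1_Q\2/
t loop
s/\n//g')
printf '%s\n' "$fixed"
# prints: <MY_BIG_TAG>This_Qis_Qa_Qtest</MY_BIG_TAG>
```

The limitation is that an underscore which was already followed by "Q" in the input is skipped, and so is an underscore that sits directly before the closing tag, so the marker-moving approach from post #5 is the more robust of the two.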

bakunin
 