I'm not sure if the problem I'm seeing is an artifact of sed or simply a beginner's mistake. Here's the problem: I want to add a zero-width space following each underscore between XML tags. For example, if I had the following XML:
<MY_BIG_TAG>This_is_a_test</MY_BIG_TAG>
It should look like this after I run my sed script:
<MY_BIG_TAG>This_is_a_test</MY_BIG_TAG>
To accomplish this, I found an example from the web and modified it for my purposes. Unfortunately, the example does not allow me to search for the underscore and then re-use it in the output. The script works fine if I just want to search on the underscore character and replace it with a different character (in this case, the zero-width space); however, as soon as I try to search for the underscore and replace it with that same underscore followed by a zero-width space, sed stalls and never completes.
Here's the script:
#!/usr/bin/sh
sed 's/>[^>]*<\//\n&/g #This isolates the strings I am interested in by inserting newline characters.
:loop
s/\(\n>[^>]*\)_\([^>]*<\/\)/\1_\\2/ #This is supposed to replace "_" with "_" but it does not; instead, it stalls sed.
t loop
s/\n//g' 1.xml #This removes the newline characters
This doesn't look like a problem with the underscore character itself, since I have the same problem no matter what character I search on: I'm unable to find a character and replace it with the same character followed by something new.
This seems like such a basic sed feature that I'm inclined to think I'm doing something wrong.
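For comparison, here is a minimal sketch showing that sed can re-use the matched character on a plain text string. In a replacement, `&` stands for the whole match (here the underscore), and the `g` flag moves on past the text it has just inserted instead of rescanning it, so this finishes normally. It ignores the between-tags restriction and uses `Q` as a visible stand-in for the zero-width space, as later in the thread; the function name `add_q` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: insert "Q" after every underscore in one pass.
# "&" in the replacement is the matched underscore; "g" does not rescan
# the inserted text, so there is no loop to get stuck in.
add_q() {
    printf '%s\n' "$1" | sed 's/_/&Q/g'
}

add_q 'This_is_a_test'   # prints This_Qis_Qa_Qtest
```

This suggests the hang in the script above comes from the `t loop` branch rather than from the substitution itself.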
I don't see any difference between the given input and desired output.
To keep the forums high quality for all users, please take the time to format your posts correctly.
First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)
Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.
Third, be careful when you cut and paste: edit out any odd characters and make sure all links are working properly.
I made a mistake when I posted my code example. I neglected to show that I wanted to replace all underscores with that same underscore followed by a zero-width space. Here is the corrected example (I added the zero-width space to the replacement in the sed loop):
<code>
#!/usr/bin/sh
sed 's/>[^>]*<\//\n&/g #This isolates the strings I am interested in by inserting newline characters.
:loop
s/\(\n>[^>]*\)_\([^>]*<\/\)/\1_\\2/ #This is supposed to replace "_" with "_" but it does not; instead, it stalls sed.
t loop
s/\n//g' 1.xml #This removes the newline characters
</code>
---------- Post updated at 03:09 PM ---------- Previous update was at 02:44 PM ----------
The forum word-processing tool has been removing my example of the zero-width space. When I try to type an ampersand followed by the pound sign and "8203;" it is rendered as a blank space by the forum editor.
Perhaps the moderator could help with the required escape characters in this editor?
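If the goal is to write the character reference `&#8203;` into the XML as text (an assumption; inserting the raw U+200B character would also work), note that `&` is special inside a sed replacement: unescaped, it stands for the whole matched text. A minimal sketch on a plain string, with a hypothetical function name `add_zwsp_entity`:

```shell
#!/bin/sh
# Hypothetical helper: append the entity text "&#8203;" after each underscore.
# The first "&" re-inserts the matched underscore; "\&" is a literal ampersand.
add_zwsp_entity() {
    printf '%s\n' "$1" | sed 's/_/&\&#8203;/g'
}

add_zwsp_entity 'This_is_a_test'
# prints This_&#8203;is_&#8203;a_&#8203;test
```

Without the backslash, each `&` in `&#8203;` would be replaced by the matched text instead of appearing literally.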
In the meantime, I'll provide an example using different characters that aren't problematic for the text editor. Here's an example where I want to replace all underscores with that same underscore followed by the letter "Q":
Here is the code for the source file (1.xml)
My desired result with this revised example using "Q" is the following:
The code above works fine as long as I don't repeat the underscore in the output; in other words, if I replace the underscore with only the letter "Q" sed is able to complete.
Is there a way I can have sed repeat the underscore followed by the letter "Q"?
Hope this works for you... Slight modification from your solution:
Explanation:
1. s/\(<[^>]*>\)\([^>]*\)\(<[^>]*>\)/\1\n\2\3/g
This turns the line into <MY_BIG_TAG>\nThis_is_a_test</MY_BIG_TAG>
2. Start the loop.
3. Between the \n and the next <, substitute each underscore with _Q.
4. Check again whether the same pattern appears; if it does, go through the loop again.
5. Finally, remove the \n (the one inserted in step 1).
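The code from this reply is not shown, so the following is only a sketch of one way a marker-based loop like the one in steps 1 to 5 can be made to terminate: after each substitution the `\n` marker is moved past the underscore it just handled, so later passes only see underscores further to the right. It assumes GNU sed (for `\n` in the replacement) and simple XML with no `<` or `>` inside text or attribute values; the function name `add_q_between_tags` is hypothetical.

```shell
#!/bin/sh
# Sketch, assuming GNU sed and simple XML (no "<" or ">" in text/attributes).
add_q_between_tags() {
    printf '%s\n' "$1" | sed '
        # 1. Put a newline marker right after every ">".
        s/>/>\n/g
        :loop
        # 2. Handle the first underscore after a marker (before the next tag)
        #    and move the marker past it, so it is never matched twice.
        s/\n\([^<>_]*\)_/\1_Q\n/
        t loop
        # 3. Remove the remaining markers.
        s/\n//g
    '
}

add_q_between_tags '<MY_BIG_TAG>This_is_a_test</MY_BIG_TAG>'
# prints <MY_BIG_TAG>This_Qis_Qa_Qtest</MY_BIG_TAG>
```

Because the marker always sits after a `>` and the bracket expression stops at `<`, underscores inside tag names are never touched.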
The code you suggested worked very well! It replaces only the underscores between the tags, which is what I wanted.
What I'd like to know is what are the key differences in your script that enables sed to reuse the underscore, whereas in mine, sed completely hangs if I try to use the "\1_Q" (but works if I just use "\1Q").
Any ideas?
Last edited by rhetoric101; 09-15-2009 at 04:12 PM..
Reason: Forgot closing parenthesis
sed completely hangs if I try to use the "\1_Q" (but works if I just use "\1Q").
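You basically search for "_" and replace it with "_Q". In the next round of the loop you find the underscore you have just kept and replace it again, so "t loop" always branches back and you have an infinite loop. With "\1Q" each pass removes one underscore, so the pattern eventually stops matching and the loop ends.
You would have to search for "_ not followed by a Q" (in regex "_[^Q]", keeping that following character so it can be put back in the replacement) to avoid that loop.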
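A minimal sketch of that guard written into the loop from the question, assuming GNU sed (for `\n`); the function name `add_q_guarded` is hypothetical. The guard `[^Q>]` must consume exactly one character: with a `*` after the bracket expression it could match zero characters and would still accept `_Q`. Known limitations of this sketch: an underscore directly before the closing tag (`test_</...`) is not handled, and text that already contains `_Q` is left alone.

```shell
#!/bin/sh
# Sketch, assuming GNU sed. Same structure as the loop in the question.
add_q_guarded() {
    printf '%s\n' "$1" | sed '
        # Mark each ">text</" run with a leading newline.
        s/>[^>]*<\//\n&/g
        :loop
        # Replace one underscore whose next character is not Q (or ">").
        # That character and the rest of the text up to "</" are kept in \2.
        s/\(\n>[^>]*\)_\([^Q>][^>]*<\/\)/\1_Q\2/
        t loop
        # Remove the markers again.
        s/\n//g
    '
}

add_q_guarded '<MY_BIG_TAG>This_is_a_test</MY_BIG_TAG>'
# prints <MY_BIG_TAG>This_Qis_Qa_Qtest</MY_BIG_TAG>
```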