Selectively deleting newlines with sed


# 1  
I have a file that looks like this:
Code:
>Muestra-1
agctgcgagctgcgaccc
gggttatata
ggaagagacacacacaccccc
>Muestra-2
agctgcg
agctgcgacccgggttatataggaagagac
acacacaccccc
>Muestra-3
agctgcgagctgcgaccc
gggttatata
ggaagagacacacacaccccc

I use the following sed script to remove the newlines from lines that do not start with >:
Code:
sed ':a /^>/!N;s/\r\?\n\([^>]\)/\1/;ta'

I was trying to use b instead of t. So, this is what I did:
Code:
sed '/^>/!{:a;N;$!ba};s/\r\?\n//g'

but didn't get the desired result. Is there any way to use b in the second script to eliminate the newlines while skipping the lines that start with >?
# 2  
Quote:
Originally Posted by Xterra
So, this is what I did:
Code:
sed '/^>/!{:a;N;$!ba};s/\r\?\n//g'

but didn't get the desired result. Is there any way to use b in the second script to eliminate the newlines while skipping the lines that start with >?

The problem has nothing to do with "t" or "b", but with how sed actually works. Let's say you have a sed script like this:

Code:
sed 'command1
     command2
     /regexp/ {
           command3
           command4
     }' /some/file

What happens is this: sed reads the first line of the input file into a buffer called the "pattern space", then applies the first command of its script ("command1") to it, then the next one, and so on until it reaches the end of the script. If anything is still in the pattern space at that point, it is printed to stdout. Then the next line of input is read, replacing the content of the pattern space, and the first command is applied again, and so on. As a step-by-step outline:

Code:
read line1 of input
apply "command1" to it
apply "command2" to the result of previous line
if /regexp/ matches
     apply "command3" to the result of previous line
     apply "command4" to the result of previous line
endif
read next line of input
apply "command1" to it
...

Now, what does your code do:
Code:
/^>/!            # do the following for all lines not starting with a ">"
     {:a                 # define a return point for any "t" or "b" command
     N                   # append the next input line immediately, without returning to the beginning of the script
     $! ba               # if this is not the last line jump to a
     }
s/\r\?\n//g          # remove every newline (and a CR before it) from whatever has been collected

Do you spot it? Once you are inside the condition it is never checked again, you only loop inside it, always adding more text to the pattern space but never doing anything with it - until you hit the last line. Also notice that "/^>/" is true for ANY pattern space content starting with ">". That means, for this:

Code:
> bla foo

but also for this, after adding a line:

Code:
> bla foo
more text

And the same goes the other way: the negated condition "/^>/!" is true for this:

Code:
foo bar

but also for this:

Code:
foo bar
> a line starting with ">"

This means your logic is wrong, regardless of whether you use "t" or "b". The difference between the two is that "t" branches only if an "s" command has actually made a substitution (since the current input line was read or since the last "t"), whereas "b" always branches. Say, this is the input file:

Code:
xxx
yyy
xxx

and this is your sed-script working on the file:

Code:
sed 's/xxx/XXX/
b end
s/yyy/YYY/
:end'

Then the substitution of "yyy" with "YYY" will never take place because it is unconditionally skipped. If you change the "b" to a "t", it will be executed for the lines without "xxx": there the first substitution does nothing, and therefore the "t" does not branch to "end".
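
Both variants can be run side by side. This is only a sketch: branch_demo is a made-up helper that takes the branch command ("b" or "t") as its argument and feeds the three sample lines through the script from above:

```shell
branch_demo() {
    # $1 is the branch command to try: "b" or "t"
    printf 'xxx\nyyy\nxxx\n' | sed "s/xxx/XXX/
$1 end
s/yyy/YYY/
:end"
}

echo "with b:"
branch_demo b
echo "with t:"
branch_demo t
```

With "b" the middle line should stay "yyy"; with "t" it should come out as "YYY". The "xxx" lines become "XXX" in both cases.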

I hope this helps.
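
As for the original question: one way it might be done with "b" alone is to test the freshly appended line instead of the start of the pattern space. Take this as a sketch, not as the one right answer. It assumes GNU sed (because of "\r\?") and that every line starting with ">" is a header; join_records and the sample input are made up for illustration:

```shell
join_records() {
    sed '
:a
# a header at the start of the pattern space: print it as it is
/^>/b
# last input line: nothing more to append, print what we have
$b
# append the next input line
N
# the appended line is a header: print both lines unjoined
/\n>/b
# otherwise delete the newline (and an optional CR before it) and loop
s/\r\?\n//
ba
' "$@"
}

printf '>A\nab\ncd\n>B\nef\ngh\nij\n' | join_records
```

For the sample input this should print ">A", "abcd", ">B" and "efghij" on separate lines. The "b" commands without a label jump to the end of the script, which ends the cycle and prints the pattern space.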

bakunin