Shell Programming and Scripting


Substitute newline with tab at designated field separator



    #1  
Old 04-02-2013
yifangt yifangt is offline VIP Member  
UNIX.COM VIP Member
 
Hello, I need to replace the newlines inside each record of a file with tabs (every four lines form one record).
Code:
infile.fq:

@GAIIX-300
ATAGTCAAAT
+
_SZS^\\\cd
@GAIIX-300
CATACGACAT
+
hhghfdffhh
@GAIIX-300
GACGACGTAT
+
gggfc[hh]f

Code:
outfile:

@GAIIX-300    ATAGTCAAAT    +    _SZS^\\\cd
@GAIIX-300    CATACGACAT    +    hhghfdffhh
@GAIIX-300    GACGACGTAT    +    gggfc[hh]f

I used
Code:
 sed '/^@/N;/^@/N;/^@/N;s/\n/\t/g' infile.fq

without fully understanding it. I arrived at this one-liner while trying to understand newline substitution in sed.
1) Can anyone explain the three consecutive /^@/N; commands (plain N; works too)? Does N mean "next line"?
2) How can I use another command, tr, to do the job, with something like:
Code:
!\n@ | tr  '\n' '\t' < infile.fq > outfile.tab

by adding a condition such as !\n@ to leave alone the newline in front of the record separator, which is @ here? I know awk can do the job much more easily with RS="@" and OFS="\t",
Code:
awk 'BEGIN{RS="@"; OFS="\t"} {print $1, $2, $3, $4}' infile.fq

but I want to understand how sed and tr work, if they can, in this case.
Thanks a lot!
YF

Last edited by yifangt; 04-02-2013 at 06:00 PM..
    #2  
Old 04-02-2013
rdrtx1 rdrtx1 is offline Forum Advisor  
Registered Pusher
 
try also:
Code:
paste - - - - < infile.fq > outfile
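For anyone wondering why this works: each `-` tells paste to take one line from standard input, so four of them use up four input lines per output row, joined by paste's default delimiter (a tab). A minimal sketch on a made-up two-record sample (the sample data and the variable names are only for illustration, not from the original file):

```shell
# Hypothetical sample: two 4-line records
sample=$(printf '@r1\nAAAA\n+\nhhhh\n@r2\nCCCC\n+\ngggg\n')

# Each "-" consumes one line of stdin; four of them give one row per record
joined=$(printf '%s\n' "$sample" | paste - - - -)
printf '%s\n' "$joined"
```

This relies only on the record always being exactly four lines; paste never looks at the @ character.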

The Following User Says Thank You to rdrtx1 For This Useful Post:
yifangt (04-02-2013)
    #3  
Old 04-02-2013
yifangt yifangt is offline VIP Member  
UNIX.COM VIP Member
 
Thanks, I forgot to mention that. I also tried a perl version, but it did not work out.
Code:
perl -pe 'BEGIN{$/="@\n"}s/\n/\t/g;$_.=$/'  infile.fq

What did I miss? As with sed and tr, I would like to understand what is going on behind the scenes. Thanks again!
    #4  
Old 04-02-2013
hanson44 hanson44 is offline
Registered User
 
For sed, each N appends the next line of input to the pattern space. At the end of the script, sed prints the four lines glommed together, with a tab substituted for each newline:
Code:
$ sed "N; N; N; s/\n/\t/g" infile.fq
@GAIIX-300      ATAGTCAAAT      +       _SZS^\\\cd
@GAIIX-300      CATACGACAT      +       hhghfdffhh
@GAIIX-300      GACGACGTAT      +       gggfc[hh]f
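On the /^@/N versus plain N part of the question: the address /^@/ only makes N conditional on the pattern space starting with @. With well-formed four-line records the two behave the same. A rough sketch of where they differ, using a made-up input with one stray line in front of a record (the data and variable names are hypothetical, and a literal tab is used so it does not depend on GNU's \t):

```shell
tab=$(printf '\t')

# Hypothetical input: one stray line, then one 4-line record
input=$(printf 'junk\n@r1\nAAAA\n+\nhhhh\n')

# N runs only when the pattern space starts with @,
# so the stray line is printed untouched and the record is still joined
guarded=$(printf '%s\n' "$input" | sed "/^@/N;/^@/N;/^@/N;s/\n/$tab/g")
printf '%s\n' "$guarded"

# Unconditional N glues the stray line onto the first three lines of the record
# (only the first output line is kept here)
plain_first=$(printf '%s\n' "$input" | sed "N;N;N;s/\n/$tab/g" | sed 1q)
printf '%s\n' "$plain_first"
```

Only the first line of the unconditional run is shown because what happens when N runs out of input differs between sed implementations.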

Using the diagnostic l command, and turning off auto-print with -n, makes it clearer what is going on:
Code:
$ sed -n "N; N; N; l; s/\n/\t/g" infile.fq
@GAIIX-300\nATAGTCAAAT\n+\n_SZS^\\\\\\cd$
@GAIIX-300\nCATACGACAT\n+\nhhghfdffhh$
@GAIIX-300\nGACGACGTAT\n+\ngggfc[hh]f$

I don't understand your tr example at all; I would forget about using tr for this.
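To illustrate why tr is a poor fit: it translates characters one at a time and has no notion of lines or of what follows a character, so it cannot skip the newline in front of @. A small sketch on a made-up sample (data and variable names are hypothetical):

```shell
# Hypothetical sample: two 4-line records
sample=$(printf '@r1\nAAAA\n+\nhhhh\n@r2\nCCCC\n+\ngggg\n')

# tr replaces every newline, including the ones between records
flat=$(printf '%s\n' "$sample" | tr '\n' '\t')
printf '%s\n' "$flat"

# Count the newlines that survive the translation (none do)
newlines=$(printf '%s\n' "$sample" | tr '\n' '\t' | wc -l | tr -d ' ')
printf 'newlines left: %s\n' "$newlines"
```

Everything ends up on a single line with a trailing tab, so the record boundaries are lost.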
The Following User Says Thank You to hanson44 For This Useful Post:
yifangt (04-02-2013)
    #5  
Old 04-02-2013
yifangt yifangt is offline VIP Member  
UNIX.COM VIP Member
 
Thanks Hanson!
What I had in mind with tr was: replace "\n" with "\t" unless the "\n" is followed by "@" (that's why I wrote !\n@, which is obviously not correct syntax!). It seems I have to give up on this strategy!
Now the "N" in the command is clear. Could you explain what the "1" does? I saw a similar script with awk at http://wiki.ljackson.us/Awk_Command.
Code:
# if a line ends with a backslash, append the next line to it 
# (fails if there are multiple lines ending with backslash...)
 awk '/pattern/ {sub(/\n/,"\t"); getline t; print $0 t; next}; 1' infile

Thanks again!

Last edited by yifangt; 04-02-2013 at 11:48 PM.. Reason: improve wording
    #6  
Old 04-02-2013
hanson44 hanson44 is offline
Registered User
 
Be happy to. l (ell, not one) is a basic sed command that helps diagnose what is going on. l (ell) prints the pattern space in a special format, for debugging. l (ell) is never (or extremely rarely) used in production scripts.

So in the example, l (ell) is showing what the pattern space looks like immediately before running the s (substitute) command. It shows the embedded \n characters that N introduced at each step. It also shows a $ at the end of the line. There is not really a $ there. It is just part of the special display format of the l command, to mark the end of the pattern space.

I think the mnemonic for l is "line", or maybe "list". Not sure.

BTW, if the pattern space is long, l (ell) will wrap at 70 characters (the GNU sed default), which is usually not desirable. You could use "l 0" (a GNU extension) to run the l command without line wrapping.
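Here is a tiny sketch of l (ell) on a made-up two-line input with an embedded tab (the input and the variable name are hypothetical). It assumes GNU sed for the way the embedded newline is displayed; other seds may show it differently.

```shell
# Hypothetical input: two lines, the second containing a tab.
# -n turns off auto-print, N appends the second line, l lists the pattern space.
listing=$(printf 'one\ntwo\tthree\n' | sed -n 'N;l')
printf '%s\n' "$listing"
# With GNU sed this prints: one\ntwo\tthree$
```

The trailing $ is added by l to mark the end of the pattern space, and the tab shows up as a visible \t.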
The Following User Says Thank You to hanson44 For This Useful Post:
yifangt (04-02-2013)
    #7  
Old 04-03-2013
Scrutinizer Scrutinizer is offline Forum Staff  
Moderator
 
Note: '\t' is a GNU sed extension. Other seds need a literal TAB character (typed as CTRL-V TAB).
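One way to avoid typing the TAB by hand is to let printf produce it and expand it into the sed script from a shell variable. A minimal sketch, assuming a POSIX shell and a made-up one-record input (the variable names are only for illustration):

```shell
# Put a literal TAB into a shell variable
tab=$(printf '\t')

# Double quotes let the shell expand $tab; sed itself still sees \n in the regex
portable=$(printf '@r1\nAAAA\n+\nhhhh\n' | sed "N;N;N;s/\n/$tab/g")
printf '%s\n' "$portable"
```

Matching the embedded newline with \n on the pattern side is standard sed; only \t in the replacement is the non-portable part.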


--
awk version:
Code:
awk 'ORS=NR%4?"\t":RS' file
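In case the one-liner looks cryptic: the whole program is a pattern with no action. The assignment sets ORS to a tab for lines 1 to 3 of each group and to RS (a newline) for every fourth line; since the assigned string is never empty, the pattern is true and awk does its default print. A sketch comparing it with a spelled-out form on a made-up sample (the sample and variable names are hypothetical):

```shell
# Hypothetical sample: two 4-line records
sample=$(printf '@r1\nAAAA\n+\nhhhh\n@r2\nCCCC\n+\ngggg\n')

# Compact form, as posted
short=$(printf '%s\n' "$sample" | awk 'ORS=NR%4?"\t":RS')

# Spelled-out equivalent: choose the output separator, then print the line
long=$(printf '%s\n' "$sample" | awk '{ ORS = (NR % 4) ? "\t" : "\n"; print }')

printf '%s\n' "$short"
printf '%s\n' "$long"
```

Both forms assume the default RS, so that RS is a single newline.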


Last edited by Scrutinizer; 04-03-2013 at 12:40 AM..
The Following User Says Thank You to Scrutinizer For This Useful Post:
yifangt (04-03-2013)
All times are GMT -4.

Unix & Linux Forums Content Copyright©1993-2018. All Rights Reserved.