Substitute newline with tab at designated field separator

# 1  
Old 04-02-2013

Hello, I need to replace newlines with tabs on certain lines of a file so that each record (every four lines) ends up on one line.
Code:
infile.fq:

@GAIIX-300
ATAGTCAAAT
+
_SZS^\\\cd
@GAIIX-300
CATACGACAT
+
hhghfdffhh
@GAIIX-300
GACGACGTAT
+
gggfc[hh]f

Code:
outfile:

@GAIIX-300    ATAGTCAAAT    +    _SZS^\\\cd
@GAIIX-300    CATACGACAT    +    hhghfdffhh
@GAIIX-300    GACGACGTAT    +    gggfc[hh]f

I used
Code:
 sed '/^@/N;/^@/N;/^@/N;s/\n/\t/g' infile.fq

without fully understanding it. I arrived at this one-liner while trying to understand newline substitution in sed.
1) Can anyone explain the three consecutive /^@/N; commands? Plain N; works for me too (it means "next line", right?).
2) How can I use another command, tr, to do the job, something like:
Code:
!\n@ | tr  '\n' '\t' < infile.fq > outfile.tab

by adding a condition such as !\n@ to leave the newline before the record separator (@ here) untouched? I know awk can do the job much more easily with RS="@" and OFS="\t",
Code:
# RS="@" strips the @ and yields an empty first record, so skip NR==1 and put the @ back
awk 'BEGIN{RS="@"; OFS="\t"} NR>1 {print "@" $1, $2, $3, $4}' infile.fq

but I want to understand how sed and tr work, if they can, in this case.
Thanks a lot!
YF

# 2  
Old 04-02-2013
try also:
Code:
paste - - - - < infile.fq > outfile
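
A minimal sketch of why this works, assuming a POSIX paste and using two of the records from the first post: each `-` operand tells paste to take one more line from standard input for every output row, and the default delimiter is a tab. The scratch file and the `joined` variable are only for illustration.

```shell
# Two records from the thread, written to a scratch file (illustrative only).
tmp=$(mktemp)
printf '%s\n' '@GAIIX-300' 'CATACGACAT' '+' 'hhghfdffhh' \
              '@GAIIX-300' 'GACGACGTAT' '+' 'gggfc[hh]f' > "$tmp"

# Four "-" operands: paste reads stdin four lines at a time and joins
# them with its default delimiter, a TAB.
joined=$(paste - - - - < "$tmp")
printf '%s\n' "$joined"

rm -f "$tmp"
```

Changing the number of dashes changes how many input lines go into each output row.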

# 3  
Old 04-02-2013
Thanks, I forgot to mention that. I did try a perl version, but it did not work out.
Code:
perl -pe 'BEGIN{$/="@\n"}s/\n/\t/g;$_.=$/'  infile.fq

What did I miss? As with sed and tr, I would like to understand what is going on behind the scenes. Thanks again!
# 4  
Old 04-02-2013
For sed, each N appends a newline and the next input line to the pattern space. At the end of the script, sed prints the four lines glommed together, with each newline replaced by a tab:
Code:
$ sed "N; N; N; s/\n/\t/g" infile.fq
@GAIIX-300      ATAGTCAAAT      +       _SZS^\\\cd
@GAIIX-300      CATACGACAT      +       hhghfdffhh
@GAIIX-300      GACGACGTAT      +       gggfc[hh]f
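
To connect this with the /^@/N;/^@/N;/^@/N version from the first post, here is a sketch, assuming two of the thread's records as input, showing that both forms give the same result. The address /^@/ is matched against the whole pattern space, which still begins with @ after each N, so all three N commands fire on a header line. The tab is passed in through a shell variable (named `tab` here for illustration) rather than written as \t.

```shell
tmp=$(mktemp)
printf '%s\n' '@GAIIX-300' 'CATACGACAT' '+' 'hhghfdffhh' \
              '@GAIIX-300' 'GACGACGTAT' '+' 'gggfc[hh]f' > "$tmp"
tab=$(printf '\t')   # literal TAB for the replacement text

# Plain N three times: join lines purely by position, four per cycle.
by_count=$(sed "N;N;N;s/\n/$tab/g" "$tmp")

# Guarded form: /^@/ is tested against the start of the growing pattern
# space, which still begins with @ after each N, so every N runs.
by_header=$(sed "/^@/N;/^@/N;/^@/N;s/\n/$tab/g" "$tmp")

printf '%s\n' "$by_count"
rm -f "$tmp"
```

With well-formed four-line records the two are equivalent, since every cycle starts on a header line. The guard only changes the behavior when a cycle starts on a line that does not begin with @: in that case none of the N commands run and the line is printed on its own.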

Using the diagnostic l command and turning off auto-print makes it clearer what is going on:
Code:
$ sed -n "N; N; N; l; s/\n/\t/g" infile.fq
@GAIIX-300\nATAGTCAAAT\n+\n_SZS^\\\\\\cd$
@GAIIX-300\nCATACGACAT\n+\nhhghfdffhh$
@GAIIX-300\nGACGACGTAT\n+\ngggfc[hh]f$

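I don't understand your tr example at all, and would forget about using tr for this. tr translates characters one at a time with no knowledge of what comes before or after them, so it cannot leave the newline in front of each @ alone.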
# 5  
Old 04-02-2013
Thanks Hanson!
What I had in mind with tr was: replace "\n" with "\t" unless the "\n" is followed by "@" (that's why I wrote !\n@, which is obviously not correct syntax!). It seems I have to give up on this strategy!
Now the "N" in the command is clear. Could you explain what the "1" does? I saw a similar script with awk at http://wiki.ljackson.us/Awk_Command.
Code:
# if a line ends with a backslash, append the next line to it 
# (fails if there are multiple lines ending with backslash...)
 awk '/pattern/ {sub(/\n/,"\t"); getline t; print $0 t; next}; 1' infile

Thanks again!

# 6  
Old 04-02-2013
Be happy to. l (ell, not one) is a basic sed command that helps diagnose what is going on. l (ell) prints the pattern space in a special format, for debugging. l (ell) is never (or extremely rarely) used for production scripts.

So in the example, l (ell) is showing what the pattern space looks like immediately before running the s (substitute) command. It shows the embedded \n characters that N introduced at each step. It also shows a $ at the end of the line. There is not really a $ there. It is just part of the special display format of the l command, to mark the end of the pattern space.
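
A small sketch of that display format, assuming GNU sed as in the outputs above: l writes a tab as \t, doubles each backslash, shows a newline embedded by N as \n, and ends the listing with $. The variable names are illustrative.

```shell
# One input line containing a TAB and a backslash: a<TAB>b\c
escaped=$(printf 'a\tb\\c\n' | sed -n 'l')
printf '%s\n' "$escaped"    # prints a\tb\\c$

# Two input lines joined by N: the embedded newline is shown as \n.
joined=$(printf 'x\ny\n' | sed -n 'N;l')
printf '%s\n' "$joined"     # prints x\ny$
```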

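I think the mnemonic for l is "list" rather than "line": the GNU sed man page describes the command as "List out the current line in a 'visually unambiguous' form".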

BTW, if the pattern space is long, l (ell) will wrap at 70 characters (the GNU sed default), which is usually not desirable. You could use "l 0" (a GNU extension) to run the l command without line wrapping.
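
A sketch of the wrapping behavior, assuming GNU sed (the numeric argument to l is a GNU extension, and the 100-character test line is made up for illustration):

```shell
# Build a 100-character line of zeros to exceed the default wrap width.
long=$(printf '%0100d' 0)

# Default l: the listing is folded, each continued line ending in a backslash.
wrapped_lines=$(printf '%s\n' "$long" | sed -n 'l' | wc -l)

# l 0: no folding, the whole pattern space on one line followed by $.
unwrapped=$(printf '%s\n' "$long" | sed -n 'l 0')

echo "lines with l: $wrapped_lines"
printf '%s\n' "$unwrapped"
```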
# 7  
Old 04-03-2013
Note: '\t' is GNU sed only. Other seds need a literal TAB character (typed as CTRL-V TAB).
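
One way to avoid typing the TAB by hand, sketched under the assumption of a POSIX shell: let printf produce the character and splice it into the sed script through a variable (`tab` and `out` are illustrative names). The script then no longer depends on \t being understood in the replacement.

```shell
tab=$(printf '\t')   # command substitution keeps the TAB; only trailing newlines are stripped

# One record from the thread, joined with a literal TAB in the replacement.
out=$(printf '%s\n' '@GAIIX-300' 'CATACGACAT' '+' 'hhghfdffhh' |
      sed "N;N;N;s/\n/$tab/g")
printf '%s\n' "$out"

# cut splits on TAB by default, so this prints the sequence field only.
printf '%s\n' "$out" | cut -f 2
```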


--
awk version:
Code:
awk 'ORS=NR%4?"\t":RS' file
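
How this one-liner works, as a sketch: the whole program is a pattern with no action. The assignment sets ORS to a tab on lines 1 to 3 of each group and to RS (a newline by default) on every fourth line. Because the assigned value is a non-empty string, the pattern is true, so awk runs its default action and prints the line followed by the new ORS. The spelled-out form below is assumed to be equivalent, and the sample data is two records from the first post.

```shell
tmp=$(mktemp)
printf '%s\n' '@GAIIX-300' 'CATACGACAT' '+' 'hhghfdffhh' \
              '@GAIIX-300' 'GACGACGTAT' '+' 'gggfc[hh]f' > "$tmp"

# Compact form from the post: the assignment doubles as an always-true pattern.
compact=$(awk 'ORS=NR%4?"\t":RS' "$tmp")

# Same logic with the separator choice and the print made explicit.
explicit=$(awk '{ ORS = (NR % 4) ? "\t" : "\n"; print }' "$tmp")

printf '%s\n' "$explicit"
rm -f "$tmp"
```

The same rule is behind the trailing 1 in many awk one-liners: a pattern that is always true, with no action, prints the current line.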

