Substitute newline with tab at designated field separator

Tags
awk, field separator, newline, sed, shell scripts, tr

# 1  
04-02-2013

Hello, I need to replace newlines with tabs on certain lines of the file (every four lines form one record).
Code:
infile.fq:

@GAIIX-300
ATAGTCAAAT
+
_SZS^\\\cd
@GAIIX-300
CATACGACAT
+
hhghfdffhh
@GAIIX-300
GACGACGTAT
+
gggfc[hh]f

Code:
outfile:

@GAIIX-300    ATAGTCAAAT    +    _SZS^\\\cd
@GAIIX-300    CATACGACAT    +    hhghfdffhh
@GAIIX-300    GACGACGTAT    +    gggfc[hh]f

I used
Code:
 sed '/^@/N;/^@/N;/^@/N;s/\n/\t/g' infile.fq

without fully understanding it. I figured out this one-liner while trying to understand newline substitution in sed.
1) Can anyone explain the three consecutive /^@/N; commands (plain N; works too, for me)? N means "next line", right?
2) How can I use tr instead to do the job, something like:
Code:
!\n@ | tr  '\n' '\t' < infile.fq > outfile.tab

by adding the condition !\n@ to skip the record separator, which is @ here? I know awk can do the job much more easily with RS="@" and OFS="\t":
Code:
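# NR>1 skips the empty record before the first "@"; "@" is re-added because RS consumes it
awk 'BEGIN{RS="@"; OFS="\t"} NR>1 {print "@" $1, $2, $3, $4}' infile.fq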

but I want to understand how sed and tr work, if they can, in this case.
Thanks a lot!
YF

Last edited by yifangt; 04-02-2013 at 07:00 PM..
# 2  
04-02-2013
Try also:
Code:
paste - - - - < infile.fq > outfile
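Each - operand tells paste to read the next line from standard input, so four dashes consume four consecutive lines and join them with paste's default delimiter, a tab. A minimal self-contained sketch, assuming a made-up two-record file called sample.fq, with the tab delimiter spelled out through -d:

```shell
# Build a hypothetical two-record sample in the same 4-line layout as infile.fq
printf '%s\n' '@GAIIX-300' 'CATACGACAT' '+' 'hhghfdffhh' \
              '@GAIIX-300' 'GACGACGTAT' '+' 'gggfc[hh]f' > sample.fq

# Four "-" operands: each reads one line from stdin, so every output line
# holds one whole record; -d '\t' only spells out the default delimiter
paste -d '\t' - - - - < sample.fq
```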

This User Gave Thanks to rdrtx1 For This Post:
yifangt (04-02-2013)
# 3  
04-02-2013
Thanks, I forgot to mention that one. I also tried a Perl version, but it did not work out:
Code:
perl -pe 'BEGIN{$/="@\n"}s/\n/\t/g;$_.=$/'  infile.fq

What did I miss? As with sed and tr, I want to understand what is going on behind the scenes. Thanks again!
# 4  
04-02-2013
For sed, each N appends the next line to the pattern space. At the end of the script, sed prints out the four lines glommed together, with tab substituted for newline:
Code:
$ sed "N; N; N; s/\n/\t/g" infile.fq
@GAIIX-300      ATAGTCAAAT      +       _SZS^\\\cd
@GAIIX-300      CATACGACAT      +       hhghfdffhh
@GAIIX-300      GACGACGTAT      +       gggfc[hh]f

Using the diagnostic l command, and turning off auto-print, makes it more clear what is going on:
Code:
$ sed -n "N; N; N; l; s/\n/\t/g" infile.fq
@GAIIX-300\nATAGTCAAAT\n+\n_SZS^\\\\\\cd$
@GAIIX-300\nCATACGACAT\n+\nhhghfdffhh$
@GAIIX-300\nGACGACGTAT\n+\ngggfc[hh]f$
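To watch the pattern space grow one N at a time, l can also be run before and after each N. A small sketch on a single made-up record (the file name one.fq is hypothetical):

```shell
# One hypothetical 4-line record
printf '%s\n' '@GAIIX-300' 'CATACGACAT' '+' 'hhghfdffhh' > one.fq

# -n turns off auto-print; l shows the pattern space, with \n for each
# embedded newline and $ marking the end, after every step
sed -n 'l; N; l; N; l; N; l' one.fq
```

This should print four lines, each one longer than the previous: @GAIIX-300$, then @GAIIX-300\nCATACGACAT$, and so on until all four lines are joined.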

I don't really understand your tr example; I would forget about using tr for this.
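tr only maps single characters and cannot look at the character that follows, so a condition like !\n@ cannot be expressed in tr alone. Here is a sketch of a two-step version of that idea, assuming GNU sed (for \t and \n inside the s command): tr turns every newline into a tab, then sed puts a newline back before each "@" that starts a record. It assumes no quality line starts with "@", which FASTQ does not guarantee, since "@" is a valid quality character. sample.fq is a made-up file name.

```shell
# Hypothetical two-record sample in the same layout as infile.fq
printf '%s\n' '@GAIIX-300' 'CATACGACAT' '+' 'hhghfdffhh' \
              '@GAIIX-300' 'GACGACGTAT' '+' 'gggfc[hh]f' > sample.fq

# Step 1 (tr): every newline becomes a tab, so the file becomes one long line.
# Step 2 (GNU sed): a tab followed by "@" marks a record boundary, so it is
# turned back into a newline; the second s turns the trailing tab into a final newline.
tr '\n' '\t' < sample.fq | sed 's/\t@/\n@/g; s/\t$/\n/'
```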
This User Gave Thanks to hanson44 For This Post:
yifangt (04-03-2013)
# 5  
04-03-2013
Thanks Hanson!
What I had in mind with tr is: replace "\n" with "\t" unless the "\n" is followed by "@" (that's why I wrote !\n@, which is obviously not correct syntax). It seems I have to forget this strategy!
Now the "N" in the command is clear. Could you explain what the "1" does? I saw a similar script with awk at http://wiki.ljackson.us/Awk_Command:
Code:
# if a line ends with a backslash, append the next line to it 
# (fails if there are multiple lines ending with backslash...)
 awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1' infile

Thanks again!

Last edited by yifangt; 04-03-2013 at 12:48 AM.. Reason: improve wording
# 6  
04-03-2013
Be happy to. l (ell, not one) is a basic sed command that helps diagnose what is going on. l (ell) prints the pattern space in a special format, for debugging. l (ell) is never (or extremely rarely) used in production scripts.

So in the example, l (ell) is showing what the pattern space looks like immediately before running the s (substitute) command. It shows the embedded \n characters that N introduced at each step. It also shows a $ at the end of the line. There is not really a $ there. It is just part of the special display format of the l command, to mark the end of the pattern space.

I think the mnemonic for l is "line", or maybe "list". Not sure.

BTW, if the pattern space is long, l (ell) will wrap at 70 characters, which is usually not desirable. You could use "l 0" (a GNU sed extension) to run the l command without line wrapping.
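On the "1" at the end of the awk one-liner from the previous post: it is unrelated to sed's l. An awk rule is pattern { action }, a pattern without an action prints the current line, and the constant 1 is always true. So a bare 1 as the last rule prints every line that an earlier rule did not skip with next. A small sketch on made-up input:

```shell
# Rule 1: on even-numbered lines, print a marker and skip the remaining rules.
# Rule 2: "1" is an always-true pattern with no action, i.e. { print }.
printf '%s\n' a b c d | awk 'NR % 2 == 0 { print "even:", $0; next } 1'
```

This prints a, even: b, c, even: d, one per line.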
This User Gave Thanks to hanson44 For This Post:
yifangt (04-03-2013)
# 7  
04-03-2013
Note: '\t' is GNU sed only. Other seds need a literal TAB character (typed as CTRL-V TAB).
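One way to avoid typing the TAB by hand, sketched under the assumption of a POSIX shell: let printf produce the tab and pass it into the sed script through a shell variable (the variable name tab and the file sample.fq are made up), so sed itself only ever sees a literal TAB.

```shell
# Hypothetical one-record sample in the same layout as infile.fq
printf '%s\n' '@GAIIX-300' 'CATACGACAT' '+' 'hhghfdffhh' > sample.fq

# printf '\t' emits a real TAB; command substitution keeps it (only trailing newlines are stripped)
tab=$(printf '\t')
# Inside double quotes the shell leaves \n alone, so sed still gets \n in the
# regex (which POSIX sed understands) and a literal TAB as the replacement
sed "N;N;N;s/\n/$tab/g" sample.fq
```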


--
awk version:
Code:
awk 'ORS=NR%4?"\t":RS' file
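How this works: the whole program is a single pattern with no action. The pattern assigns ORS, the output record separator: a tab when NR % 4 is non-zero (lines 1 to 3 of each record) and RS, which is the default newline, on every 4th line. The value of that assignment is a non-empty string, which awk treats as true, so the default action prints the line followed by the new ORS. The same logic spelled out, run on made-up input:

```shell
printf '%s\n' '@GAIIX-300' 'CATACGACAT' '+' 'hhghfdffhh' \
              '@GAIIX-300' 'GACGACGTAT' '+' 'gggfc[hh]f' |
awk '{
    # separator printed after this line: tab inside a record, newline after its 4th line
    ORS = (NR % 4 ? "\t" : "\n")
    print
}'
```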


Last edited by Scrutinizer; 04-03-2013 at 01:40 AM..
This User Gave Thanks to Scrutinizer For This Post:
yifangt (04-03-2013)
All times are GMT -4.

Unix & Linux Forums Content Copyright 1993-2018. All Rights Reserved.