Substitute newline with tab at designated field separator | Unix Linux Forums | Shell Programming and Scripting





Tags
awk, field separator, newline, sed, tr

    #1  
04-02-2013
yifangt
Registered User
 
Join Date: Sep 2009
Last Activity: 2 September 2014, 6:30 PM EDT
Location: Saskatchewan, Canada
Posts: 323
Thanks: 175
Thanked 5 Times in 5 Posts
Substitute newline with tab at designated field separator

Hello, I need to replace the newlines with tabs within each record of the file (every four lines form one record), so that each record ends up on a single line.

Code:
infile.fq:

@GAIIX-300
ATAGTCAAAT
+
_SZS^\\\cd
@GAIIX-300
CATACGACAT
+
hhghfdffhh
@GAIIX-300
GACGACGTAT
+
gggfc[hh]f


Code:
outfile:

@GAIIX-300    ATAGTCAAAT    +    _SZS^\\\cd
@GAIIX-300    CATACGACAT    +    hhghfdffhh
@GAIIX-300    GACGACGTAT    +    gggfc[hh]f

I used
Code:
 sed '/^@/N;/^@/N;/^@/N;s/\n/\t/g' infile.fq

without fully understanding it; I came up with this one-liner while trying to understand newline substitution in sed.
1) Can anyone explain the three consecutive /^@/N; commands? Plain N; works too for me. Does N mean "next line"?
2) How can I use another command, tr, to do the job, with something like:

Code:
!\n@ | tr  '\n' '\t' < infile.fq > outfile.tab

that is, by adding the condition !\n@ so that the newline in front of the record separator (@ here) is left alone? I know awk can do the job much more easily with RS="@" and OFS="\t":

Code:
awk 'BEGIN{RS="@"; OFS="\t"} NR>1 {print "@" $1, $2, $3, $4}' infile.fq

but I want to understand how sed and tr work, if they can, in this case.
Thanks a lot!
YF

Last edited by yifangt; 04-02-2013 at 06:00 PM.
    #2  
04-02-2013
rdrtx1
Registered User
 
Join Date: Sep 2012
Last Activity: 15 July 2014, 11:14 AM EDT
Location: Houston, Texas, USA
Posts: 675
Thanks: 0
Thanked 203 Times in 195 Posts
Try also:

Code:
paste - - - - < infile.fq > outfile
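For reference, paste treats each - operand as standard input and takes one line from it for each - in turn, so four of them consume one four-line record per output line, joined by TAB (paste's default delimiter). A minimal sketch of the same idea; the function name fq_paste is a hypothetical label, not something from the thread:

```shell
# Hypothetical helper: join every 4 input lines (one FASTQ record) with TABs.
# Each "-" makes paste read the next line from stdin; -d '\t' just spells
# out the default TAB delimiter explicitly.
fq_paste() {
  paste -d '\t' - - - -
}

# usage: fq_paste < infile.fq > outfile
```

Because paste only counts lines, this assumes every record is exactly four lines long, as in the sample.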

The Following User Says Thank You to rdrtx1 For This Useful Post:
yifangt (04-02-2013)
    #3  
04-02-2013
yifangt
Thanks, I forgot to mention that one. I also tried a Perl version, but it did not work out:

Code:
perl -pe 'BEGIN{$/="@\n"}s/\n/\t/g;$_.=$/'  infile.fq

What did I miss? As with sed and tr, I want to understand what is going on behind the scenes. Thanks again!
    #4  
04-02-2013
hanson44
Registered User
 
Join Date: Mar 2013
Last Activity: 12 May 2013, 11:33 PM EDT
Posts: 858
Thanks: 18
Thanked 180 Times in 177 Posts
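For sed, each N appends a newline and then the next input line to the pattern space. At the end of the script, sed prints out the four lines glommed together, with a tab substituted for each newline. (The /^@/ addresses in your version are redundant: after each N the pattern space still begins with @, so all three N commands run anyway.)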
Code:
$ sed "N; N; N; s/\n/\t/g" infile.fq
@GAIIX-300      ATAGTCAAAT      +       _SZS^\\\cd
@GAIIX-300      CATACGACAT      +       hhghfdffhh
@GAIIX-300      GACGACGTAT      +       gggfc[hh]f

Using the diagnostic l command, and turning off auto-print with -n, makes it clearer what is going on:

Code:
$ sed -n "N; N; N; l; s/\n/\t/g" infile.fq
@GAIIX-300\nATAGTCAAAT\n+\n_SZS^\\\\\\cd$
@GAIIX-300\nCATACGACAT\n+\nhhghfdffhh$
@GAIIX-300\nGACGACGTAT\n+\ngggfc[hh]f$

I don't understand your tr example at all, and would forget about using tr for this.
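The !\n@ idea cannot be written in tr, because tr translates single characters and never looks at the character that follows. It can be approximated in two steps: tr turns every newline into a tab, then sed puts a newline back wherever a tab is followed by @. A sketch assuming GNU sed (for \t and \n inside s///); fq_tr is a hypothetical name. It would split a record wrongly if a quality line began with @, which is a valid quality character in FASTQ.

```shell
# Hypothetical helper; assumes GNU sed for \t and \n in the s command.
fq_tr() {
  # 1) tr has no context: every newline becomes a TAB (one long line).
  # 2) sed restores a newline before each "@" that follows a TAB (a record
  #    start) and drops the TAB left over from the file's final newline.
  tr '\n' '\t' | sed 's/\t@/\n@/g; s/\t$//'
  # tr also consumed the last newline, so print one back.
  echo
}

# usage: fq_tr < infile.fq > outfile.tab
```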
The Following User Says Thank You to hanson44 For This Useful Post:
yifangt (04-02-2013)
    #5  
04-02-2013
yifangt
Thanks Hanson!
What I had in mind with tr was: replace "\n" with "\t" unless the newline is followed by "@" (that's why I wrote !\n@, which is obviously not correct syntax!). It seems I have to forget this strategy!
Now the "N" in the command is clear. Could you explain what the "1" does? I saw a similar script with awk at http://wiki.ljackson.us/Awk_Command:

Code:
# if a line ends with a backslash, append the next line to it 
# (fails if there are multiple lines ending with backslash...)
awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1' infile

Thanks again!

Last edited by yifangt; 04-02-2013 at 11:48 PM. Reason: improve wording
    #6  
04-02-2013
hanson44
Happy to. That character is l (ell, not one), a basic sed command that helps diagnose what is going on. l (ell) prints the pattern space in an unambiguous, escaped format for debugging, and is never (or extremely rarely) used in production scripts.

So in the example, l (ell) shows what the pattern space looks like immediately before the s (substitute) command runs. It shows the embedded \n characters that N introduced at each step, and it displays each backslash in the data doubled, which is why _SZS^\\\cd appears as _SZS^\\\\\\cd. It also shows a $ at the end of the line. There is not really a $ there; it is just part of the display format of the l command, marking the end of the pattern space.

I think the mnemonic for l is "line", or maybe "list". Not sure.

BTW, if the pattern space is long, l (ell) wraps its output at 70 characters by default, which is usually not desirable. In GNU sed you can use "l 0" to run the l command without line wrapping.
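On the "1" question in post #5: in the quoted awk script, the final 1 really is the digit one. An awk program is a list of pattern { action } rules; a rule with no action runs the default action, print, and the constant 1 is always true, so 1 prints every line that the first rule did not already consume with next. A sketch of the backslash-joining one-liner described by that script's comment; join_backslash is a hypothetical name:

```shell
# Hypothetical helper: join a line ending in "\" with the line after it.
# Rule 1: on a line ending in a backslash, strip the backslash, read the
#         next line into t with getline, print both, then skip rule 2 (next).
# Rule 2: the pattern 1 is always true and has no action, so awk runs the
#         default action (print) on every other line.
join_backslash() {
  awk '/\\$/ {sub(/\\$/, ""); getline t; print $0 t; next}; 1'
}

# usage: join_backslash < infile
```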
The Following User Says Thank You to hanson44 For This Useful Post:
yifangt (04-02-2013)
    #7  
04-03-2013
Scrutinizer
Moderator
 
Join Date: Nov 2008
Last Activity: 3 September 2014, 2:49 AM EDT
Location: Amsterdam
Posts: 9,387
Thanks: 273
Thanked 2,349 Times in 2,108 Posts
Note: '\t' is understood by GNU sed only. Other seds need a literal TAB character (typed as CTRL-V TAB).
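With a non-GNU sed, one alternative to typing CTRL-V TAB is to let the shell build the TAB, for example with printf. The \n in the regex is fine in POSIX sed (it matches a newline embedded in the pattern space); only \t was the GNU extension. A sketch under those assumptions; fq_sed is a hypothetical name:

```shell
# Hypothetical helper: portable form of the N;N;N one-liner.
fq_sed() {
  tab=$(printf '\t')   # a literal TAB, so sed never has to parse "\t"
  # Three N commands gather a 4-line record into the pattern space,
  # then every embedded newline is replaced by the literal TAB.
  sed "N;N;N;s/\n/$tab/g"
}

# usage: fq_sed < infile.fq > outfile
```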


--
awk version:

Code:
awk 'ORS=NR%4?"\t":RS' file
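How the awk one-liner works: the whole program is a pattern with no action, and the pattern is an assignment. On lines 1 to 3 of each record NR%4 is non-zero, so ORS becomes a tab; on every fourth line it becomes RS, the newline. The assignment's value is that non-empty string, which awk treats as true, so the default print runs and ends each line with the ORS just chosen. A more explicit equivalent, sketched with the hypothetical name fq_awk:

```shell
# Hypothetical helper: same logic as 'ORS=NR%4?"\t":RS', written out.
fq_awk() {
  awk '{
    # Lines 1-3 of a record end with a TAB, line 4 with a newline.
    if (NR % 4) ORS = "\t"; else ORS = "\n"
    print
  }'
}

# usage: fq_awk < file
```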


Last edited by Scrutinizer; 04-03-2013 at 12:40 AM.
The Following User Says Thank You to Scrutinizer For This Useful Post:
yifangt (04-03-2013)
All times are GMT -4.