Substitute newline with tab at designated field separator


 
# 1  
Old 04-02-2013

Hello, I need to replace the newline with a tab on certain lines of a file, so that each record (every four lines) ends up on a single line.
Code:
infile.fq:

@GAIIX-300
ATAGTCAAAT
+
_SZS^\\\cd
@GAIIX-300
CATACGACAT
+
hhghfdffhh
@GAIIX-300
GACGACGTAT
+
gggfc[hh]f

Code:
outfile:

@GAIIX-300    ATAGTCAAAT    +    _SZS^\\\cd
@GAIIX-300    CATACGACAT    +    hhghfdffhh
@GAIIX-300    GACGACGTAT    +    gggfc[hh]f

I used
Code:
 sed '/^@/N;/^@/N;/^@/N;s/\n/\t/g' infile.fq

without fully understanding it. I arrived at this one-liner while trying to understand newline substitution in sed.
1) Can anyone explain the three consecutive /^@/N; commands? Plain N; works for me too (it means "next line", right?).
2) How can I use another command, tr, to do the job, something like:
Code:
!\n@ | tr  '\n' '\t' < infile.fq > outfile.tab

by adding a condition such as !\n@ to protect the record separator, which is @ here? I know awk can do the job much more easily with RS="@" and OFS="\t",
Code:
awk 'BEGIN{RS="@"; OFS="\t"} NR>1{print RS $1, $2, $3, $4}' infile.fq

but I want to understand how sed and tr would work in this case, if they can at all.
Thanks a lot!
YF

Last edited by yifangt; 04-02-2013 at 07:00 PM..
# 2  
Old 04-02-2013
try also:
Code:
paste - - - - < infile.fq > outfile
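
Each "-" tells paste to take one line from standard input, so four of them join every four consecutive lines, and TAB is the default delimiter. A minimal sketch, assuming a small made-up sample (the names sample and joined are only for illustration) instead of infile.fq:

```shell
# Two hypothetical four-line records laid out like infile.fq.
sample=$(printf '%s\n' '@r1' 'ACGT' '+' 'hhhh' '@r2' 'TTGA' '+' 'gggg')

# Each "-" consumes one line of stdin per output row; paste joins them with TAB.
joined=$(printf '%s\n' "$sample" | paste - - - -)

printf '%s\n' "$joined"
```

This should print one tab-separated line per record.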

This User Gave Thanks to rdrtx1 For This Post:
# 3  
Old 04-02-2013
Thanks, I forgot to mention that. I also tried a perl version, but it did not work out.
Code:
perl -pe 'BEGIN{$/="@\n"}s/\n/\t/g;$_.=$/'  infile.fq

What did I miss? As with sed and tr, I want to understand what is going on behind the scenes. Thanks again!
# 4  
Old 04-02-2013
For sed, each N appends the next line of input to the pattern space, separated from the existing text by a newline. At the end of the script, sed prints the four lines glommed together, with a tab substituted for each newline:
Code:
$ sed "N; N; N; s/\n/\t/g" infile.fq
@GAIIX-300      ATAGTCAAAT      +       _SZS^\\\cd
@GAIIX-300      CATACGACAT      +       hhghfdffhh
@GAIIX-300      GACGACGTAT      +       gggfc[hh]f

Using the diagnostic l command, and turning off auto-print with -n, makes it clearer what is going on:
Code:
$ sed -n "N; N; N; l; s/\n/\t/g" infile.fq
@GAIIX-300\nATAGTCAAAT\n+\n_SZS^\\\\\\cd$
@GAIIX-300\nCATACGACAT\n+\nhhghfdffhh$
@GAIIX-300\nGACGACGTAT\n+\ngggfc[hh]f$

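I don't understand your tr example at all, and I would forget about using tr for this. tr translates every occurrence of a character unconditionally, so it cannot skip the newlines at record boundaries.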
This User Gave Thanks to hanson44 For This Post:
# 5  
Old 04-03-2013
Thanks Hanson!
What I had in mind with tr was: replace "\n" with "\t" unless the "\n" is followed by "@" (that's why I wrote !\n@, which is obviously not correct syntax!). It seems I have to give up on this strategy!
Now the "N" in the command is clear. Could you explain what the "1" does? I saw a similar script with awk at http://wiki.ljackson.us/Awk_Command.
Code:
# if a line ends with a backslash, append the next line to it 
# (fails if there are multiple lines ending with backslash...)
 awk '/pattern/ {sub(/\n/,"\t"); getline t; print $0 t; next}; 1' infile

Thanks again!

Last edited by yifangt; 04-03-2013 at 12:48 AM.. Reason: improve wording
# 6  
Old 04-03-2013
Be happy to. l (ell, not one) is a basic sed command that helps diagnose what is going on. l (ell) prints the pattern space in a special format, for debugging. l (ell) is never (or extremely rarely) used in production scripts.

So in the example, l (ell) is showing what the pattern space looks like immediately before running the s (substitute) command. It shows the embedded \n characters that N introduced at each step. It also shows a $ at the end of the line. There is not really a $ there. It is just part of the special display format of the l command, to mark the end of the pattern space.

I think the mnemonic for l is "line", or maybe "list". Not sure.

BTW, if the pattern space is long, l (ell) in GNU sed will wrap its output at 70 characters by default, which is usually not desirable. You could use "l 0" (a GNU extension) to run the l command without line wrapping.
This User Gave Thanks to hanson44 For This Post:
# 7  
Old 04-03-2013
Note: '\t' in sed is a GNU extension. Other seds need a literal TAB character (CTRL-V TAB).
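
One way to avoid typing a literal TAB is to let printf produce it and pass it to sed through a shell variable. A minimal sketch, assuming a POSIX shell and a one-record made-up sample (tab, sample and joined are illustrative names):

```shell
# printf '\t' yields a real TAB character in any POSIX shell.
tab=$(printf '\t')

# One hypothetical four-line record.
sample=$(printf '%s\n' '@r1' 'ACGT' '+' 'hhhh')

# Double quotes let the shell expand $tab; sed itself still sees \n,
# which matches the newlines that N embedded in the pattern space.
joined=$(printf '%s\n' "$sample" | sed "N;N;N;s/\n/$tab/g")

printf '%s\n' "$joined"
```

Only the replacement side needs the workaround: \n on the pattern side of s is understood by POSIX sed.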


--
awk version:
Code:
awk 'ORS=NR%4?"\t":RS' file
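
How this one-liner reads, as far as I can tell: the assignment itself is the pattern. It sets ORS to a tab for lines 1 to 3 of each group of four and to RS (a newline by default) for every fourth line. The assigned value is a non-empty string, so the pattern is true and the default action (print) runs. A sketch comparing it with a spelled-out equivalent on a made-up sample (sample, terse and verbose are illustrative names):

```shell
# Two hypothetical four-line records.
sample=$(printf '%s\n' '@r1' 'ACGT' '+' 'hhhh' '@r2' 'TTGA' '+' 'gggg')

# Terse form: assignment used as the pattern, default action prints $0 followed by ORS.
terse=$(printf '%s\n' "$sample" | awk 'ORS=NR%4?"\t":RS')

# Spelled-out form: choose the output separator, then print explicitly.
verbose=$(printf '%s\n' "$sample" | awk '{ ORS = (NR % 4) ? "\t" : RS; print }')

printf '%s\n' "$verbose"
```

Both forms should give the same tab-joined records.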


Last edited by Scrutinizer; 04-03-2013 at 01:40 AM..
This User Gave Thanks to Scrutinizer For This Post: