awk to split file using multiple delimiters


 
# 1  
Old 01-04-2017

I am trying to use awk to split an input file on three delimiters: the colon (:), the hyphen (-), and the pipe (|). Each line of the input file is a single field, and the output should be 6 tab-delimited fields.

The awk below runs and works as expected until I add the third delimiter, |. With it, I get the current output shown below. I am not sure what is wrong. Thank you.

input
Code:
chr1:1013574-1013576|ISG15	
chr1:1013984-1014478|ISG15
chr1:1020163-1020383|AGRN

Code:
awk -F'[:-|]' '{print $1 "\t" $2 "\t" $3 "\t" $1 ":" $2 "-" $3 "\t" "." "\t" $4}' input

desired output
Code:
chr1	1013574	1013576	chr1:1013574-1013576	.	ISG15	
chr1	1013984	1014478	chr1:1013984-1014478	.	ISG15
chr1	1020163	1020383	chr1:1020163-1020383	.	AGRN

current output
Code:
	
	                .	1
			.	1
			.	1


Last edited by cmccabe; 01-04-2017 at 11:07 AM.. Reason: fixed format
# 2  
Old 01-04-2017
Hello cmccabe,

Could you please try the following and let me know if it helps? It should work if your Input_file looks like the sample shown.
Code:
awk -F"[:|-]" '{print $1 "\t" $2 OFS $3 OFS $1":"$2"-"$3 "\t" "." "\t" $NF}'  Input_file

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
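A note on the command in post #2: OFS is never set there, so it keeps its default value of a single space, and the second, third, and fourth output fields end up separated by spaces rather than tabs. Below is a minimal sketch that sets OFS to a tab once and lets print insert it between fields. The function name to_bed is only an illustrative choice and does not come from the thread.

```shell
# Hypothetical helper: reads records such as chr1:1013574-1013576|ISG15 on stdin
# and writes six tab-delimited fields.
to_bed() {
    # -F'[:|-]'   : split on ':', '|' or '-' (the hyphen is last, so it is literal)
    # -v OFS='\t' : the commas in print are replaced by a tab
    awk -F'[:|-]' -v OFS='\t' '{ print $1, $2, $3, $1 ":" $2 "-" $3, ".", $4 }'
}

printf 'chr1:1013574-1013576|ISG15\n' | to_bed
```

Because every field is joined by the same OFS, there is no need to spell out "\t" between each pair of fields.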
# 3  
Old 01-04-2017
Hi,

Code:
awk -F"[|:-]" ' { print $1 "\t" $2 "\t" $3 "\t" $1 ":" $2 "-" $3 "\t" "." "\t" $4}' file

Gives the desired output:
Quote:
chr1 1013574 1013576 chr1:1013574-1013576 . ISG15
chr1 1013984 1014478 chr1:1013984-1014478 . ISG15
chr1 1020163 1020383 chr1:1020163-1020383 . AGRN
This User Gave Thanks to greet_sed For This Post:
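Neither reply says why moving the hyphen helps, so here is a short sketch of the likely cause. Inside a bracket expression, a hyphen between two characters denotes a range, so [:-|] means "every character from : through |". In ASCII that range covers the letters but not the digits or the hyphen itself. Placing the hyphen first or last in the brackets makes it a literal character. The variable names below are illustrative, and LC_ALL=C is set only to keep the range ordering predictable.

```shell
# Sample record from the first post.
line='chr1:1013574-1013576|ISG15'

# "[:-|]" is a range from ':' to '|'. The letters c, h and r fall inside it and
# act as separators, so $1..$3 are empty and $4 is the "1" left over from "chr1".
range_result=$(printf '%s\n' "$line" | LC_ALL=C awk -F'[:-|]' '{ print $4 }')

# "[:|-]" ends with the hyphen, so the separators are exactly ':', '|' and '-'.
literal_result=$(printf '%s\n' "$line" | LC_ALL=C awk -F'[:|-]' '{ print $4 }')

printf 'range:   %s\nliteral: %s\n' "$range_result" "$literal_result"
```

This would account for the stray 1 in the last column of the current output in the first post.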
# 4  
Old 01-04-2017
I have added the output of both commands below. I seem to be having trouble with the | symbol in the input file: when I tried splitting on | alone, I got empty output. Thank you.

Code:
awk -F"[:|-]" '{print $1 "\t" $2 OFS $3 OFS $1":"$2"-"$3 "\t" "." "\t" $NF}' file > output

Code:
chr1    1013574 1013576;ISG15     chr1:1013574-1013576;ISG15        .    1013576;ISG15    
chr1    1013984 1014478;ISG15 chr1:1013984-1014478;ISG15    .    1014478;ISG15
chr1    1020163 1020383;AGRN chr1:1020163-1020383;AGRN    .    1020383;AGRN

Code:
awk -F"[|:-]" ' { print $1 "\t" $2 "\t" $3 "\t" $1 ":" $2 "-" $3 "\t" "." "\t" $4}' file > output2

Code:
chr1    1013574    1013576;ISG15        chr1:1013574-1013576;ISG15        .    
chr1    1013984    1014478;ISG15    chr1:1013984-1014478;ISG15    .    
chr1    1020163    1020383;AGRN    chr1:1020163-1020383;AGRN    .


Last edited by cmccabe; 01-04-2017 at 11:40 AM.. Reason: added details
# 5  
Old 01-04-2017
Hello cmccabe,

Sorry, I did not follow. Is the output you need not the one shown in your very first post? Your second post seems to expect different output, so could you please clarify?

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 6  
Old 01-04-2017
The desired output in post 1 is correct, but for some reason I am not getting it. Maybe the original input file is corrupt; I will check. Thank you very much.

---------- Post updated at 09:50 AM ---------- Previous update was at 09:44 AM ----------

Both commands work great. Sorry, I had a space in the file that was causing the issue. Thank you.
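The thread does not say where the stray space was. Assuming it was trailing whitespace (the sample input in the first post also appears to end its first line with a tab), one way to guard against it is to strip trailing blanks before splitting. This is only a sketch, and strip_trailing_ws is a hypothetical helper name.

```shell
# Hypothetical helper: drop trailing spaces, tabs and carriage returns from each line.
strip_trailing_ws() {
    awk '{ sub(/[ \t\r]+$/, ""); print }'
}

# The record below ends with a space and a tab; after cleaning, $4 is just the gene name.
printf 'chr1:1013574-1013576|ISG15 \t\n' | strip_trailing_ws | awk -F'[:|-]' '{ print $4 }'
```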
# 7  
Old 01-04-2017
Using sed:
Code:
sed -E  's/(\w+):([0-9]+)-([0-9]+)\|(.*)/\1\t\2\t\3\t\1:\2-\3\t.\t\4/' file
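The \w class and the \t escape in the replacement are GNU extensions and may not behave the same way in other sed implementations. A sketch of the same substitution that avoids both, assuming the sed in use accepts -E, is shown below. to_bed_sed is a hypothetical name, and the tab is passed in as a literal character from a shell variable.

```shell
# Hypothetical helper: same rewrite as the sed command above, without \w or \t.
to_bed_sed() {
    tab=$(printf '\t')   # a literal tab character
    # [^:]+ stands in for \w+ ; ${tab} is expanded by the shell before sed runs.
    sed -E "s/^([^:]+):([0-9]+)-([0-9]+)\|(.*)/\1${tab}\2${tab}\3${tab}\1:\2-\3${tab}.${tab}\4/"
}

printf 'chr1:1013984-1014478|ISG15\n' | to_bed_sed
```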
