awk to format file and combine two fields using comma


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk to format file and combine two fields using comma
# 1  
Old 03-29-2017
awk to format file and combine two fields using comma

I am trying to use awk to format the file below, which is tab-delimited. The desired out is space delimited and is in the order of
$9 $13 $2 $10-$11.$10 and $11 are often times multiple values separated by a comma, so the value in $10 is combined with the first value from
$11 using the comma.
I tried using the comma to combine each $10 with the corresponding $11 value, but am not able to. Thank you Smilie.


file
Code:
1799	NM_002036.3	chr1	+	159173802	159176290	159174749	159176240	2	159173802,159175250,	159174770,159176290,	0	ACKR1	cmpl	cmpl	0,0,

desired output [ICODE]$9 $13 $2 $10-$11[ICODE]
Code:
2 ACKR1 NM_00203.6.3 159173802-159174770 159175250-159176290

awk
Code:
awk -F'\t' -v OFS=, 'FNR==NR{a[++i]=$10; next} {print $9,$13,$2,a[FNR], "-",$11}' file > out


Last edited by cmccabe; 03-29-2017 at 10:40 AM.. Reason: fixed format
# 2  
Old 03-29-2017
Hello cmccabe,

Could you please check if following helps you.
Code:
awk -F'[[:space:]]+|,' '{print $9,$17,$2,$10-$13,$11-$14}'   Input_file

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 3  
Old 03-29-2017
What about this?

Code:
awk -F '\t' '{split($10,s1,",");split($11,s2,","); print $9,$13,$2,s1[1]"-"s2[1],s1[2]"-"s2[2]}' file

Output:

Code:
2 ACKR1 NM_002036.3 159173802-159174770 159175250-159176290

This User Gave Thanks to andy391791 For This Post:
# 4  
Old 03-29-2017
Thank you both very much, works great Smilie.

---------- Post updated at 12:01 PM ---------- Previous update was at 09:34 AM ----------

Since my actual data can vary on length (that is have multiple commas in it), I can not seem to account for that. There will never less less than 1 comma to split on. Thank you Smilie.


file
Code:
1799	NM_002036.3	chr1	+	159173802	159176290	159174749	159176240	2	159173802,159175250,	159174770,159176290,	0	ACKR1	cmpl	cmpl	0,0,
978	NM_002443.3	chr10	+	51549552	51562518	51549584	51562400	4	51549552,51555730,51556750,51562270,	51549587,51555836,51556856,51562518,	0	MSMB	cmpl	cmpl	0,0,1,2,
732	NM_002404.2	chr17	-	19286754	19290532	19287774	19290468	6	19286754,19288411,19288670,19289622,19290072,19290462,	19288022,19288594,19288767,19289777,19290151,19290532,	0	MFAP4	cmpl	cmpl	1,1,0,1,0,0,
1629	NM_003467.2	chr2	-	136871918	136875725	136872438	136875630	1	136871918,	136873482,	0	CXCR4	cmpl	cmpl	0,0,

current output
Code:
awk -F '\t' '{split($10,s1,",");split($11,s2,","); print $9,$13,$2,s1[1]"-"s2[1],s1[2]"-"s2[2]}' file
2 ACKR1 NM_002036.3 159173802-159174770 159175250-159176290
4 MSMB NM_002443.3 51549552-51549587 51555730-51555836
6 MFAP4 NM_002404.2 19286754-19288022 19288411-19288594
1 CXCR4 NM_003467.2 136871918-136873482 -

desired output
Code:
2 ACKR1 NM_002036.3 159173802-159174770 159175250-159176290
4 MSMB NM_002443.3 51549552-51549587 51555730-51555836 51556750-51556856 51562270-51562518
6 MFAP4 NM_002404.2 19286754-19288022 19288411-19288594 19288670-19288767 19289622-19289777 19290072-19290151 19290462-19290532
1 CXCR4 NM_003467.2 136871918-136873482

# 5  
Old 03-30-2017
Im sure there are better ways to do it but the following will work assuming that the amount of comma separated values in $10 and $11 are the same:

Code:
awk -F '\t' '{sf="";len1=split($10,s1,",");split($11,s2,","); for (i=1;i<len1;i++){sf=sf s1[i]"-"s2[i]" "}print $9,$13,$2,sf}' file.txt

Output:

Code:
2 ACKR1 NM_002036.3 159173802-159174770 159175250-159176290
4 MSMB NM_002443.3 51549552-51549587 51555730-51555836 51556750-51556856 51562270-51562518
6 MFAP4 NM_002404.2 19286754-19288022 19288411-19288594 19288670-19288767 19289622-19289777 19290072-19290151 19290462-19290532
1 CXCR4 NM_003467.2 136871918-136873482

This User Gave Thanks to andy391791 For This Post:
# 6  
Old 03-30-2017
$1 has the number of commas in each $10 and $11 respectively.
The code seems to work though they differ in total, but will always be the same in $10 and $11. Thank you very muchSmilie.

Last edited by cmccabe; 03-30-2017 at 08:53 AM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How to extract fields from a CSV i.e comma separated where some of the fields having comma as value?

can anyone help me!!!! How to I parse the CSV file file name : abc.csv (csv file) The above file containing data like abv,sfs,,hju,',',jkk wff,fst,,rgr,',',rgr ere,edf,erg,',',rgr,rgr I have a requirement like i have to extract different field and assign them into different... (4 Replies)
Discussion started by: J.Jena
4 Replies

2. Shell Programming and Scripting

awk to combine all matching fields in input but only print line with largest value in specific field

In the below I am trying to use awk to match all the $13 values in input, which is tab-delimited, that are in $1 of gene which is just a single column of text. However only the line with the greatest $9 value in input needs to be printed. So in the example below all the MECP2 and LTBP1... (0 Replies)
Discussion started by: cmccabe
0 Replies

3. Shell Programming and Scripting

awk to combine matches and use a field to adjust coordinates in other fields

Trying to output a result that uses the data from file to combine and subtract specific lines. If $4 matches in each line then the last $6 value is added to $2 and that becomes the new$3. Each matching line in combined into one with $1 then the original $2 then the new$3 then $5. For the cases... (4 Replies)
Discussion started by: cmccabe
4 Replies

4. Shell Programming and Scripting

Combine two Fields if there's a comma between

Hi All, Seeking for your assistance on how to combine two fields if there's a comma between them. What i did was, extract the records from database. file1.csv(extracted from db) 82|Supplies Station, Inc.|112012|KARISSA APPAREL, INC. - 112012|NON TRADE F/A AND... (2 Replies)
Discussion started by: znesotomayor
2 Replies

5. Shell Programming and Scripting

Awk: Combine multiple lines based on number of fields

If a file has following kind of data, comma delimited 1,2,3,4 1 1 1,2,3,4 1,2 2 2,3,4 My required output must have only 4 columns with comma delimited 1,2,3,4 111,2,3,4 1,222,3,4 I have tried many awk command using ORS="" but couldnt progress (10 Replies)
Discussion started by: mdkm
10 Replies

6. UNIX for Dummies Questions & Answers

Awk: Wondering how to format numbers with comma

I have tried the following commands and can't get a number to format with commas: echo 1234567.12 |awk '{printf("%-12s %20s\n", $0, comma($0)) }' This prints out value 50000 without a comma for i in *13*; do (cd $i && du -sk . && echo $i);done|grep -v 0000|gawk -F OFS="," ' {SUM += $1}... (8 Replies)
Discussion started by: newbie2010
8 Replies

7. Shell Programming and Scripting

awk print - fields separated with comma's need to ignore inbetween double quotes

I am trying to re-format a .csv file using awk. I have 6 fields in the .csv file. Some of the fields are enclosed in double quotes and contain comma's inside the quotes. awk is breaking this into multiple fields. Sample lines from the .csv file: Device Name,Personnel,Date,Solution... (1 Reply)
Discussion started by: jxrst
1 Replies

8. Shell Programming and Scripting

How to combine first,second and third fields in a file

Hi Guys, I have a file as follows: 1 2 3 There are 5 spaces between 1 and 2 and one space between 2 and 3. I want: 123 How can I do this using awk? I tried using: (8 Replies)
Discussion started by: npatwardhan
8 Replies

9. UNIX for Dummies Questions & Answers

Sort the fields in a comma delimited file

Hi, I have a comma delimited file. I want to sort the fields alphabetically and again store them in a comma delimited file. For example, My file looks like this. abc,aaa,xyz,xxx,def pqr,ggg,eee,iii,qqq zyx,lmo,pqr,abc,fff and I want my output to look like this, all fields sorted... (3 Replies)
Discussion started by: swethapatil
3 Replies

10. UNIX for Dummies Questions & Answers

Remove whitespaces between comma separated fields from file

Hello all, I am a unix dummy. I am trying to remove spaces between fields. I have the file in the following format 12332432, 2345 , asdfsdf ,100216 , 9999999 12332431, 2341 , asdfsd2 ,100213 , 9999999 &... (2 Replies)
Discussion started by: nitinbjoshi
2 Replies
Login or Register to Ask a Question