Transposing data based on 1st column


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Transposing data based on 1st column
# 1  
Old 08-11-2016
Transposing data based on 1st column

I do have a big tab delimited file of the following format
Code:
aa 344 456
aa 34 67
bb 34 90
bb 23 100
bb 1 89
d 0 12
e 45 678
e 78 90
e 56 90
....
....
....

I would like to transpose the data based on the category on column one and get the output file in the following tab delimited format format:

Code:
a 344-456, 34-67
b 34-90,23-100,1-89
d 0-12
e 45-678,78-90,56-90

Please let me know the best way to do this using awk or sed
# 2  
Old 08-11-2016
Hello kanja,

If you don't bother of the sequence of the output like how it is in Input_file then following may help you in same.
Code:
awk '{A[$1]=A[$1]?A[$1] OFS $2"-"$3:$2"-"$3} END{for(i in A){print i FS A[i]}}' OFS=", "   Input_file

Output will be as follows.
Code:
bb 34-90, 23-100, 1-89
d 0-12
e 45-678, 78-90, 56-90
aa 344-456, 34-67

If you want output in same sequence as Input_file then following may help you in same.
Code:
awk 'FNR==NR{A[$1]=A[$1]?A[$1] OFS $2"-"$3:$2"-"$3;next} ($1 in A){print $1 FS A[$1];delete A[$1]}' OFS=", "  Input_file  Input_file

Output will be as follows.
Code:
aa 344-456, 34-67
bb 34-90, 23-100, 1-89
d 0-12
e 45-678, 78-90, 56-90

Thanks,
R. Singh
# 3  
Old 08-11-2016
Code:
awk '{i=substr($0,1,1) ;v[i]=v[i]?v[i]","$2"-"$3:$2"-"$3;} END{ for (i in v) print i"\t"v[i]}' input_file


Last edited by stomp; 08-11-2016 at 05:55 PM.. Reason: replaced space with TAB in output
# 4  
Old 08-11-2016
You said that your input file has <tab> delimited fields, but the sample input and output you provided is <space> delimited.

If your input file has all records for a given first character of the first field on adjacent lines (as in your example), you could try this simpler approach which uses less memory (and, therefore, should run a little faster) and keeps the output order the same as the input order. It assumes that you want a <tab> separating fields in the output, but I assume you can see how to change that to a <space> if that is what you want:
Code:
awk '
last != substr($1, 1, 1) {
	if(NR > 1) print ""
	last = substr($1, 1, 1)
	printf("%s\t%s-%s", last, $2, $3)
	next
}
{	printf(",%s-%s", $2, $3)
}
END {	if(NR > 0) print ""
}' file

which, with your sample input, produces the output:
Code:
a	344-456,34-67
b	34-90,23-100,1-89
d	0-12
e	45-678,78-90,56-90

For any of the awk scripts suggests in this thread, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
# 5  
Old 08-15-2016
Thank you all

I see that you all assume that the first column only contains characters. If it had also some numbers along with the character in the first column , how do I modify the awk command?. Also the input file is tab-delimited.

for example:
Code:
aa123 344 456
aa123 34 67
bb34 34 90
bb34 23 100
bb34 1 89
d3 0 12
e55 45 678
e55 78 90
e55 56 90
....
....
....

# 6  
Old 08-15-2016
@Kanja

No assumptions have been made - only answers based upon your input data/expected output
This User Gave Thanks to pilnet101 For This Post:
# 7  
Old 08-15-2016
Your sample data in post #1 in this thread explicitly showed that no matter how many characters were in field 1 in your input file, you only wanted the 1st character of field 1 to appear in your output file. Have you now changed you mind on which lines are to be grouped together??? If so, please explicitly state your new requirements.

Also note that you say your input file is <tab> delimited, but every sample file you have shown us is delimited by a single <space> character; not a <tab>.

Last edited by Don Cragun; 08-15-2016 at 01:26 PM.. Reason: Fix typo: s/tabb/tab/
This User Gave Thanks to Don Cragun For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

To append new data at the end of each line based on substring of last column

Hi guys, I need to append new data at the end of each line of the files. This new data is based on substring (3rd fields) of last column. Input file xxx.csv: U1234|1-5X|orange|1-5X|Act|1-5X|0.1 /sac/orange 12345 0 U5678|1-7X|grape|1-7X|Act|1-7X|0.1 /sac/grape 5678 0... (5 Replies)
Discussion started by: null7
5 Replies

2. Shell Programming and Scripting

Inserting column data based on category assignment

please help with the following. I have 4 col data .. instrument , category, variable and value. the instruments belong to particular categories and they all measure some variables (var1 and var2 in this example), the last column is the value an instrument outputs for a variable. I have used... (0 Replies)
Discussion started by: ritakadm
0 Replies

3. Shell Programming and Scripting

Sum column values based in common identifier in 1st column.

Hi, I have a table to be imported for R as matrix or data.frame but I first need to edit it because I've got several lines with the same identifier (1st column), so I want to sum the each column (2nd -nth) of each identifier (1st column) The input is for example, after sorted: K00001 1 1 4 3... (8 Replies)
Discussion started by: sargotrons
8 Replies

4. Shell Programming and Scripting

Generate tabular data based on a column value from an existing data file

Hi, I have a data file with : 01/28/2012,1,1,98995 01/28/2012,1,2,7195 01/29/2012,1,1,98995 01/29/2012,1,2,7195 01/30/2012,1,1,98896 01/30/2012,1,2,7083 01/31/2012,1,1,98896 01/31/2012,1,2,7083 02/01/2012,1,1,98896 02/01/2012,1,2,7083 02/02/2012,1,1,98899 02/02/2012,1,2,7083 I... (1 Reply)
Discussion started by: himanish
1 Replies

5. Shell Programming and Scripting

Calculate 2nd Column Based on 1st Column

Dear All, I have input file like this. input.txt CE2_12-15 3950.00 589221.0 9849709.0 768.0 CE2_12_2012 CE2_12-15 3949.00 589199.0 9849721.0 768.0 CE2_12_2012 CE2_12-15 3948.00 589178.0 9849734.0 768.0 CE2_12_2012 CE2_12-52 1157.00 ... (3 Replies)
Discussion started by: attila
3 Replies

6. Shell Programming and Scripting

Help newbie: transposing column into row (pivot)

Hi, I have a file in this format. Name | organization John | INT Abby| DOM John | DOM John | MIX Jason | INT Anna | DOM Abby |MIX I want the output to look this. Name | organization John | INT, DOM, MIX Abby | DOM, MIX Jason | INT Anna | DOM (5 Replies)
Discussion started by: sirrtuan
5 Replies

7. Shell Programming and Scripting

Help with analysis data based on particular column content

Input file: Total_counts 1306726155 100% Number_of_count_true 855020282 Number_of_count_true_1 160014283 Number_of_count_true_2 44002825 Number_of_count_true_3 18098424 Number_of_count_true_4 24693745 Number_of_count_false 115421870 Number_of_count_true 51048447 Total_number_of_false ... (2 Replies)
Discussion started by: perl_beginner
2 Replies

8. Shell Programming and Scripting

Remove duplicate line detail based on column one data

My input file: AVI.out <detail>named as the RRM .</detail> AVI.out <detail>Contains 1 RRM .</detail> AR0.out <detail>named as the tellurite-resistance.</detail> AWG.out <detail>Contains 2 HTH .</detail> ADV.out <detail>named as the DENR family.</detail> ADV.out ... (10 Replies)
Discussion started by: patrick87
10 Replies

9. Shell Programming and Scripting

Extract data based on match against one column data from a long list data

My input file: data_5 Ali 422 2.00E-45 102/253 140/253 24 data_3 Abu 202 60.00E-45 12/23 140/23 28 data_1 Ahmad 256 7.00E-45 120/235 140/235 22 data_4 Aman 365 8.00E-45 15/65 140/65 20 data_10 Jones 869 9.00E-45 65/253 140/253 18... (12 Replies)
Discussion started by: patrick87
12 Replies

10. UNIX for Dummies Questions & Answers

Transposing data output

Hi, I've just created a shell script that produces the following output: hd1 hd3 hd9 /optnonaix/esp /optnonaix/app/oracle /u06 (564.67) (675.97) (678.90) I would like the output to be as hd1 /optnonaix/esp (564.67) hd3 /optnonaix/app/oracle (675.97) hd9 /u06 (678.90) Need some... (2 Replies)
Discussion started by: bazzabogan
2 Replies
Login or Register to Ask a Question