Linux - Pivot Rows to Columns


 
# 1  
Old 01-22-2016

Morning All,

I want to pivot a set of data from a row format to a column format. This may need to run over a large dataset, so I am thinking awk may be the most efficient solution?

I would like to pivot the data around Col1 and Col2, which stay constant within a group, and turn the values in Col3 and Col4 into new columns.

Below is some example input data and the output I would like to achieve:

Input Data

Code:
Col1 Col2     Col3      Col4
ABC  00000001 15-Dec-15 13000
ABC  00000001 31-Jan-16 13500
ABC  00000001 29-Feb-16 13700
ABC  00000001 31-May-16 14000
ABC  00000001 31-Aug-16 40000

Desired Output

Code:
Col1 Col2     Col3      Col4  Col5      Col6  Col7      Col8  Col9      Col10 Col11     Col12
ABC  00000001 15-Dec-15 13000 31-Jan-16 13500 29-Feb-16 13700 31-May-16 14000 31-Aug-16 40000

Many Thanks.

Last edited by RichZR; 01-22-2016 at 07:37 AM..
# 2  
Old 01-22-2016
Hello RichZR,

Could you please try the following and let me know if it helps.
Code:
awk 'NR>1{A[$1 OFS $2]=A[$1 OFS $2]? A[$1 OFS $2] OFS $3 OFS $4:$3 OFS $4 OFS;} END{for(u in A){num+=gsub(/[[:space:]]/," ",A[u]);};for(u=1;u<=num+2;u++){Q=Q?Q OFS "col" u:"col"u};print Q;for(i in A){print i OFS A[i]}}' OFS="\t"  Input_file

Output will be as follows.
Code:
col1    col2    col3    col4    col5    col6    col7    col8    col9    col10   col11   col12
ABC     00000001        15-Dec-15 13000  31-Jan-16 13500 29-Feb-16 13700 31-May-16 14000 31-Aug-16 40000

EDIT: Adding a multi-line form of the solution.
Code:
awk 'NR>1 {
                A[$1 OFS $2]=A[$1 OFS $2]? A[$1 OFS $2] OFS $3 OFS $4:$3 OFS $4 OFS;
          }
     END  {
                for(u in A){
                                num+=gsub(/[[:space:]]/," ",A[u]);
                           };
                for(u=1;u<=num+2;u++){
                                        Q=Q?Q OFS "col" u:"col"u
                                     };
                print Q;
                for(i in A)          {
                                        print i OFS A[i]
                                     }
          }
    ' OFS="\t"   Input_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 01-22-2016 at 09:37 AM.. Reason: Adding a non one-liner form of solution now.
# 3  
Old 01-22-2016
How is your input data set identified/structured? Does the key ($1 and $2) change? Is it always five lines per key? What if more or fewer data items are present?
# 4  
Old 01-22-2016
Hi Rudi,

My input data will be structured as follows:

Code:
Col1 Col2     Col3      Col4
ABC  00000001 15-Dec-15 13000
ABC  00000001 31-Jan-16 13500
ABC  00000001 29-Feb-16 13700
ABC  00000001 31-May-16 14000
ABC  00000001 31-Aug-16 40000
ABC  00000002 15-Dec-15 13000
ABC  00000002 31-Jan-16 13500
ABC  00000002 29-Feb-16 13700
ABC  00000002 31-May-16 14000
ABC  00000002 31-Aug-16 40000

So while Col2 holds the same value from row to row, I would like those rows pivoted onto a single line; when the Col2 value changes, a new output row should start.

Hope that clarifies things.

---------- Post updated at 04:15 PM ---------- Previous update was at 04:05 PM ----------

Hi Ravinder,

Thanks for the above.

This looks like a near-complete solution. A couple of queries:

1. Which part of the code would I need to amend if my data file were comma separated?
2. When I run the command, the heading goes up to col20. Is there a way to limit it to the number of columns I actually have?

Thanks in advance.

Last edited by Don Cragun; 01-22-2016 at 11:22 PM.. Reason: Add CODE tags again.
# 5  
Old 01-22-2016
Try
Code:
awk '
NR == 1         {print "Col1\tCol2\tCol3\tCol4\tCol5\tCol6\tCol7\tCol8\tCol9\tCol10\tCol11\tCol12"
                 next                           # replace the input header line
                }
!((NR-2)%5)     {printf "%s%s", SEP, $0         # every 5th data row starts a new output line
                 SEP = ORS
                 next
                }
                {sub ($1 "[ \t]*" $2 "[ \t]*", "")      # strip the key fields from the other rows
                 printf "\t%s", $0
                }
END             {print ""
                }
'  file
Col1    Col2    Col3    Col4    Col5    Col6    Col7    Col8    Col9    Col10    Col11    Col12
ABC 00000001 15-Dec-15 13000    31-Jan-16 13500    29-Feb-16 13700    31-May-16 14000    31-Aug-16 40000
ABC 00000002 15-Dec-15 13000    31-Jan-16 13500    29-Feb-16 13700    31-May-16 14000    31-Aug-16 40000

---------- Post updated at 17:18 ---------- Previous update was at 17:16 ----------

Or,
Code:
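awk '
NR == 1         {print "Col1\tCol2\tCol3\tCol4\tCol5\tCol6\tCol7\tCol8\tCol9\tCol10\tCol11\tCol12"
                 next                           # replace the input header line
                }
                {IX = $1 FS $2                  # key built from the first two fields
                }
IX != LAST      {printf "%s%s", SEP, $0         # key changed: start a new output line
                 LAST = IX
                 SEP = ORS
                 next
                }
                {sub ($1 "[ \t]*" $2 "[ \t]*", "")      # strip the key fields from the other rows
                 printf "\t%s", $0
                }
END             {print ""
                }
' file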
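
Both variants above hard-code a twelve-column header, so the header would not match inputs with more or fewer rows per key. As a rough sketch (not something posted in this thread), the header could instead be sized from the widest group once all rows have been read. The name pivot_rows is hypothetical, the field separator is passed as the first argument, and the sketch assumes a single header line and that the pivoted rows fit in memory:

```shell
# pivot_rows SEP [FILE]  (hypothetical helper, not from the thread)
# Groups rows on fields 1 and 2 and appends fields 3 and 4 of each row.
pivot_rows() {
    sep=${1:?usage: pivot_rows SEP [FILE]}
    shift
    awk -F"$sep" -v OFS="$sep" '
    NR == 1 { next }                          # skip the input header line
    {
        key = $1 OFS $2
        if (!(key in row)) {                  # first row for this key
            order[++n] = key                  # remember first-seen order
            row[key] = key
        }
        row[key] = row[key] OFS $3 OFS $4     # append the date/value pair
        cnt[key] += 2
        if (cnt[key] > max) max = cnt[key]    # widest group so far
    }
    END {
        # header: two key columns plus the widest run of pivoted columns
        for (i = 1; i <= max + 2; i++)
            printf "%s%s", "Col" i, (i < max + 2 ? OFS : ORS)
        for (i = 1; i <= n; i++)
            print row[order[i]]
    }' "$@"
}
```

Called as pivot_rows "," file for comma-separated data or pivot_rows " " file for whitespace-separated data. Because every group is held in an array until END, memory use grows with the input; the change-of-key approach above avoids that at the cost of not knowing the header width in advance.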

# 6  
Old 01-22-2016

Hi Rudi,

Thanks for the above.

I have given this a try, and it is nearly there.

One thing I have noticed: for the first record, the first line is not processed correctly and its value (15-Dec-15) is missing, although it works fine for the 00000002 combination:

Code:
Col1    Col2    Col3    Col4    Col5    Col6    Col7    Col8    Col9    Col10   Col11   Col12
ABC 00000001 31-Jan-16 13500    29-Feb-16 13700 31-May-16 14000 31-Aug-16 40000
ABC 00000002 15-Dec-15 13000    31-Jan-16 13500 29-Feb-16 13700 31-May-16 14000 31-Aug-16 40000

Whereabouts in the code would I specify that my data file is comma separated?

Kind Regards.

Last edited by Don Cragun; 01-22-2016 at 11:23 PM.. Reason: Add CODE tags again.
# 7  
Old 01-22-2016
Rest assured that it works on a well-formed text file. Did you check that your input file is correctly structured?

You can set the field separator either with the -F, option in front of the script argument or with the variable assignment FS="," placed before the file name.
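
For instance, a comma-separated variant of the change-of-key approach might look like the sketch below. It is an illustration under stated assumptions rather than code from this thread: pivot_stream is a hypothetical name, the input is assumed to have exactly one header line and to be already grouped by Col1 and Col2, and no header is printed because the final column count is not known while streaming.

```shell
# pivot_stream [FILE]  (hypothetical helper, not from the thread)
# Streams comma-separated rows and starts a new output line when the key changes.
pivot_stream() {
    awk -F, -v OFS=, '
    NR == 1 { next }                          # skip the input header line
    {
        key = $1 OFS $2
        if (!started || key != last) {
            if (started) printf "%s", ORS     # finish the previous output row
            printf "%s", key                  # key fields are printed once per group
            last = key
            started = 1
        }
        printf "%s%s%s%s", OFS, $3, OFS, $4   # append the date/value pair
    }
    END { if (started) printf "%s", ORS }     # terminate the last output row
    ' "$@"
}
```

Only the current key is kept in memory, which suits a large file. Writing FS=, OFS=, between the closing quote and "$@" instead of using -F, and -v would have the same effect here.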