Pivot Column using awk


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Pivot Column using awk
# 1  
Old 01-27-2015
Pivot Column using awk

Hello everyone

I have a 20M file which is having the below sample layout

Code:
1111,ABC,100
1111,DEF,200
1111,XYZ,300
4444,LMN,100
4444,AKH,500
4444,WCD,400
2222,ABC,100
7777,DEF,300
7777,WCD,300

I need to covert this to below format
Output

Code:
Party_ID|ABC|AKH|DEF|LMN|WCD|XYZ
1111|100|0|200|0|0|300
2222|100|0|0|0|0|0
4444|0|500|0|100|400|0
7777|0|0|300|0|300|0

How can I make this through UNIX and it should be pretty faster.My current script take huge time to pivot for 20M records

Last edited by Don Cragun; 01-27-2015 at 04:25 AM.. Reason: Add CODE tags.
# 2  
Old 01-27-2015
Can you post your script
This User Gave Thanks to anbu23 For This Post:
# 3  
Old 01-27-2015
In addition to what anbu23 has already requested, please also tell us some more information:
  1. Are the input file field2 values always the same, or do the headings in the output file have to be determined from the input file?
  2. If the headings have to be determined from the input, do the headings have to be sorted alphabetically?
  3. Are all input records for a given Party_ID adjacent in your input files (as they are in your sample input)?
  4. Are Party_ID values all numeric?
  5. Does the output need to be sorted by Party_ID, or can the output be in the same order as your input data?
# 4  
Old 01-27-2015
Try this for a starting point:
Code:
awk     '       {LN[$1]; HD[$2]; MX[$1,$2]=$3}
         END    {               printf "%10s", ""; for (i in HD) printf "%10s", i; print "";
                 for (j in LN) {printf "%10s",j;   for (i in HD) printf "%10s", MX[j,i]; print ""}
                }
        ' FS=, file
                 WCD       AKH       DEF       LMN       XYZ       ABC
      4444       400       500                 100                    
      2222                                                         100
      7777       300                 300                              
      1111                           200                 300       100

This User Gave Thanks to RudiC For This Post:
# 5  
Old 01-27-2015
This is my script and its too slow

Code:
cat File|awk -F',' '{print $1}' |sort -u >UniqueIds
cat File|awk -F',' '{print $2}' |sort -u >UniqueColumns


Counter=1
cat UniqueIds|
{
while read Line
do

cat UniqueColumns|
{
while read Col 
do
if [[ $(grep $Line File|grep $Col) == "" ]];then
echo $Line,$Col,0 >>File
fi
done
}

COL_LIST=$(cat File|grep $Line|sort -t',' -k2|awk -F',' '{print $2}' ORS='|'|awk -F'|' '{NF=NF-1;$1=$1}1' OFS="|")
VALUE_LIST=$(cat File|grep $Line|sort -t',' -k2|awk -F',' '{print $3}' ORS='|'|awk -F'|' '{NF=NF-1;$1=$1}1' OFS="|")
if [[ $Counter == 1 ]];then
echo Party_ID"|"$COL_LIST
fi
echo $Line"|"$VALUE_LIST
((Counter=Counter+1))
done
}
rm UniqueIds UniqueColumns

---------- Post updated at 10:18 AM ---------- Previous update was at 10:17 AM ----------

Quote:
Originally Posted by Don Cragun
In addition to what anbu23 has already requested, please also tell us some more information:
  1. Are the input file field2 values always the same, or do the headings in the output file have to be determined from the input file?
  2. If the headings have to be determined from the input, do the headings have to be sorted alphabetically?
  3. Are all input records for a given Party_ID adjacent in your input files (as they are in your sample input)?
  4. Are Party_ID values all numeric?
  5. Does the output need to be sorted by Party_ID, or can the output be in the same order as your input data?

Done . I send my script. I need to sort the second column

---------- Post updated at 03:10 PM ---------- Previous update was at 10:18 AM ----------

GREAT THANKS Rudi

My Script takes 3 minute for processing 30Million records Smilie .I modified little bit and here is the one

Code:
awk   '{LN[$1]; HD[$2]; MX[$1,$2]=$3}
END{
printf "%s", "PARTY_ID"; for (i in HD) printf "|%s", i; print "";
for (j in LN) {printf "%s",j;
for (i in HD) printf "|%s", MX[j,i]; print ""}
}
' FS=, file

---------- Post updated at 03:20 PM ---------- Previous update was at 03:10 PM ----------

I have made small modification to print zero too

Code:
awk   '{LN[$1]; HD[$2]; MX[$1,$2]=$3}
END{
printf "%s", "PARTY_ID"; for (i in HD) printf "|%s", i; print "";
for (j in LN) 
{
printf "%s",j;
for (i in HD) 
if (MX[j,i] =="")
  printf "|%s",0
else 
  printf "|%s", MX[j,i]; 
  print ""
}
}' FS=, file


Last edited by Don Cragun; 01-27-2015 at 06:10 PM.. Reason: Add CODE tags again.
# 6  
Old 01-27-2015
A little trick enforces a NULL value to be casted to an integer, giving 0:
Code:
printf "|%s", MX[j,i]+0

Or printf can specify an integer or floating point in its format argument:
Code:
printf "|%d", MX[j,i]

Code:
printf "|%.f", MX[j,i]

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk script to append suffix to column when column has duplicated values

Please help me to get required output for both scenario 1 and scenario 2 and need separate code for both scenario 1 and scenario 2 Scenario 1 i need to do below changes only when column1 is CR and column3 has duplicates rows/values. This inputfile can contain 100 of this duplicated rows of... (1 Reply)
Discussion started by: as7951
1 Replies

2. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Hi Experts, Please bear with me, i need help I am learning AWk and stuck up in one issue. First point : I want to sum up column value for column 7, 9, 11,13 and column15 if rows in column 5 are duplicates.No action to be taken for rows where value in column 5 is unique. Second point : For... (1 Reply)
Discussion started by: as7951
1 Replies

3. Shell Programming and Scripting

Awk- Pivot Table Averages

Hi everyone, Has anyone figured out yet how to do pivot table averages using AWK. I didn't see anything with regards to doing averages. For example, suppose you have the following table with various individuals and their scores in round1 and round2: SAMPLE SCORE1 SCORE2 British ... (6 Replies)
Discussion started by: Geneanalyst
6 Replies

4. Shell Programming and Scripting

Pivot data using awk

Hi My Input is like below DELETE|MPI|AUD_UPD_AGENT|MPISYS INSERT|MPI|AUD_UPD_AGENT|MPISYS SELECT|MPI|AUD_UPD_AGENT|MPISYS UPDATE|MPI|AUD_UPD_AGENT|MPISYS DELETE|MPI|BDYMOD|MPISYS INSERT|MPI|BDYMOD|MPISYS SELECT|MPI|BDYMOD|MPISYS UPDATE|MPI|BDYMOD|MPISYS DELETE|MPI|BDYMOD_DESC|MPISYS... (4 Replies)
Discussion started by: dineshaila
4 Replies

5. Shell Programming and Scripting

Problems with awk (fatal error) and paste (two variables into one column-by-column)

Hello, I have a script extracting columns of useful numbers from a data file, and manipulating the numbers with awk commands. I have problems with my script... 1. There are two lines assigning numbers to $BaseForAveraging. If I use the commented line (the first one) and let the second one... (9 Replies)
Discussion started by: vgbraymond
9 Replies

6. Shell Programming and Scripting

Pivot using awk

Hi, I am writing a code to basically pivot the data. awk -v var1="" -v var2="" -v var3="" -v var4="" -v var5="" -v Disp=0\ 'BEGIN {FS=":"; OFS="|";}\ /^Pattern1/ {var1=$2;Disp=0;} \ /^Pattern2/ {var2=$2;} \ /^Pattern3/ {var3=$2;} \ /^Pattern4/ {var4=$2;} \ /^Pattern5/... (5 Replies)
Discussion started by: tostay2003
5 Replies

7. Shell Programming and Scripting

awk Print New Column For Every Two Lines and Match On Multiple Column Values to print another column

Hi, My input files is like this axis1 0 1 10 axis2 0 1 5 axis1 1 2 -4 axis2 2 3 -3 axis1 3 4 5 axis2 3 4 -1 axis1 4 5 -6 axis2 4 5 1 Now, these are my following tasks 1. Print a first column for every two rows that has the same value followed by a string. 2. Match on the... (3 Replies)
Discussion started by: jacobs.smith
3 Replies

8. Shell Programming and Scripting

AWK script to create max value of 3rd column, grouping by first column

Hi, I need an awk script (or whatever shell-construct) that would take data like below and get the max value of 3 column, when grouping by the 1st column. clientname,day-of-month,max-users ----------------------------------- client1,20120610,5 client2,20120610,2 client3,20120610,7... (3 Replies)
Discussion started by: ckmehta
3 Replies

9. Shell Programming and Scripting

Help newbie: transposing column into row (pivot)

Hi, I have a file in this format. Name | organization John | INT Abby| DOM John | DOM John | MIX Jason | INT Anna | DOM Abby |MIX I want the output to look this. Name | organization John | INT, DOM, MIX Abby | DOM, MIX Jason | INT Anna | DOM (5 Replies)
Discussion started by: sirrtuan
5 Replies

10. Shell Programming and Scripting

for each different entry in column 1 extract maximum values from column 2 in unix/awk

Hello, I have 2 columns (1st column has multiple entries but the corresponding values in the column 2 may be the same or different.) however I want to extract unique values for each entry in column 1 by assigning the max value from column 2 SDF4 -0.211654 SDF4 0.978068 ... (1 Reply)
Discussion started by: Diya123
1 Replies
Login or Register to Ask a Question