Merging multiple lines to columns with awk, while inserting commas for missing lines


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Merging multiple lines to columns with awk, while inserting commas for missing lines
# 1  
Old 08-31-2015
Merging multiple lines to columns with awk, while inserting commas for missing lines

Hello all,
I have a large csv file where there are four types of rows I need to merge into one row per person, where there is a column for each possible code / type of row, even if that code/row isn't there for that person.

In the csv, a person may be listed from one to four times consecutively. Each row in which they are listed has a number of hours and a code (WE, MO, CE, ES) that indicates type of hours

I found a similar thread and a solution to a similar problem. The forum won't let me post links yet, but you can add this to the base URL /shell-programming-and-scripting/208027-merge-multiple-lines-same-file-common-key-using-awk.html to open that thread.

However, the output of that script tacks each of the optional lines on to the end of the first line, but that messes up the csv columns, so that the same data is not in the same place in the merged columns, afterward.

Here is a simplified version of the input data I have, which has a lot of fields per line after these, but where for each person, all fields except for the type-of-hours code, and number of hours are identical between lines the person is listed on:
Code:
1111,John Smith,WE,105,Fourty Two Cherry Lane,Nowhere,NY
1111,John Smith,MO,23,Fourty Two Cherry Lane,Nowhere,NY
2222,Elane Boozler,WE,70,Twelve Peach Court,Elsehere,NY
2222,Elane Boozler,MO,24,Twelve Peach Court,Elsehere,NY
2222,Elane Boozler,CE,30,Twelve Peach Court,Elsehere,NY
2222,Elane Boozler,ES,8,Twelve Peach Court,Elsehere,NY
3333,Hortense Gable,CE,25,Three Pinapple Ln,Somewhere,NY
3333,Hortense Gable,ES,14,Three Pinapple Ln,Somewhere,NY

I need the output, to be in the form
Code:
1111,John Smith,105,23,,,Fourty Two Cherry Lane,Nowhere, NY
2222,Elane Boozler,70,24,30,8,Peach Court,Elsehere,NY
3333,Hortense Gable,,,25,14,Three Pinapple Ln,Somewhere,NY

Please let me know how I might get this kind of output from this kind of input using awk. Thanks so much for your help!
# 2  
Old 08-31-2015
Based on the solution given in the mentioned thread, try
Code:
awk -F, '
NR!=1 && p1!=$1         {print prev
                         delete TMP
                        }
                        {p1=$1
                         TMP[$3]=$4
                         $4=sprintf("%s,%s,%s,%s", TMP["WE"], TMP["MO"], TMP["CE"], TMP["ES"])
                         sub ($3 FS, "")
                         prev=$0
                        }
END                     {print prev}
' OFS="," file
1111,John Smith,105,23,,,Fourty Two Cherry Lane,Nowhere,NY
2222,Elane Boozler,70,24,30,8,Twelve Peach Court,Elsehere,NY
3333,Hortense Gable,,,25,14,Three Pinapple Ln,Somewhere,NY

This User Gave Thanks to RudiC For This Post:
# 3  
Old 09-01-2015
Great! Thanks so much for your help : ) and sorry for the delayed response.

Could you please answer one more question? If the ID field is not the first field in the file, such as, if there are six fields to the left of it, I would have thought that I could just increment the field identifiers: $1 to $7, $3 to $9, and $4 to $10 to have the script continue to operate on the same fields, shown below.
Code:
awk -F, '
NR!=1 && p1!=$7         {print prev
                         delete TMP
                        }
                        {p1=$9
                         TMP[$9]=$10
                         $10=sprintf("%s,%s,%s,%s", TMP["WE"], TMP["MO"], TMP["CE"], TMP["ES"])
                         sub ($9 FS, "")
                         prev=$0
                        }
END                     {print prev}
' OFS="," TestIDNotFirst.csv

However, if I run that on a version of the file where the ID is the 7th field, such as below, that doesn't work.
Code:
red,up,cold,in,Japanamation,Digeridoo,1111,John Smith,WE,105,Fourty Two Cherry Lane,Nowhere,NY
red,up,cold,in,Japanamation,Digeridoo,1111,John Smith,MO,23,Fourty Two Cherry Lane,Nowhere,NY
green,down,hot,out,Collapso,Polanko,2222,Elane Boozler,WE,70,Twelve Peach Court,Elsehere,NY
green,down,hot,out,Collapso,Polanko,2222,Elane Boozler,MO,24,Twelve Peach Court,Elsehere,NY
green,down,hot,out,Collapso,Polanko,2222,Elane Boozler,CE,30,Twelve Peach Court,Elsehere,NY
green,down,hot,out,Collapso,Polanko,2222,Elane Boozler,ES,8,Twelve Peach Court,Elsehere,NY
blue,right,hot,in,Smitten,Kitten,3333,Hortense Gable,CE,25,Three Pinapple Ln,Somewhere,NY
blue,right,hot,in,Smitten,Kitten,3333,Hortense Gable,ES,14,Three Pinapple Ln,Somewhere,NY

Results in the output below, where the fields WE, MO, CE, ES aren't being put into the correct columns
Code:
red,up,cold,in,Japanamation,Digeridoo,1111,John Smith,105,,,,Fourty Two Cherry Lane,Nowhere,NY
red,up,cold,in,Japanamation,Digeridoo,1111,John Smith,,23,,,Fourty Two Cherry Lane,Nowhere,NY
green,down,hot,out,Collapso,Polanko,2222,Elane Boozler,70,,,,Twelve Peach Court,Elsehere,NY
green,down,hot,out,Collapso,Polanko,2222,Elane Boozler,,24,,,Twelve Peach Court,Elsehere,NY
green,down,hot,out,Collapso,Polanko,2222,Elane Boozler,,,30,,Twelve Peach Court,Elsehere,NY
green,down,hot,out,Collapso,Polanko,2222,Elane Boozler,,,,8,Twelve Peach Court,Elsehere,NY
blue,right,hot,in,Smitten,Kitten,3333,Hortense Gable,,,25,,Three Pinapple Ln,Somewhere,NY
blue,right,hot,in,Smitten,Kitten,3333,Hortense Gable,,,,14,Three Pinapple Ln,Somewhere,NY

In reality, I need to output specific fields other than the ID and number of hours per category. In pseudo-code, what I want to write is something like:
Code:
awk '{ print $2, $4, $7, $8 }

NR!=1 && p1!=$7         {print prev
                         delete TMP
                        }
                        {p1=$9
                         TMP[$9]=$10
                         $10=sprintf("%s,%s,%s,%s", TMP["WE"], TMP["MO"], TMP["CE"], TMP["ES"])
                         prev=$0
                        }
END                     {print prev}
{ print $10, $12}'
' OFS="," InputFile.csv

And have the code that actually worked like that, output the following:
Code:
up,in,1111,John Smith,105,23,,,Fourty Two Cherry Lane,NY
down,out,2222,Elane Boozler,70,24,30,8,Twelve Peach Court,NY
right,in,3333,Hortense Gable,,,25,14,Three Pinapple Ln,NY

You've already gone above and beyond the call of duty, but if you could turn my pseudocode into real code that outputs this, I'll send a pizza to the address of your choice!

Last edited by RalphNY; 09-01-2015 at 11:06 PM..
# 4  
Old 09-02-2015
Try,

Code:
awk -F, '
NR != 1 && p1 != $8 {
    print prev
    delete TMP
}
{
    p1 = $8
    TMP[$9] = $10
    prev = $2 FS $4 FS $7 FS $8 FS TMP["WE"] FS TMP["MO"] FS TMP["CE"] FS TMP["ES"] FS $11 FS $13
}
END {
    print prev
}' OFS="," file


Last edited by Aia; 09-02-2015 at 12:54 AM.. Reason: save directly to prev instead of $0
This User Gave Thanks to Aia For This Post:
# 5  
Old 09-02-2015
Awesome! That works! I was serious about the pizza- if you would like it, DM me your address!
# 6  
Old 09-02-2015
Quote:
Originally Posted by RalphNY
Awesome! That works! I was serious about the pizza- if you would like it, DM me your address!
No need, thank you. Keep at it.
# 7  
Old 09-02-2015
Quote:
Originally Posted by RalphNY
... ... ...
Code:
awk -F, '
NR!=1 && p1!=$7         {print prev
                         delete TMP
                        }
                        {p1=$9
                         TMP[$9]=$10
                         $10=sprintf("%s,%s,%s,%s", TMP["WE"], TMP["MO"], TMP["CE"], TMP["ES"])
                         sub ($9 FS, "")
                         prev=$0
                        }
END                     {print prev}
' OFS="," TestIDNotFirst.csv

... ... ...
Your translation of RudiC's suggestion was close... I think you would have gotten what you had expected if the 2nd line marked in red in your code above had been:
Code:
                        {p1=$7

instead of:
Code:
                        {p1=$9

The code Aia suggested should work fine as long your data never contains two people with the same name with different IDs. RudiC's code was matching on the ID (e.g., 1111, 2222, and 3333) while Aia's code is matching on the name (e.g., John Smith, Elane Boozler, and Hortense Gable). With the sample data you provided you could have matched on any field except $3, $9, and $10; but I assume that ID ($7) and name ($8) are the only two reasonable choices to consider for your real data.

Note also that the awk statement delete array_name (i.e., delete TMP) is an extension to the standards and will not work on some awk implementations. If you ever need to port this to a system that doesn't accept that statement, the following should work on any standards conforming version of awk:
Code:
                         for(i in TMP) delete TMP[i]

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Merging two lines into one (awk)

Hi, I am attempting to merge the following lines which run over two lines using awk. INITIAL OUTPUT 2019 Sep 28 10:47:24.695 hkaet9612 last message repeated 1 time 2019 Sep 28 10:47:24.695 hkaet9612 %ETHPORT-5-IF_DOWN_INTERFACE_REMOVED: Interfa ce Ethernet1/45 is down (Interface removed)... (10 Replies)
Discussion started by: sand1234
10 Replies

2. UNIX for Beginners Questions & Answers

Merging multiple lines into single line based on one column

I Want to merge multiple lines based on the 1st field and keep into single record. SRC File: AAA_POC_DB.TAB1 AAA_POC_DB.TAB2 AAA_POC_DB.TAB3 AAA_POC_DB.TAB4 BBB_POC_DB.TAB1 BBB_POC_DB.TAB2 CCC_POC_DB.TAB6 OUTPUT ----------------- 'AAA_POC_DB','TAB1','TAB2','TAB3','TAB4'... (10 Replies)
Discussion started by: raju2016
10 Replies

3. Shell Programming and Scripting

Merging multiple lines

I do have a text file with multiple lines on it. I want to put the lines of text into a single line where ever there is ";" for example ert, ryt, yvig, fgr; rtyu, hjk, uio, hyu, hjo; ghj, tyu, gho, hjp, jklo, kol; The resultant file I would like to have is ert, ryt, yvig, fgr;... (2 Replies)
Discussion started by: Kanja
2 Replies

4. Shell Programming and Scripting

Merging multiple files using lines from one file

I have been working of this script for a very long time and I have searched the internet for direction but I am stuck here. I have about 3000 files with two columns each. The length of each file is 50000. Each of these files is named this way b.4, b.5, b.6, b.7, b.8, b.9, b.10, b.11, b.12... (10 Replies)
Discussion started by: iconig
10 Replies

5. Shell Programming and Scripting

Awk match multiple columns in multiple lines in single file

Hi, Input 7488 7389 chr1.fa chr1.fa 3546 9887 chr5.fa chr9.fa 7387 7898 chrX.fa chr3.fa 7488 7389 chr21.fa chr3.fa 7488 7389 chr1.fa chr1.fa 3546 9887 chr9.fa chr5.fa 7898 7387 chrX.fa chr3.fa Desired Output 7488 7389 chr1.fa chr1.fa 2 3546 9887 chr5.fa chr9.fa 2... (2 Replies)
Discussion started by: jacobs.smith
2 Replies

6. Shell Programming and Scripting

merging multiple lines into single line

Hi, 1. Each message starts with date 2. There is blank line between each message 3. Each message does not contain same number of lines. Any help in merging multiple lines in each message to a single line is much appreciated. AIX: Korn Shell Error log file looks like below. ... (5 Replies)
Discussion started by: bala123
5 Replies

7. Shell Programming and Scripting

inserting and replacing lines with awk

Hello, I need to insert varying lines (i.e. these lines are an output of another script) between lines starting with certain fields. An example to make it more clear. This is the file where I wanna insert lines: (save it as "input.txt") ContrInMi_c_mir 2 10066 181014 200750... (12 Replies)
Discussion started by: tempestas
12 Replies

8. Shell Programming and Scripting

inserting multiple lines with awk

awk '/<login-module code="com.nlayers.seneca.security.LdapLogin" flag="sufficient">/{p++} /<login-module code="com.nlayers.seneca.security.LdapLogin" flag="sufficient">/ && p==1 {$0="Mulitple lines here\n"$0}1' login-config.xml In the above awk code inside shell script, i am having problems when... (1 Reply)
Discussion started by: sunrexstar
1 Replies

9. Shell Programming and Scripting

Matching lines across multiple csv files and merging a particular field

I have about 20 CSV's that all look like this: "","","","","","","","","","","","","","","",""What I've been told I need to produce is the exact same thing, but with each file now containing the start_code from every other file where the email matches. It doesn't matter if any of the other... (1 Reply)
Discussion started by: Demosthenes
1 Replies

10. Shell Programming and Scripting

Merging non-repeating columns of lines

Hello, I have file to work with. It has 5 columns. The first three, altogether, constitutes the position. The 4th column contains some values for downstream analysis and the fifth column contains some values that I want to add to 4th column (only if they happen to be in the same position). My... (5 Replies)
Discussion started by: menenuh
5 Replies
Login or Register to Ask a Question

Featured Tech Videos