Merge multiple lines in same file with common key using awk

# 1  

I've been a Unix admin for nearly 30 years and never learned AWK. I've seen several similar posts here, but haven't been able to adapt the answers to my situation. AWK is so damn cryptic!

I have a single file with ~900 lines (a CSV list). Each line starts with an ID, followed by different fields, and I want to merge all lines that share an ID onto a single line. Sometimes there are just a couple of lines to be merged, sometimes there are 4 or 5. The input file is already sorted.
Here's an example:

2,"abc","text","some text"
3,"uname","text","some text"
3,"ulang","text","some text"

Output should look like:

2,"URL","website","","abc","text","some text","Password","password","12345678"
3,"URL","website","","uname","text","some text","Password","password","password","ulang","text","some text"


I forgot to mention, each line in the input file only has 4 fields.
# 2  
Assuming that the text of the second field does not occur anywhere earlier in the line (that is, its first occurrence is at the second field itself):
awk -F, 'NR!=1 && p1!=$1{print prev;prev=""}
{p1=$1;prev=(prev"")?prev FS substr($0,index($0,$2)):$0}
END{if(prev"") print prev}' file
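
A small sketch of why that assumption matters. The sample line below is made up for illustration (it is not from the original data), and the length($1)+2 variant is my own suggestion rather than something posted in this thread. index($0,$2) finds the first place the text of field 2 appears anywhere in the line, so it cuts in the wrong spot if the ID happens to contain that text:

```shell
# Hypothetical line whose ID (12) contains the text of field 2 (2).
line='12,2,x'

# Approach from the post above: cut at the first occurrence of $2 in the line.
by_index=$(printf '%s\n' "$line" | awk -F, '{print substr($0, index($0, $2))}')

# Alternative (an assumption, not from the thread): skip the ID plus one comma.
by_length=$(printf '%s\n' "$line" | awk -F, '{print substr($0, length($1) + 2)}')

echo "index():  $by_index"    # 2,2,x  (the cut lands inside the ID)
echo "length(): $by_length"   # 2,x
```

With numeric IDs and quoted second fields like "abc", as in the sample input, the two approaches should give the same result.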

Last edited by elixir_sinari; 11-26-2012 at 06:11 AM..
# 3  

awk -F "," 's != $1 || NR ==1{s=$1;if(p){print p};p=$0;next}
{sub($1,"",$0);p=p""$0;}END{print p}' file
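
For anyone reading along, here is the same one-liner spread over several lines with comments. This is only my reading of it, run against the three sample lines from post #1; the variable names input and out are just for the demonstration:

```shell
input='2,"abc","text","some text"
3,"uname","text","some text"
3,"ulang","text","some text"'

out=$(printf '%s\n' "$input" | awk -F, '
# New key (or the very first line): flush the finished line, start a new one.
s != $1 || NR == 1 {
    s = $1                  # remember the current key
    if (p) print p          # p is still empty before the first line
    p = $0                  # start the merged line with the whole record
    next                    # skip the append rule below
}
# Same key as the previous line: drop the ID, keep the comma, append.
{
    sub($1, "", $0)         # $1 is used as a regex here; fine for plain numeric IDs
    p = p $0
}
END { print p }             # flush the last merged line
')
printf '%s\n' "$out"
```

Note that sub() treats $1 as a regular expression and removes its first match, so an ID containing regex metacharacters would probably need different handling.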

# 4  

Thank you, that works!

I wish I could learn AWK, I know it's really powerful. I've coded in many different languages and can usually work out what stuff does, but AWK is so cryptic!

I'm looking at what you posted and trying to understand it. If you're up for giving me a little tutorial, I'd love to understand.

Where does p1 come from?

Is prev a built-in variable?

# While NR (number of records) is not equal to 1 and p1 is not equal to the first field (the ID)
NR!=1 && p1!=$1

# Print the line and append a null so the line will join with the next
{print prev;prev=""}

# p1 gets the value of the ID field
{p1=$1;

# This is a bit cryptic, but it looks like you're assigning/merging the contents
# of previous line with the next one and stripping the first field??
prev=(prev"")?prev FS substr($0,index($0,$2)):$0}

# Also a bit cryptic, but guessing we've reached the EOF and spitting out what's left?
END{if(prev"") print prev}'


Originally Posted by pamu

awk -F "," 's != $1 || NR ==1{s=$1;if(p){print p};p=$0;next}
{sub($1,"",$0);p=p""$0;}END{print p}' file

That works well also!


The great thing about Unix and AWK is there are so many different ways to do stuff.

The thing about AWK is it's hard to tell what special meaning certain things have and where they come from.

I wish I'd learned it a long time ago.
# 5  
Where does p1 come from?
p1 is used to store the value of the first field. Since this is done at the end of each line processing, it becomes the previous value of the first field for the current line.
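
A quick way to see this for yourself, as a sketch with made-up two-field input: print the current ID next to p1 before p1 is updated.

```shell
out=$(printf '%s\n' '2,a' '3,b' '3,c' | awk -F, '
{
    # p1 still holds the ID that was saved while processing the previous line.
    printf "line %d: current=%s previous=%s\n", NR, $1, (NR == 1 ? "(unset)" : p1)
    p1 = $1   # saved last, so the next line sees it as the previous ID
}')
printf '%s\n' "$out"
# line 1: current=2 previous=(unset)
# line 2: current=3 previous=2
# line 3: current=3 previous=3
```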

Is prev a built-in variable?
No. In awk, the built-in variables are all upper-case words. prev is used to store the line made up by appending segments of the lines which have common first fields. After printing each made-up line, we need to nullify the variable to make it ready for the next line to be made up.
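
To illustrate the difference, here is a minimal sketch (the input line is made up). NR, NF and FS are set by awk itself, while prev is just a name that springs into existence the first time it is used, holding an empty string (or 0 in arithmetic) until something is assigned to it:

```shell
out=$(printf '%s\n' 'a,b,c' | awk -F, '
{
    # Built-in variables: awk sets these for you.
    printf "NR=%d NF=%d FS=[%s]\n", NR, NF, FS
    # prev was never assigned, so it reads as empty (and as 0 in arithmetic).
    printf "prev=[%s] length=%d prev+0=%d\n", prev, length(prev), prev + 0
}')
printf '%s\n' "$out"
# NR=1 NF=3 FS=[,]
# prev=[] length=0 prev+0=0
```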

And the cryptic line does what you thought it does. If prev"" is true (that is, prev is not empty), prev already holds at least one line, so we append a , (FS, the field separator, a special/built-in awk variable that is a comma in this case) followed by the substring of the current line from the second field onwards. Otherwise prev is empty, which means this is the first line since the last prev was printed, so we just assign the current line ($0) to prev.

And I think you've understood the rest.
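
Putting the pieces together, here is a more verbose sketch of the same script, with the ternary written as if/else and the explicit test prev != "" standing in for prev"". It is meant as a reading aid rather than a replacement, and it is run against the sample lines from post #1 (input and out are just names for the demonstration):

```shell
input='2,"abc","text","some text"
3,"uname","text","some text"
3,"ulang","text","some text"'

out=$(printf '%s\n' "$input" | awk -F, '
# The ID changed (and this is not the first line): print the finished line.
NR != 1 && p1 != $1 {
    print prev
    prev = ""                # empty again, ready for the next group
}
# Runs for every line.
{
    p1 = $1                  # becomes the previous ID for the next line
    if (prev != "") {
        # Same group: add a comma plus everything from field 2 onwards.
        prev = prev FS substr($0, index($0, $2))
    } else {
        # First line of a group: keep the whole line, ID included.
        prev = $0
    }
}
# End of input: the last group has not been printed yet.
END { if (prev != "") print prev }
')
printf '%s\n' "$out"
```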
# 6  
Originally Posted by elixir_sinari
awk -F, 'NR!=1 && p1!=$1{print prev;prev=""}

prev seems to magically appear where you say print prev, but where do you assign its contents?

Is it somehow defaulting to $0 ?
# 7  
prev=(prev"")?prev FS substr($0,index($0,$2)):$0

The above line does the assignment:

Evaluate (prev"").

If true, return prev FS substr($0,index($0,$2)).

If false, return $0.

And, this returned value is stored back in prev.
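
As for why the test is written prev"" rather than just prev: appending an empty string forces awk to treat the value as a string, and a string is true whenever it is non-empty. A small sketch with a made-up input line consisting of a single 0, which looks numeric and would otherwise count as false:

```shell
out=$(printf '%s\n' '0' | awk '
{
    prev = $0                                     # the line "0" looks like a number
    print "bare=" (prev ? "true" : "false")       # numeric 0 counts as false
    print "concat=" ((prev "") ? "true" : "false")  # non-empty string counts as true
}')
printf '%s\n' "$out"
# bare=false
# concat=true
```

With the data in this thread the lines never look like a bare number, so either form would likely behave the same; the concatenation is just the safer habit.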