How to concat lines that have the same key field


 
# 1  
Old 03-29-2010

Hi, I have a file like this:

ABC 123
ABC 456
ABC 321
CDE 789
CDE 345
FGH 111
FGH 222
FGH 333
XYZ 678

I need the output like this:

ABC 123,456
CDE 789,345
FGH 111,222
XYZ 678

Meaning I want to concatenate the lines that have the same first column, but I only need the first two values. If a key field (like ABC above) has three or more lines, I only want to concatenate the first two.

Any idea how to do this?

Thanks!!
# 2  
Old 03-29-2010
Code:
awk '{if (b[$1]<2) {a[$1]=(b[$1] ? a[$1] "," : "") $2;b[$1]++}} END {for (i in a) print i,a[i]|"sort"}' urfile

# 3  
Old 03-30-2010
Hi Rdcwayx,

Thanks for the quick reply. My question is: I am dealing with a large amount of data, probably about 10 million lines. Will this have performance issues?

Thanks
# 4  
Old 03-30-2010
Are the lines sorted by the first column?
# 5  
Old 03-30-2010
You should just run it and find out whether performance and resource consumption are acceptable. What does and doesn't count as a performance issue depends on the person, the situation, the priority of the job, the hardware, etc.

The only thing that can be stated by looking at the awk/sort code is that it will require something on the order of 2 x 10 x avg_line_length megabytes of memory for a 10 million line data set, since both awk and sort will require full copies of the data in memory (worst case scenario). If the average line length is 50 characters, it could approach 1 gigabyte of RAM.
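
The estimate above can be checked against the real file before running the job. This is only a minimal sketch, assuming a POSIX awk; estimate_mb is a hypothetical helper name, and length($0) counts characters, which equals bytes only for single-byte data:

```shell
# estimate_mb FILE  (hypothetical helper, not from the thread)
# Prints the line count, the average line length, and twice the total
# size in megabytes: the worst case where awk and sort each hold a copy.
estimate_mb() {
    awk '{ chars += length($0) + 1 }   # +1 for the newline
         END {
             if (NR == 0) { print "0 lines"; exit }
             printf "%d lines, avg %.1f chars, worst case ~%.1f MB\n",
                    NR, chars / NR, 2 * chars / 1000000
         }' "$1"
}
```

For 10 million lines averaging 50 characters, the same arithmetic gives 2 x 10,000,000 x 50 = 1,000,000,000 characters, which is the roughly 1 gigabyte figure quoted above.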

Regards,
Alister
# 6  
Old 03-30-2010
If the data is ordered by the first column, this should work fine:
Code:
awk '$1==k{s=s "," $2; next}
s{print s}
{s=$0;k=$1}
END{print s}' file
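
The one-pass approach depends on all lines for a key being adjacent; strictly speaking it needs the keys grouped, not fully sorted. A possible precondition check, sketched under the assumption that the key is the first whitespace-separated field (is_grouped is a hypothetical helper name):

```shell
# is_grouped FILE  (hypothetical helper, not from the thread)
# Exits 0 if every key in column 1 forms one contiguous run, 1 otherwise.
# Keeps one array entry per distinct key, not per line.
is_grouped() {
    awk 'NR == 1 || ($1 "") != prev {       # ($1 "") forces a string compare
             if ($1 in seen) { bad = 1; exit }   # key came back after a gap
             seen[$1]; prev = $1 ""
         }
         END { exit bad }' "$1"
}
```

It could be used as `is_grouped file && echo "safe to stream"`, falling back to a sort or to the in-memory version when the check fails.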

# 7  
Old 03-30-2010
Quote:
Originally Posted by Franklin52
If the data is ordered by the first column, this should work fine:
Code:
awk '$1==k{s=s "," $2; next}
s{print s}
{s=$0;k=$1}
END{print s}' file

Hello, Franklin52:

Actually, that code isn't quite right. The original problem statement requires that at most two values be kept per key, but this solution keeps appending every value it finds. Changing the action in both print rules (s{print s} and END{print s}) from
Code:
{print s}

to
Code:
{match(s,/^[^ ]* [^,]*(,[^,]*)?/); print substr(s, RSTART, RLENGTH)}

works around that by discarding unwanted values at print time. A counter is probably a nicer fix, though:
Code:
awk '$1==k {if (++i<3) s=s "," $2; next}
s{print s}
{s=$0;k=$1;i=1}
END{print s}' file
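
If the input is not already ordered by key, one option is to sort it first and then stream it through the counter version, so awk never holds more than one key in memory. A sketch only, assuming a sort that supports the non-POSIX -s (stable) option, as GNU and BSD sort do; concat_first_two is a hypothetical helper name:

```shell
# concat_first_two: reads "key value" lines on stdin (hypothetical helper).
# sort -s keeps the input order within each key, so "the first two" still
# means the first two in the original file. -s is assumed to be available.
concat_first_two() {
    LC_ALL=C sort -s -k1,1 |
    awk '$1 == k { if (++i < 3) s = s "," $2; next }
         s != "" { print s }
         { s = $0; k = $1; i = 1 }
         END { if (s != "") print s }'
}
```

Usage would be along the lines of `concat_first_two < urfile > result`. The trade-off is the cost of a full sort, but implementations such as GNU sort can spill to temporary files instead of keeping the whole data set in memory.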

Regards,
Alister

Last edited by alister; 03-30-2010 at 02:41 PM..