Merging rows in awk

01-16-2013


Quote:

Originally Posted by Chubler_XL

Standard awk only supports 100 fields per line, so you're going to need gawk or mawk.

If performance is an issue and the lines to be merged are always adjacent, the following should use far fewer resources:

Code:

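awk '
# Print the merged row for the current key
function pl(   i) {
    printf "%s", a
    for (i = 2; i in V; i++) printf " %s", V[i]
    printf "\n"
}
# New key: print the previous group (if any), then start a new one
NR == 1 || $1 != a {
    if (NR > 1) pl()
    a = $1
    split("", V)    # empty the array portably
}
# Append each field to the merged value for its column
{
    for (i = 2; i <= NF; i++) V[i] = V[i] $i
}
END { if (NR) pl() }' infile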

Ouch! I missed the note about 400,000 columns. But I don't see anything in the POSIX standards or the Single UNIX Specification that allows implementations to limit the number of fields in a line. And, if the input files are sorted, it is grossly inefficient to read the entire input file (of at least 8,000,000,000 bytes) into memory rather than sorting the input file first and using your method. But, of course, you can't use the standard sort utility to sort a file whose lines are at least 800,000 bytes long.

The standard text-processing utilities (including awk, the editors, grep, and sort) are only defined to work on text files, and a text file's lines may be at most LINE_MAX bytes long, including the terminating newline. LINE_MAX can be as small as 2,048. I don't think I've ever used a system with LINE_MAX greater than 20,480.

The only text-processing utilities in the standards that are required to work on files that would be text files if line lengths were unlimited are cut, fold, paste, and the shell. Even for the shell, only the length of command lines is unlimited; the shell built-in utilities that read and write files, such as read and printf, are only defined to work when the input or output is a text file.
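Because cut is one of the few utilities that must cope with arbitrarily long lines, it can also be used to check whether a file is a text file in this sense. A minimal sketch, assuming a POSIX shell; the function name count_long_lines is hypothetical and not from this thread. It prints how many lines hold LINE_MAX or more bytes before the newline, and grep only ever reads lines of at most one byte:

```shell
#!/bin/sh
# Hypothetical helper: count lines too long for the file to be a POSIX text
# file, i.e. lines with LIMIT or more bytes before the newline (so LIMIT + 1
# or more bytes including the newline). LIMIT defaults to getconf LINE_MAX.
count_long_lines() {
    file=$1
    limit=${2:-$(getconf LINE_MAX)}
    # cut -b LIMIT- leaves a non-empty line only when the line has at least
    # LIMIT bytes; the second cut trims that to one byte so grep stays within
    # its own limits. The C locale makes any lone byte match ".".
    LC_ALL=C cut -b "$limit"- "$file" | LC_ALL=C cut -b 1 | LC_ALL=C grep -c .
}
```

For example, `count_long_lines infile` prints 0 if every line fits within the system's LINE_MAX; a second argument overrides the limit. A nonzero count means awk, grep, sort, and the editors are not guaranteed to handle the file.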

It would be possible to use cut to split the file into thousands (or tens of thousands, or hundreds of thousands, depending on the expected field widths after merging lines) of text files that can each be processed with awk, then use cut again to remove the first field from every one of those files except the first, and then use paste to put the results back together. But, having created a file with some lines that are at least 1.2 MB long (400,000 fields * (2 bytes per joined field + 1 byte separating fields)), there isn't much you can do with it.
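The cut/awk/paste approach described above could look roughly like this. It is a minimal sketch under assumptions that are not from the thread: fields are separated by single spaces, the caller passes the number of fields per line (key included), the rows to merge are adjacent (as in Chubler_XL's script), each slice stays within LINE_MAX before and after merging, and mktemp -d is available (common, but not required by POSIX). The function name merge_wide and its parameters are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: merge adjacent rows that share field 1 in a file whose
# lines are too long for awk, by processing vertical slices of the file.
# Assumptions: single-space separators, nfields = fields per line (key
# included), and every slice fits within LINE_MAX before and after merging.
merge_wide() {
    infile=$1 nfields=$2 chunk=${3:-1000}
    [ "$nfields" -ge 2 ] || return 1
    tmp=$(mktemp -d) || return 1    # mktemp is common but not POSIX-mandated

    start=2 n=0
    while [ "$start" -le "$nfields" ]; do
        end=$((start + chunk - 1))
        if [ "$end" -gt "$nfields" ]; then end=$nfields; fi
        n=$((n + 1))
        # cut must handle lines of any length, so it does the slicing;
        # awk only sees the key plus at most $chunk fields per line.
        cut -d ' ' -f "1,$start-$end" "$infile" | awk '
            function flush(   i) {
                printf "%s", key
                for (i = 2; i in V; i++) printf " %s", V[i]
                printf "\n"
            }
            # Compare keys as strings so "01" and "1" stay distinct.
            NR == 1 || $1 "" != key "" {
                if (NR > 1) flush()
                key = $1
                split("", V)
            }
            { for (i = 2; i <= NF; i++) V[i] = V[i] $i }
            END { if (NR) flush() }
        ' > "$tmp/part"
        # Keep the key column only from the first slice; paste the rest on.
        if [ "$n" -eq 1 ]; then
            cp "$tmp/part" "$tmp/out"
        else
            cut -d ' ' -f 2- "$tmp/part" > "$tmp/body"
            paste -d ' ' "$tmp/out" "$tmp/body" > "$tmp/next"
            mv "$tmp/next" "$tmp/out"
        fi
        start=$((end + 1))
    done

    cat "$tmp/out"
    rm -rf "$tmp"
}
```

Usage would be something like `merge_wide infile NFIELDS 1000 > merged`, where NFIELDS is the number of fields on each input line. Pasting one slice at a time keeps the slices in order and avoids passing thousands of file operands to paste, at the cost of rewriting the growing output file once per slice. As noted above, the result still has lines far longer than LINE_MAX.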

Last edited by Don Cragun; 01-16-2013 at 12:28 PM.. Reason: auto spell check fixed too much again...

Don Cragun


01-16-2013


FWIW, I ran this on various systems:

Code:

getconf LINE_MAX; { for ((i=1; i<=400000; i++)); do printf '%d ' "$i"; done; echo; } | awk '{$1=$1}1' OFS='\n' | wc -l

Code:

OSX 10.8:
2048
   400000

CentOS 6.3:
2048
   400000

AIX 7:
2048
   400000

Solaris 10:
2048
/usr/xpg4/bin/awk: line 0 (NR=1): Record too long (LIMIT: 19999 bytes)
       0

HP-UX 11i:
2048
awk: Input line 1 2 3 4 5 6 7 8 9 10 cannot be longer than 3,000 bytes.
 The source line number is 1.
0
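The same check can be wrapped in a function that runs under any POSIX shell, since for ((...)) is a bash/ksh extension. A minimal sketch; the name awk_handles_fields is hypothetical, and awk's own error message (as on Solaris and HP-UX above) is discarded so only the success or failure status is reported:

```shell
#!/bin/sh
# Hypothetical helper: succeed if this system's awk can split one line of
# N space-separated fields. Same idea as the one-liner above, but the test
# line is built with a POSIX while loop instead of bash's for ((...)).
awk_handles_fields() {
    n=$1
    count=$(
        {
            i=1
            while [ "$i" -le "$n" ]; do
                printf '%d ' "$i"
                i=$((i + 1))
            done
            echo
        } | awk -v OFS='\n' '{ $1 = $1 } 1' 2>/dev/null | wc -l
    )
    # Some wc implementations pad the count with blanks; strip them.
    count=$(printf '%s' "$count" | tr -d ' \t')
    # Rebuilding the record with OFS set to a newline gives one line per field.
    [ "$count" = "$n" ]
}
```

For example, `awk_handles_fields 400000 && echo ok` should print ok on the first three systems listed above and nothing on the Solaris and HP-UX systems.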

Scrutinizer
