Merging rows in awk


 
# 8  
Old 01-16-2013
Quote:
Originally Posted by Chubler_XL
Standard awk only supports 100 fields per line so you're going to need gawk or mawk.

If performance is an issue and lines to be merged are always adjacent, the following should use a lot less resources:

Code:
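awk '
# Print the current key followed by its merged columns.
function pl() {
    printf "%s",a;
    for(i=2;i in V;i++) printf " %s",V[i]
    printf "\n"
}
# First line, or key changed: flush the previous group and start a new one.
NR==1 || a!=$1 {
    if(NR>1)pl()
    a=$1
    split("",V)    # portable way to empty the array
}
# Append each field to the text collected so far for its column.
{
   for(i=2;i<=NF;i++) V[i]=V[i]$i
}
END {if(NR)pl()}' infile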

Ouch! I missed the 400,000 columns note. But I don't see anything in the POSIX Standards or the Single UNIX Specifications that allows implementations to limit the number of fields in a line. And, if the input files are sorted, it is grossly inefficient to try to read the entire input file (of at least 8,000,000,000 bytes) into memory rather than sorting the input file first and using your method. But, of course, you can't use the standard sort utility to sort a file that has lines that are at least 800,000 bytes long.

All of the standard utilities that work on text files (including awk, the editors, grep, and sort) are only defined to work on text files, and a text file limits each line to LINE_MAX bytes, including the terminating newline. LINE_MAX can be as small as 2,048. I don't think I've ever used a system with LINE_MAX greater than 20,480.
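A rough way to see whether a given file stays inside that limit is to compare its longest line with the value reported by getconf. This is only a sketch: the helper names longest_line_bytes and fits_line_max are made up for illustration, and because the measurement itself uses awk, an awk with a small record limit may fail here in the same way the real job would.

```shell
# Hypothetical helper: print the byte length of the longest line in a file,
# counting the trailing newline (LINE_MAX includes it).
longest_line_bytes() {
    LC_ALL=C awk '{ n = length($0) + 1; if (n > max) max = n }
                  END { print max + 0 }' "$1"
}

# Hypothetical helper: succeed if every line fits within this system's LINE_MAX.
fits_line_max() {
    limit=$(getconf LINE_MAX)
    [ "$(longest_line_bytes "$1")" -le "$limit" ]
}
```

For example, `fits_line_max infile || echo "infile has lines longer than LINE_MAX"`.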

The only text processing utilities in the standards that are required to work on files that would be text files if line lengths were unlimited are: cut, fold, paste, and the shell. And, for the shell, it is only the length of command lines that is unlimited (the shell built-in utilities that read and write files, such as read and printf, are only defined to work if the input or output is a text file).

It would be possible to use cut to create thousands (or tens of thousands or hundreds of thousands, depending on expected field widths after merging lines) of text files that can be processed with awk, then use cut again to get rid of the first field in each file except the first one, and then use paste to put the results back together. But, having created this file with some lines that are at least 1.2 MB long (400,000 fields * (2 bytes/joined field + 1 byte separating fields)), there isn't much you can do with it.
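The cut / awk / cut / paste pipeline described above might be sketched as follows. Everything here is an assumption made for illustration: the function names merge_rows and merge_wide, space-separated input with the key in field 1, rows to be merged being adjacent, the caller knowing the total field count, and mktemp being available (it is not required by POSIX).

```shell
# Hypothetical helper: merge adjacent rows that share field 1 by
# concatenating the values column by column.
merge_rows() {
    awk '
    function emit(   i) {
        printf "%s", key
        for (i = 2; i in V; i++) printf " %s", V[i]
        printf "\n"
    }
    NR > 1 && ($1 "") != (key "") { emit(); split("", V) }
    { key = $1; for (i = 2; i <= NF; i++) V[i] = V[i] $i }
    END { if (NR) emit() }' "$1"
}

# Hypothetical driver. Usage: merge_wide infile total_fields fields_per_slice
merge_wide() {
    src=$1 nf=$2 chunk=$3
    tmp=$(mktemp -d) || return 1
    start=2 n=0 parts=
    while [ "$start" -le "$nf" ]; do
        end=$((start + chunk - 1))
        if [ "$end" -gt "$nf" ]; then end=$nf; fi
        n=$((n + 1))
        # Slice out the key plus one narrow range of data columns.
        cut -d ' ' -f "1,$start-$end" "$src" > "$tmp/slice.$n"
        if [ "$n" -eq 1 ]; then
            merge_rows "$tmp/slice.$n" > "$tmp/part.$n"
        else
            # Every slice after the first drops its copy of the key.
            merge_rows "$tmp/slice.$n" | cut -d ' ' -f 2- > "$tmp/part.$n"
        fi
        parts="$parts $tmp/part.$n"
        start=$((end + 1))
    done
    # $parts is left unquoted on purpose so it splits into file names.
    paste -d ' ' $parts
    rm -rf "$tmp"
}
```

Called as `merge_wide infile 400001 500`, this would create about 800 slices. The slice width has to be small enough that each merged slice still fits the local awk's line limit, and paste may run into the per-process open file limit when given a very large number of files.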

Last edited by Don Cragun; 01-16-2013 at 12:28 PM.. Reason: auto spell check fixed too much again...
# 9  
Old 01-16-2013
FWIW, I ran this on various systems:
Code:
getconf LINE_MAX; { for ((i=1; i<=400000; i++)); do printf "$i "; done ; echo ;} | awk '{$1=$1x}1' OFS='\n' | wc -l

Code:
OSX 10.8:
2048
   400000

CentOS 6.3
2048
   400000

AIX7: 
2048
   400000

Solaris 10
2048
/usr/xpg4/bin/awk: line 0 (NR=1): Record too long (LIMIT: 19999 bytes)
       0

HPUX 11i:
2048
awk: Input line 1 2 3 4 5 6 7 8 9 10 cannot be longer than 3,000 bytes.
 The source line number is 1.
0
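The probe above relies on the bash-style for ((...)) loop. A variant that should also run under a plain POSIX sh could look like the sketch below; the function name probe_awk_fields and its optional second argument are made up for illustration, and it has not been run on the systems listed above.

```shell
# Hypothetical helper. Usage: probe_awk_fields N [awk-binary]
# Builds one line of N space-separated numbers, asks awk to rebuild the
# record with a newline as OFS, and counts the resulting lines.
probe_awk_fields() {
    n=$1 awk_bin=${2:-awk}
    "$awk_bin" -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) printf "%d ", i; print "" }' |
        "$awk_bin" '{ $1 = $1 } 1' OFS='\n' | wc -l
}
```

For example, `getconf LINE_MAX; probe_awk_fields 400000 /usr/xpg4/bin/awk` would repeat the Solaris test against a specific awk binary.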
