Standard awk only supports 100 fields per line, so you're going to need gawk or mawk.
If performance is an issue and the lines to be merged are always adjacent, the following should use far fewer resources:
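The script itself isn't quoted above, so here is a hedged sketch of what the "adjacent lines" approach usually looks like: stream the file once, appending to a held record while the first field matches the previous line's key, and flushing when the key changes. The sample data and field layout are invented for illustration.

```shell
# Merge adjacent lines that share the same first field (invented sample data).
out=$(printf 'a 1\na 2\nb 3\n' |
awk '
$1 == prev { line = line " " $2; next }  # same key: append the data field
NR > 1     { print line }                # key changed: flush the held record
           { prev = $1; line = $0 }      # start a new record
END        { if (NR) print line }        # flush the final record
')
printf '%s\n' "$out"
```

Because only one record is held at a time, memory use stays constant no matter how many lines the input has.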
Ouch! I missed the 400,000-columns note. But I don't see anything in the POSIX standards or the Single UNIX Specification that allows implementations to limit the number of fields in a line. And if the input files are sorted, it is grossly inefficient to read the entire input file (of at least 8,000,000,000 bytes) into memory rather than sorting the input file first and using your method. But, of course, you can't use the standard sort utility on a file whose lines are at least 800,000 bytes long.
All of the standard utilities that process text (including awk, the editors, grep, and sort) are only defined to work on text files, which limits them to LINE_MAX bytes per line. LINE_MAX can be as small as 2,048, and I don't think I've ever used a system with LINE_MAX greater than 20,480.
The only text-processing utilities in the standards that are required to work on files that would be text files if line lengths were unlimited are cut, fold, paste, and the shell. And for the shell, it is only the length of command lines that is unlimited (the shell built-in utilities that read and write files, such as read and printf, are only defined to work if the input or output is a text file).
It would be possible to use cut to create thousands (or tens or hundreds of thousands, depending on expected field widths after merging lines) of text files that can be processed with awk, then use cut again to remove the first field from every file except the first one, and finally use paste to put the results back together. But having created this file, with some lines at least 1.2 MB long (400,000 fields * (2 bytes per joined field + 1 byte per field separator)), there isn't much you can do with it.
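A tiny demo of that split-with-cut / rejoin-with-paste idea, with 6 columns standing in for 400,000 (the file name and field ranges are invented): each slice repeats the key field so it stays self-describing, the repeated key is trimmed off before reassembly, and paste stitches the rows back together.

```shell
# Split a wide file into narrow slices, then reassemble it (invented demo data).
printf 'k1 a b c d e\nk2 f g h i j\n' > wide.txt
cut -d ' ' -f 1-3   wide.txt > slice1          # key + first block of fields
cut -d ' ' -f 1,4-6 wide.txt > slice2          # key repeated + second block
# ...each slice is now narrow enough for awk/grep/sort to handle...
cut -d ' ' -f 2-    slice2   > slice2.trim     # drop the repeated key field
rejoined=$(paste -d ' ' slice1 slice2.trim)    # stitch the slices back together
printf '%s\n' "$rejoined"
rm -f wide.txt slice1 slice2 slice2.trim
```

For a real 400,000-column file the cut invocations would be generated in a loop, one per block of columns.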
Last edited by Don Cragun; 01-16-2013 at 12:28 PM..
Reason: auto spell check fixed too much again...
Hi,
I am attempting to merge the following log lines, which wrap onto two lines, using awk.
INITIAL OUTPUT
2019 Sep 28 10:47:24.695 hkaet9612 last message repeated 1 time
2019 Sep 28 10:47:24.695 hkaet9612 %ETHPORT-5-IF_DOWN_INTERFACE_REMOVED: Interfa
ce Ethernet1/45 is down (Interface removed)... (10 Replies)
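One hedged sketch for this kind of wrapped syslog output, assuming the rule that a real record always begins with a 4-digit year and anything else is a continuation to be glued back on (the test line is the one from the post):

```shell
# Rejoin syslog records that wrapped onto a second line (assumed year-prefix rule).
out=$(printf '%s\n' \
  '2019 Sep 28 10:47:24.695 hkaet9612 %ETHPORT-5-IF_DOWN_INTERFACE_REMOVED: Interfa' \
  'ce Ethernet1/45 is down (Interface removed)' |
awk '
/^[0-9][0-9][0-9][0-9] / { if (buf != "") print buf; buf = $0; next }  # new record
                         { buf = buf $0 }                              # wrapped text
END                      { if (buf != "") print buf }                  # flush last one
')
printf '%s\n' "$out"
```

Note that the continuation is appended with no separator, so "Interfa" + "ce" comes back together as "Interface".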
Hi All,
I have the below file, where I want the lines to be merged based on a pattern.
AFTER
CMMILAOJ
CMMILAAJ
AFTER
CMDROPEJ
CMMIMVIJ
CMMIRNTJ
CMMIRNRJ
CMMIRNWJ
CMMIRNAJ
CMMIRNDJ
AFTER
CMMIRNTJ
CMMIRNRJ
CMMIRNWJ (4 Replies)
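The desired output is not shown in the post, so this is only a guess at the intent: join each group of lines introduced by an AFTER marker onto a single line.

```shell
# Join each AFTER-delimited group onto one line (assumed interpretation).
out=$(printf '%s\n' AFTER CMMILAOJ CMMILAAJ AFTER CMDROPEJ CMMIMVIJ |
awk '
/^AFTER$/ { if (rec != "") print rec; rec = $0; next }  # AFTER starts a group
          { rec = rec " " $0 }                          # append a group member
END       { if (rec != "") print rec }                  # flush the last group
')
printf '%s\n' "$out"
```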
Hi..
My requirement is simple, but I am unable to get it working.
File 1 :
3 415 A G
4 421 G .
39 421 G A
2 421 G A,C
41 427 A .
4 427 A C
42 436 G .
3 436 G C
43 445 C .
2 445 C T
41 447 A .
Output (4 Replies)
hello all,
I have files that have a specific way of naming the first column.
They are made of five names in a pattern of 3.
Y = (not case sensitive)
so the files are named $Y-$Y-$Y or $X-$Y-$Z depending on how we look.
They only exist if the pattern exists.
Now I want to create a file from them that... (9 Replies)
I have 2 files,
file01 = 7 columns, rows unknown (but few)
file02 = 7 columns, rows unknown (but many)
Now I want to create an output keyed on the first field, which is shared between both files, and then subtract the results from the rest of the fields and print them there.
e.g.
file 01
James|0|50|25|10|50|30... (1 Reply)
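A hedged reading of this request, with invented sample rows: load the small file01 into an array keyed on the shared first field, then for each file02 row whose key matches, subtract file02's numeric fields from file01's, field by field.

```shell
# Subtract file02's fields from file01's for names present in both (invented data).
printf 'James|10|50|25\n'          > file01
printf 'James|1|5|5\nMark|2|2|2\n' > file02
out=$(awk -F '|' -v OFS='|' '
NR == FNR  { for (i = 2; i <= NF; i++) a[$1, i] = $i; seen[$1] = 1; next }
$1 in seen { for (i = 2; i <= NF; i++) $i = a[$1, i] - $i; print }
' file01 file02)
printf '%s\n' "$out"
rm -f file01 file02
```

The `NR == FNR` test is the usual awk idiom for "still reading the first file"; keys that appear only in file02 (Mark here) are silently skipped.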
I have the following space-delimited input:
1 11.785710 117.857100
1 15 150
1 20 200
1 25 250
3 2.142855 21.428550
3 25 250
22 1.071435 10.714350
The first field is the ID number, the second field is the percentage of the total points that the person has, and the third field is the number... (3 Replies)
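The request is cut off above, so this is only a guess at the shape of the answer: a per-ID roll-up that totals the third column for each ID in the first column, preserving first-seen order (integer sample rows taken from the post).

```shell
# Sum the third column per ID, in first-seen order (assumed requirement).
out=$(printf '1 15 150\n1 20 200\n3 25 250\n' |
awk '
{ if (!($1 in sum)) ids[++n] = $1; sum[$1] += $3 }       # note new IDs, accumulate
END { for (i = 1; i <= n; i++) print ids[i], sum[ids[i]] }
')
printf '%s\n' "$out"
```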
I have two awk scripts, shown below. checkTrvt.awk works on the .xt file format, whereas checkData.awk works on the .dat file format.
I want to merge the two scripts together: if the user passes an .xt file I run the code for the .xt file, whereas if the user passes a .dat file, I go through the code for... (9 Replies)
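One common way to merge such scripts is to branch on the extension of the file awk is currently reading, via the built-in FILENAME variable. The per-format actions below are placeholders standing in for the two scripts' real logic.

```shell
# Dispatch on the current input file's extension inside one awk script.
printf 'alpha\n' > demo.xt
printf 'beta\n'  > demo.dat
out=$(awk '
FILENAME ~ /\.xt$/  { print "xt rule: "  $0; next }   # checkTrvt.awk logic here
FILENAME ~ /\.dat$/ { print "dat rule: " $0 }         # checkData.awk logic here
' demo.xt demo.dat)
printf '%s\n' "$out"
rm -f demo.xt demo.dat
```

The same test also works per-record, so mixed file lists can be passed in a single invocation.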
Hi guys,
Wish you all a very Happy New Year!!!
Thanks in advance.
I want to read a file and merge the rows which have '\n' in them.
The rows could be > 50,000 bytes. The script should merge all the rows until the next row starts with the word 'Type|'.
ex.... (24 Replies)
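The stated rule sketches out naturally in awk: keep gluing rows onto the current record, and start a new record whenever a row begins with "Type|". The sample rows are invented; very long rows need gawk or mawk rather than a line-length-limited awk.

```shell
# Merge continuation rows until the next row starting with "Type|" (invented data).
out=$(printf 'Type|a|b\npart1\npart2\nType|c|d\npart3\n' |
awk '
/^Type\|/ { if (rec != "") print rec; rec = $0; next }  # "Type|" starts a record
          { rec = rec $0 }                              # glue a wrapped row on
END       { if (rec != "") print rec }                  # flush the last record
')
printf '%s\n' "$out"
```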