If you're careful about speed, you'll naturally avoid that method even though it appears simpler. One could argue that it might be better, but one wouldn't.
Given the dearth of detail, any optimization efforts would be aimless.
For an average implementation, on average hardware, processing an average text file, under average user expectations, the performance discrepancy between the AWK scripts will be insignificant, and there has been no indication by the OP that this situation is anything but average.
For an extraordinary situation, the details which we do not have (awk implementation? data set characteristics?) are crucial.
Testing with gawk, mawk, and BusyBox awk on two data sets, one with modest lines (100 columns, 292 bytes each) and one with much wider lines (32,765 columns, 185,484 bytes each), yielded highly inconsistent results.
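The byte counts above are consistent with lines made of sequential integers joined by '|' (100 columns give 291 characters plus a newline). The generator below is a sketch under that assumption; make_line is a hypothetical helper name, and the data actually used may have been built differently.

```sh
#!/bin/sh
# Hypothetical generator (assumption: test lines were sequential integers
# joined by '|'). Prints one newline-terminated line "1|2|...|N".
make_line() {
    awk -v n="$1" 'BEGIN {
        for (i = 1; i <= n; i++)
            printf "%d%s", i, (i < n ? "|" : "\n")
    }'
}

# Byte counts for the two line shapes (wc -c may pad with spaces on some systems).
make_line 100   | wc -c    # 292
make_line 32765 | wc -c    # 185484
```

To build a whole data file, the same line can simply be repeated as many times as needed.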
My original suggestion was sometimes the fastest, but only when lines were modestly sized. As you correctly pointed out, my code does not scale; performance degrades drastically with increasing line length.
Casual testing suggests that you're using gawk; under the other implementations I tested, the performance of your more recent suggestions regresses greatly compared to your original contribution.
Gawk running the following script was the fastest of all the implementation/script combinations I tested:
Originally posted by konsolebox:
However, that very same script under BusyBox was the slowest of all interpreter/script combinations (slower even than any run of my original sloth). Under mawk, too, it was the slowest of all the scripts.
The highlighted statements trigger recomputation of $0 in all three implementations, but only gawk implements an optimization that lazily defers that overhead until $0 itself (not its fields) is referenced. For the details, follow field0_valid in gawk's field.c.
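As an illustration of the kind of statements at issue (a hypothetical example, not the quoted script), the function below assigns to fields and to NF, both of which mark $0 for rebuilding. shift_fields is a hypothetical name, and the lazy-versus-eager behaviour described in the comments follows the analysis above rather than any independent measurement.

```sh
#!/bin/sh
# Hypothetical illustration (not the quoted script): move fields FROM..TO
# down to positions 1..(TO-FROM+1), truncate NF, then print the record.
# Each field assignment and the NF assignment invalidate $0. Per the
# analysis above, gawk defers rebuilding $0 until it is read (here, by the
# bare print), while mawk and BusyBox awk pay for the rebuild each time.
shift_fields() {
    awk -v from="$1" -v to="$2" 'BEGIN { FS = OFS = "|" }
    {
        # The target index never exceeds i, so no unread field is overwritten.
        for (i = from; i <= to; i++)
            $(i - from + 1) = $i
        # Decreasing NF drops the trailing fields.
        NF = to - from + 1
        print
    }'
}

printf '1|2|acv|vbc|100|342\n' | shift_fields 3 5    # acv|vbc|100
```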
There are a lot of systems out there that do not use gawk by default. Even among Linux installations, most embedded systems and most Debian derivatives (including most Ubuntu and Ubuntu-derivative versions) do not use it. For all of them, this revision is a setback.
In the absence of any specifics, in my judgement, your original solution exhibits the best balance of scalability and predictable performance across implementations. Minus the redundant split, the off-by-one in the loop condition, and the printf format string bugs:
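A loop-based approach along those lines might look like the following sketch. It is a hypothetical illustration, not the corrected script referred to above: fields are read as already split by FS (no split() call), the loop bound is inclusive, and each field is passed to printf as an argument rather than as the format string. print_fields is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical sketch: print fields FROM..TO of each '|'-delimited line.
# $0 is never modified, so no implementation has to rebuild the record.
print_fields() {
    awk -v from="$1" -v to="$2" 'BEGIN { FS = OFS = "|" }
    {
        # Fields are already split on FS; no split() call is needed.
        # "<=" makes the range inclusive of field TO.
        # Data goes in as a printf argument, never as the format string,
        # so a "%" inside a field cannot corrupt the output.
        for (i = from; i <= to; i++)
            printf "%s%s", $i, (i < to ? OFS : ORS)
    }'
}

# prints acv|vbc|100 and afg|nhj|100 on separate lines
printf '1|2|acv|vbc|100|342\n2|3|afg|nhj|100|346\n' | print_fields 3 5
```

For the layout in the question (from field 3 up to, but not including, the last field), NF - 1 could be used as the upper bound inside the awk program instead of a fixed number.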
In this specific case, though, since there is no constraint requiring AWK and since any cut implementation would outperform any AWK implementation running any of these scripts ... by a significant margin, the performance debate is academic.
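For comparison, the cut equivalent is a one-liner. The sample line below is a shortened, hypothetical version of the one in the question, and field_range_cut is a hypothetical wrapper name.

```sh
#!/bin/sh
# Print fields FIRST through LAST of '|'-delimited lines read from stdin.
field_range_cut() {
    cut -d '|' -f "$1-$2"
}

printf '1|2|acv|vbc|100|342\n' | field_range_cut 3 5    # acv|vbc|100
```

cut cannot count fields from the end of a line, so this only fits when the column numbers are known in advance.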
Hi,
Is there a short way to print from one particular field through another using awk?
Example File:
File1
====
1|2|acv|vbc|......|100|342
2|3|afg|nhj|.......|100|346
Expected output:
File2
====
acv|vbc|.....|100
afg|nhj|.....|100