How to remove mth and nth column from a file? Post: 302841691

Sponsored Content

Top Forums Shell Programming and Scripting How to remove mth and nth column from a file? Post 302841691 by alister on Wednesday 7th of August 2013 10:22:36 PM

08-07-2013

Registered User

Quote:

Originally Posted by konsolebox

If you're careful about speed you'll naturally not use that method despite appearing to be simpler. One could argue that that could be better but one would not.

Given the dearth of detail, any optimization efforts would be aimless.

For an average implementation, on average hardware, processing an average text file, under average user expectations, the performance discrepancy between the AWK scripts will be insignificant, and there has been no indication by the OP that this situation is anything but average.

For an extraordinary situation, the details which we do not have (awk implementation? data set characteristics?) are crucial.

Testing with gawk, mawk, and busybox and two types of data, one with modest lines (100 columns, 292 bytes each) and the other with much wider lines (32,765 columns, 185,484 bytes each), yielded highly inconsistent results.

My original suggestion was sometimes the fastest, but only when lines were modestly-sized. As you correctly pointed out, my code does not scale; performance degrades drastically with increasing line length.

Casual testing suggests that you're using gawk, because otherwise the performance of your more recent suggestions regresses greatly compared to your original contribution.

Gawk running the following script was the fastest of all possible implementation/script combinations (that I tested):

Quote:

Originally Posted by konsolebox

Code:

awk -v m=3 -v n=9 -v FS=, -v OFS=, -- '{
    j = 0
    for (i = 1; i <= NF; ++i) {
        if (i == m || i == n) {
            ++j
            continue
        }
        $(i - j) = $i
    }
    NF -= j
    print
}'

However, that very same script under Busybox was also the slowest of all interpreter/script combinations (slower even than any run of my original sloth). This script was also the slowest of all under mawk.

The highlighted statements trigger recomputation of $0 in all three implementations, but only gawk implements an optimization to lazily avoid that overhead until $0 itself (not its fields) is referenced. For the details, follow field0_valid in gawk - field.c

There are a lot of systems out there that do not use gawk by default. Even among Linux installations, most embedded systems and most Debian derivatives (including most Ubuntu and Ubuntu-derivative versions) do not use it. For all of them, this revision is a setback.

In the absence of any specifics, in my judgement, your original solution exhibits the best balance of scalability and predictable performance across implementations. Minus the redundant split, the off-by-one in the loop condition, and the printf format string bugs:

Code:

{
    append = 0
    for (i = 1; i <= NF; ++i) {
        if (i != m && i != n) {
            if (append) {
                printf "%s%s", OFS, $i
            } else {
                printf "%s", $i
                append = 1
            }
        }
    }
    print ""
}

In this specific case, though, since there is no constraint requiring AWK and since any cut implementation would outperform any AWK implementation running any of these scripts ... by a significant margin, the performance debate is academic.

Regards,
Alister

Last edited by alister; 08-07-2013 at 11:38 PM..

alister

View Public Profile for alister

Find all posts by alister

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to Print from nth field to mth fields using awk

Hi, Is there any short method to print from a particular field till another filed using awk? Example File: File1 ==== 1|2|acv|vbc|......|100|342 2|3|afg|nhj|.......|100|346 Expected output: File2 ==== acv|vbc|.....|100 afg|nhj|.....|100

2. Shell Programming and Scripting

Using AWK to find top Nth values in Nth column

I have an awk script to find the maximum value of the 2nd column of a 2 column datafile, but I need to find the top 5 maximum values of the 2nd column. Here is the script that works for the maximum value. awk 'BEGIN { subjectmax=$1 ; max=0} $2 >= max {subjectmax=$1 ; max=$2} END {print...

3. Shell Programming and Scripting

Calculating average for every Nth line in the Nth column

Is there an awk script that can easily perform the following operation? I have a data file that is in the format of 1944-12,5.6 1945-01,9.8 1945-02,6.7 1945-03,9.3 1945-04,5.9 1945-05,0.7 1945-06,0.0 1945-07,0.0 1945-08,0.0 1945-09,0.0 1945-10,0.2 1945-11,10.5 1945-12,22.3...

4. Shell Programming and Scripting

Get the nth word of mth line in a file

Hi.. May be a simple question but I just began to write unix scripts a week ago, for sorting some huge amount of experiment data, so I got no common sense about unix scripting and really need your helps... The situation is, I want to read the nth word of mth line in a file, and then store it...

5. Shell Programming and Scripting

Need help with awk statement to break nth column in csv file into 3 separate columns

Hello Members, I have a csv file in the format below. Need help with awk statement to break nth column into 3 separate columns and export the changes to new file. input file --> file.csv cat file.csv|less "product/fruit/mango","location/asia/india","type/alphonso" need output in...

6. Shell Programming and Scripting

Break Column nth in a CSV file into two

Hi Guys, Need help with logic to break Column nth in a CSV file into two for e.g Refer below the second column as the nth column "abcd","","type/beta-version" need output in a following format "abcd","/place/asia/india/mumbai","/product/sw/tomcat","type/beta-version" ...

7. Shell Programming and Scripting

Remove the values from a certain column without deleting the Column name in a .CSV file

8. Shell Programming and Scripting

How to search and replace string from nth column from a file?

I wanted to search for a string and replace it with other string from nth column of a file which is comma seperated which I am able to do with below # For Comma seperated file without quotes awk 'BEGIN{OFS=FS=","}$"'"$ColumnNo"'"=="'"$PPK"'"{$"'"$ColumnNo"'"="'"$NPK"'"}{print}' ${FileName} ...

9. Shell Programming and Scripting

Taking nth column and putting its value in n+1 column using awk