Awk: Iterate over all records, stop when value < threshold

03-27-2014

Try - as a starting point - this:

Code:

awk     '                               {arr[NR]=$1; arr2[NR]=$2}
         ($2-arr2[NR-1])<threshold      {n++} 
         $2<cutoff && n>10              {exit}
         END                            {print NR, arr2[NR]-arr2[NR-1], arr2[NR], arr2[NR-1], $2}
        ' cutoff="4.5" threshold="1" file

Be warned: this counts every difference less than threshold, not just the recent ones. But it should give you something to work from and optimize.
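If only the recent differences should count, one option is to reset the counter whenever a step is too large. The following is a minimal sketch of that idea, not the code above: the sample data, the `minrun` variable, and the use of the absolute difference are all assumptions made for illustration.

```shell
# Sample data (made up): column 1 = time, column 2 = value
data=$(mktemp)
printf '%s\n' '1 9.0' '2 7.0' '3 6.8' '4 4.4' '5 4.3' '6 4.2' '7 4.1' '8 4.0' > "$data"

result=$(awk '
    NR > 1 {
        d = $2 - prev
        if (d < 0) d = -d                     # assumption: compare the absolute step size
        if (d < threshold) n++; else n = 0    # a large step resets the run of small steps
    }
    { prev = $2 }                             # remember this value for the next record
    $2 < cutoff && n >= minrun { exit }       # exit still runs the END rule
    END { print NR, prev }                    # record number and value where reading stopped
' cutoff=4.5 threshold=1 minrun=3 "$data")
echo "$result"
rm -f "$data"
```

With this sample data the run of small steps is broken twice (between 9.0 and 7.0, and between 6.8 and 4.4), so reading stops only after three consecutive small steps below the cutoff.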

RudiC

03-28-2014

Code:

awk     '
        BEGIN {arr[NR]=$1; arr2[NR]=$2; arr3[NR]+=$2}
        (arr2[NR-1]-$2)<threshold       {n++} 
        {if(NR<299 && $2<cutoff)        {complete="YES"} else {complete="NO"}} 
        {if($2<cutoff && n>5)           {exit}}
        END       {printf "%s    %s%6d  %s   %s   %4.3f  %s  %4.3f  %s  %s", file, "time", NR, "ps",  "iteration ended at COM distance", $2, "average", arr3[NR]/NR, "finished pulling?", complete}
        '

Now I would like to start a second BEGIN/END cycle.

I want to parse the remaining lines only after $2<cutoff, using a command like this:

Code:

BEGIN
{if($2<cutoff) { arr4[NR]+=$2}} 
END {print arr4[NR]/NR}

However, my problem is that I cannot seem to 'reset' NR or parse the file twice.

chrisjorg

03-28-2014

BEGIN happens when awk starts, before it reads any files. END happens when awk finishes reading all the files it was told to open. If you want anything to happen in between, you'll need to check for the right conditions.
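A small sketch of that timing, using two made-up input lines: inside BEGIN no record has been read, so NR is 0 and $2 is empty, which is why per-line work does not belong there.

```shell
out=$(printf '%s\n' '1 9.0' '2 4.0' | awk '
    BEGIN { printf "BEGIN: NR=%d, $2=[%s]\n", NR, $2 }   # no record has been read yet
          { sum += $2 }                                   # runs once per input line
    END   { printf "END: NR=%d, sum=%.1f\n", NR, sum }    # all input has been read
')
echo "$out"
```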

FNR is the "resetting" equivalent of NR -- it goes back to 1 every time a new file is read. An old trick to set a special case for the first and only first file is (NR==FNR) { ... }
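As a rough illustration of the NR==FNR trick, the sketch below reads the same file twice: the first pass notes where $2 first drops below the cutoff, and the second pass averages the values after that point. The sample data and the names `first`, `sum` and `n` are made up, and treating "after" as "strictly after the first crossing" is an assumption.

```shell
data=$(mktemp)
printf '%s\n' '1 9.0' '2 6.0' '3 4.0' '4 5.0' '5 3.0' > "$data"

result=$(awk -v cutoff=4.5 '
    # Pass 1: NR==FNR is true only while the first file operand is being read
    NR == FNR {
        if (!first && $2 < cutoff) first = FNR   # line where $2 first drops below cutoff
        next
    }
    # Pass 2: same file again; FNR restarted at 1 while NR kept counting
    first && FNR > first { sum += $2; n++ }
    END { if (n) printf "%d %.2f\n", n, sum / n }
' "$data" "$data")
echo "$result"
rm -f "$data"
```

Note that the NR==FNR test assumes the first file is not empty; with an empty first file the condition would also be true for the second one.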

You can't tell awk to repeat a file, but if you give it the same file twice, it will read it twice. ARGIND (a GNU awk extension) will be different for each file, letting you detect when it repeats.

Code:

awk '(FILENUM != ARGIND) {
        if(FILENUM) {
                # "end" section of the previous file, if any
                # Process and print data
        }
        FILENUM=ARGIND;
        {
                # "begin" section for each file
                # Probably a good idea to delete any previous data here.
                for(X in A) delete A[X]
        }
}

(ARGIND == 1) && ($2<cutoff) { A[$1]=something }
(ARGIND == 2) && (someothercondition) { A[$1]=somethingelse }' filename filename
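Since ARGIND is only available in GNU awk, a portable variant of the same per-file pattern can count files with FNR==1 instead. This is a sketch under assumptions: `filenum`, `lines`, `sum` and `report` are hypothetical names, the two input files are made up, and the counter only matches ARGIND when every operand is a non-empty file. The END rule is there because the last file is never followed by another first line.

```shell
a=$(mktemp); b=$(mktemp)
printf '%s\n' '1 2.0' '2 4.0' > "$a"
printf '%s\n' '1 10.0' > "$b"

result=$(awk '
    function report() { printf "file %d: %d lines, sum %.1f\n", filenum, lines, sum }
    FNR == 1 {
        if (filenum) report()      # "end" section of the previous file, if any
        filenum++                  # portable stand-in for ARGIND
        lines = 0; sum = 0         # "begin" section: reset per-file state
    }
    { lines++; sum += $2 }         # runs for every line of every file
    END { if (filenum) report() }  # the last file still needs its "end" section
' "$a" "$b")
echo "$result"
rm -f "$a" "$b"
```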

Last edited by Corona688; 03-28-2014 at 03:23 PM..

Corona688

03-28-2014

Thanks, but I still have some questions.

Code:

awk '(FILENUM != ARGIND) {
        if(FILENUM) {
                # "end" section of the previous file, if any
                # Process and print data
        END {printf "%s    %s%6d  %s   %s   %4.3f  %s  %4.3f  %s  %s", file, "time", NR, "ps",  "iteration ends at COM dist", $2, "average", A3/NR, A4/NR, "finished pulling?", complete}
        }

        FILENUM=ARGIND;
        {(ARGIND == 1) && ($2<cutoff) { A[NR]=$1; A2[NR]=$2; sum+=$2 } 
                # "begin" section for each file
                # Probably a good idea to delete any previous data here.
        (A2[NR-1]-$2)<threshold       {n++} 
        {if(NR<299 && $2<cutoff)        {complete="YES"} else {complete="NO"}} 
        {if($2<cutoff && n>5)           {exit}}
        }

        {
        (ARGIND == 2) && ($2<cutoff) {A4[NR]=$1; A5[NR]=$2; sum2=+$2}
        }

}' cutoff="4" threshold="1" file=$old $old  $old > $old.result

What do you mean by

Quote:

# "end" section of the previous file, if any
# Process and print data

Is this the END section?

Code:

{(ARGIND == 1) && ($2<cutoff) { A[NR]=$1; A2[NR]=$2; sum+=$2 }

Basically, I want to re-parse the file and add up all values in $2 AFTER the first time $2<cutoff

(this is not what I'm doing above as I still need to figure out how to do this command)

chrisjorg

03-28-2014

You've put everything which was outside {} brackets in extra {} brackets, totally changing their meaning. I can't even tell what that code would actually do now.

By the 'begin' section for each file, I mean something like BEGIN that runs each and every time awk begins reading a file.

Consider it like this:

Code:

awk '(FILENUM != ARGIND) {
        FILENUM=ARGIND;
        ##########################################
        # Put code here that you want to run on the first line of a file
        ##########################################
}

(ARGIND == 1) && (somecondition) {
        ##########################################
        # Put code here that you want to run for every line of file 1
        ##########################################
}

(ARGIND == 2) && (someothercondition) {
        ##########################################
        # Put code here to run for every line of file 2
        ##########################################
}' filename filename # Note how awk is given the same filename twice, to read it twice

This is not pseudocode. It will actually run like this. It won't do much without the stuff you need to add, and 'somecondition' and 'someothercondition' need to be replaced with logical expressions of your choice, but this is correct syntax. You do not need to defensively surround it with more {} brackets.

Corona688

03-28-2014

OK thanks will try,

I thought maybe the easiest solution would be to split up the file into 2 separate files at

{$2=cutoff}

and then processing the second file would be simple?

chrisjorg

03-28-2014

Quote:

Originally Posted by chrisjorg

OK thanks will try,

I thought maybe the easiest solution would be to split up the file into 2 separate files at

{$2=cutoff}

and then processing the second file would be simple?

Maybe I don't understand your question. It's the same data, isn't it?
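Because it is the same data, a single pass with a flag may be enough, with no need to split the file. The sketch below is one possible reading of the goal: it averages $2 over the records that come strictly after the first record with $2<cutoff. The sample data and the names `seen`, `sum` and `n` are made up for illustration.

```shell
result=$(printf '%s\n' '1 9.0' '2 6.0' '3 4.0' '4 5.0' '5 3.0' | awk -v cutoff=4.5 '
    seen { sum += $2; n++ }             # accumulate only once the flag is set
    !seen && $2 < cutoff { seen = 1 }   # set the flag at the first crossing
    END { if (n) printf "%d %.2f\n", n, sum / n }
')
echo "$result"
```

The order of the two rules matters: the crossing record itself sets the flag after the accumulation rule has already been skipped, so it is not included in the sum.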

Corona688
