Awk: Iterate over all records, stop when value < threshold


 
# 8  
Old 03-27-2014
Try - as a starting point - this:
Code:
awk     '                               {arr[NR]=$1; arr2[NR]=$2}
         ($2-arr2[NR-1])<threshold      {n++} 
         $2<cutoff && n>10              {exit}
         END                            {print NR, arr2[NR]-arr2[NR-1], arr2[NR], arr2[NR-1], $2}
        ' cutoff="4.5" threshold="1" file

Be warned: this counts every difference smaller than threshold over the whole file, not just the most recent ones. But it should give you something to work from and to optimize.
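One possible way to count only the recent differences is to reset the counter whenever a difference is not below threshold, so that n holds the length of the current run. This is only a sketch under assumptions: the sample data, the variable name `minrun`, and the run length of 3 (instead of 10) are made up for illustration, and the difference is computed with the same signed expression as above.

```shell
# Made-up sample data: column 1 = time, column 2 = distance
data=$(mktemp)
printf '%s\n' '1 10.0' '2 7.0' '3 9.0' '4 4.4' '5 4.3' '6 4.2' '7 4.1' > "$data"

result=$(awk '
    NR > 1 {
        # signed difference to the previous record, as in the post above
        if (($2 - prev) < threshold) n++    # extend the current run
        else n = 0                           # run broken: start counting again
    }
    { prev = $2 }                            # remember this value for the next record
    $2 < cutoff && n >= minrun { exit }      # END still runs after exit
    END { print NR, prev, n }                # record number, last value read, run length
' cutoff=4.5 threshold=1 minrun=3 "$data")

echo "$result"
rm -f "$data"
```

With this data the jump from 7.0 up to 9.0 resets the counter, and the script stops at record 6, printing `6 4.2 3`.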
# 9  
Old 03-28-2014
Code:
awk     '
        BEGIN {arr[NR]=$1; arr2[NR]=$2; arr3[NR]+=$2}
        (arr2[NR-1]-$2)<threshold       {n++} 
        {if(NR<299 && $2<cutoff)        {complete="YES"} else {complete="NO"}} 
        {if($2<cutoff && n>5)           {exit}}
        END       {printf "%s    %s%6d  %s   %s   %4.3f  %s  %4.3f  %s  %s", file, "time", NR, "ps",  "iteration ended at COM distance", $2, "average", arr3[NR]/NR, "finished pulling?", complete}
        '


Now I would like to start a second BEGIN/END cycle.

Only after $2<cutoff do I want to parse the remaining lines, and define the command
Code:
BEGIN
{if($2<cutoff) { arr4[NR]+=$2}} 
END {print arr4[NR]/NR}

However, my problem is:
  • I cannot seem to 'reset' NR or parse the file twice
# 10  
Old 03-28-2014
BEGIN happens when awk starts, before it loads any files. END happens when awk finishes reading all the files it was told to open. If you want anything to happen in between, you'll need to check for the right conditions.

FNR is the "resetting" equivalent of NR: it goes back to 1 every time a new file is read. An old trick for special-casing the first file, and only the first file, is (NR==FNR) { ... }
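To see the difference between the two counters, here is a small sketch that prints NR and FNR for two throwaway files. The file contents and the labels "first:" and "later:" are made up for illustration.

```shell
# Two made-up input files
a=$(mktemp); b=$(mktemp)
printf '%s\n' 'a1' 'a2' > "$a"
printf '%s\n' 'b1' 'b2' 'b3' > "$b"

nr_fnr=$(awk '
    NR == FNR { print "first:", NR, FNR, $0; next }   # only true while reading the first file
              { print "later:", NR, FNR, $0 }         # every file after the first
' "$a" "$b")

echo "$nr_fnr"
rm -f "$a" "$b"
```

NR keeps counting (1 to 5) while FNR restarts at 1 for the second file. One caveat: if the first file is empty, NR==FNR is also true while reading the second file, so the trick assumes the first file has at least one line.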

You can't tell awk to repeat a file, but if you give it the same file twice, it will read it twice. ARGIND (a GNU awk extension, so this needs gawk) will be different for each file, letting you detect when it repeats.

Code:
awk '(FILENUM != ARGIND) {
        if(FILENUM) {
                # "end" section of the previous file, if any
                # Process and print data
        }
        FILENUM=ARGIND;
        {
                # "begin" section for each file
                # Probably a good idea to delete any previous data here.
                for(X in A) delete A[X]
        }
}

(ARGIND == 1) && ($2<cutoff) { A[$1]=something }
(ARGIND == 2) && (something else) { A[$1]=somethingelse }' filename filename


Last edited by Corona688; 03-28-2014 at 03:23 PM..
# 11  
Old 03-28-2014
Thanks, but I still have some questions.

Code:
awk '(FILENUM != ARGIND) {
        if(FILENUM) {
                # "end" section of the previous file, if any
                # Process and print data
        END {printf "%s    %s%6d  %s   %s   %4.3f  %s  %4.3f  %s  %s", file, "time", NR, "ps",  "iteration ends at COM dist", $2, "average", A3/NR, A4/NR, "finished pulling?", complete}
        }

        FILENUM=ARGIND;
        {(ARGIND == 1) && ($2<cutoff) { A[NR]=$1; A2[NR]=$2; sum+=$2 } 
                # "begin" section for each file
                # Probably a good idea to delete any previous data here.
        (A2[NR-1]-$2)<threshold       {n++} 
        {if(NR<299 && $2<cutoff)        {complete="YES"} else {complete="NO"}} 
        {if($2<cutoff && n>5)           {exit}}
        }

        {
        (ARGIND == 2) && ($2<cutoff) {A4[NR]=$1; A5[NR]=$2; sum2=+$2}
        }

}' cutoff="4" threshold="1" file=$old $old  $old > $old.result


What do you mean by

Quote:
# "end" section of the previous file, if any
# Process and print data
Is this the END section?
Code:
{(ARGIND == 1) && ($2<cutoff) { A[NR]=$1; A2[NR]=$2; sum+=$2 }

---------- Post updated at 02:06 PM ---------- Previous update was at 02:03 PM ----------

Basically, I want to re-parse the file and add up all values in $2 AFTER the first time $2<cutoff


(This is not what I'm doing above, as I still need to figure out how to write this command.)
# 12  
Old 03-28-2014
You've put everything which was outside {} brackets in extra {} brackets, totally changing their meaning. I can't even tell what that code would actually do now.

The 'begin' section for each file is like BEGIN, but it runs each and every time awk begins reading a file.

Consider it like this:

Code:
awk '(FILENUM != ARGIND) {
        FILENUM=ARGIND;
        ##########################################
        # Put code here that you want to run on the first line of a file
        ##########################################
}

(ARGIND == 1) && (somecondition) {
        ##########################################
        # Put code here that you want to run for every line of file 1
        ##########################################
}

(ARGIND == 2) && (someothercondition) {
        ##########################################
        # Put code here to run for every line of file 2
        ##########################################
}' filename filename # Note how awk is given the same filename twice, to read it twice

This is not pseudocode. It will actually run like this. It won't do much without the stuff you need to add, and 'somecondition' and 'someothercondition' need to be replaced with logical expressions of your choice, but this is correct syntax. You do not need to defensively surround it with more {} brackets.
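As one possible way to fill in that skeleton for "add up all values in $2 after the first time $2<cutoff", here is a sketch that reads the same file twice. It uses the portable NR==FNR test mentioned earlier in the thread instead of ARGIND, so it does not depend on gawk. The sample data and the names `first`, `sum` and `count` are assumptions made for illustration, and "after" is taken to mean the lines following the one where $2 first drops below cutoff.

```shell
# Made-up sample data: column 1 = time, column 2 = distance
data=$(mktemp)
printf '%s\n' '1 6.0' '2 5.0' '3 3.9' '4 3.5' '5 4.2' '6 3.0' > "$data"

summary=$(awk '
    # pass 1: remember the first line where $2 drops below cutoff
    NR == FNR {
        if (!first && $2 < cutoff) first = FNR
        next
    }
    # pass 2: add up $2 on every line after that one
    first && FNR > first { sum += $2; count++ }
    END {
        if (count) printf "%d %.3f %.3f\n", first, sum, sum / count
        else print "cutoff never reached, or no lines after it"
    }
' cutoff=4 "$data" "$data")    # same file given twice, so awk reads it twice

echo "$summary"
rm -f "$data"
```

For this data the crossing is on line 3, so lines 4 to 6 are summed and the output is `3 10.700 3.567` (line number, sum, average).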
# 13  
Old 03-28-2014
OK thanks will try,

I thought maybe the easiest solution would be to split up the file into 2 separate files at

{$2=cutoff}

and then processing the second file would be simple?
# 14  
Old 03-28-2014
Quote:
Originally Posted by chrisjorg
OK thanks will try,

I thought maybe the easiest solution would be to split up the file into 2 separate files at

{$2=cutoff}

and then processing the second file would be simple?
Maybe I don't understand your question. It's the same data, isn't it?
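If it really is the same data, a single pass with a flag might be enough, with no splitting and no second read. This is only a sketch under the same assumptions as before: made-up sample data, hypothetical variable names, and "after" meaning the lines following the first one where $2<cutoff.

```shell
# Made-up sample data: column 1 = time, column 2 = distance
data=$(mktemp)
printf '%s\n' '1 6.0' '2 5.0' '3 3.9' '4 3.5' '5 4.2' '6 3.0' > "$data"

single=$(awk '
    found { sum += $2; count++ }                      # lines after the crossing
    !found && $2 < cutoff { found = 1; first = NR }   # the crossing line itself is not summed
    END { if (count) printf "%d %.3f %.3f\n", first, sum, sum / count }
' cutoff=4 "$data")

echo "$single"
rm -f "$data"
```

The order of the two rules matters: the summing rule comes first, so the flag set on the crossing line only takes effect from the next line on.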