Compute value from more than three consecutive rows

04-16-2016

Hello all, I am working on a file like below:

Code:

site Date time value1 value2
0023 2014-01-01 00:00 32.0 23.7
0023 2014-01-01 01:00 38.0 29.9
0023 2014-01-01 02:00 85.0 26.6
0023 2014-01-01 03:00 34.0 25.3
0023 2014-01-01 04:00 37.0 23.8
0023 2014-01-01 05:00 80.0 20.3
0023 2014-01-01 06:00 90.0 20.0
0023 2014-01-01 07:00 180.0 20.0
0023 2014-01-01 08:00 30.0 20.0

The first column is the site, the second is the date (the whole year of 2014), the third is the time (from 00:00 to 23:00 for each day), and the fourth and fifth columns are values. I need to compare columns 4 and 5 based on the condition below:

For each site (column 1), if column 4 is more than 3 times column 5, this pattern lasts for 3 or more consecutive hours, and the maximum value within that run is higher than 100, print all the lines that meet the criteria and count how many such cases exist for each site. There are around 150 sites in total, and each site has hourly data for every day. Here is the output I want:

Code:

0023 2014-01-01 05:00 80.0 20.3 1
0023 2014-01-01 06:00 90.0 20.0 1
0023 2014-01-01 07:00 180.0 20.0 1
0023 2014-06-30 23:00 200.0 30.3 2
0023 2014-07-01 00:00 303.0 30.3 2
0023 2014-07-01 01:00 134.0 30.3 2
0025 2014-07-01 01:00 136.0 25.3 1
0025 2014-07-01 02:00 116.0 25.3 1
0025 2014-07-01 03:00 106.0 25.3 1

Any help is highly appreciated!

Last edited by kathy wang; 04-18-2016 at 03:53 PM.. Reason: code tags

kathy wang
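
A minimal sketch of the first condition in the question above, checked one row at a time before any run detection. It is only an illustration and not part of any posted solution: it assumes the header row shown in the sample, and the temporary file name sample.txt is made up for the demo. It prints each time followed by yes or no, depending on whether column 4 exceeds 3 times column 5.

```shell
# Write the sample from the question to a temporary file (hypothetical name).
dir=$(mktemp -d)
cat > "$dir/sample.txt" <<'EOF'
site Date time value1 value2
0023 2014-01-01 00:00 32.0 23.7
0023 2014-01-01 01:00 38.0 29.9
0023 2014-01-01 02:00 85.0 26.6
0023 2014-01-01 03:00 34.0 25.3
0023 2014-01-01 04:00 37.0 23.8
0023 2014-01-01 05:00 80.0 20.3
0023 2014-01-01 06:00 90.0 20.0
0023 2014-01-01 07:00 180.0 20.0
0023 2014-01-01 08:00 30.0 20.0
EOF

# Skip the header, then flag each row: is column 4 more than 3 times column 5?
# Multiplying instead of dividing avoids a division by zero when column 5 is 0.
out=$(awk 'NR > 1 { print $3, ($4 > 3 * $5 ? "yes" : "no") }' "$dir/sample.txt")
printf '%s\n' "$out"
```

On the sample, 02:00 qualifies on its own (85.0 > 79.8) but is isolated, while 05:00 through 07:00 form a run of three whose maximum, 180.0, is above 100. That is why only those three rows from the sample appear in the desired output.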

04-16-2016

Hello Kathy wang,

Please use code tags, as per the forum rules, for any commands, input, or code that you include in your posts.
Could you please try the following and let me know if it helps?

Code:

awk 'NR==1{print;next} {split($3, A,":");if($4/$NF>=3){if(site_id==$1){count++};if(!previous){previous=A[1]};if(A[1]-previous==1){P=P?P ORS $0 OFS count:$0 OFS count;Q++;previous=A[1];site_id=$1} else {previous=A[1];site_id=$1}} else {previous=A[1];P=Q=""};if(Q==3){print P;P=""};}'  Input_file

Output will be as follows.

Code:

site Date time value1 value2
0023 2014-01-01 05:00 80.0 20.3 1
0023 2014-01-01 06:00 90.0 20.0 2
0023 2014-01-01 07:00 180.0 20.0 3

I have tested it only against your sample Input_file, not with many scenarios. If you have more conditions, please post them with a sample Input_file and the expected output in code tags.
EDIT: One more thing I wanted to know: what should happen when consecutive records have different site IDs but otherwise satisfy the conditions? My code above does not handle that case.
If you want that case handled differently, please give us more details on your requirement. There are many possible permutations and combinations here, so a clear requirement is a must.
EDIT2: Here is the same solution in non-one-liner form.

Code:

awk 'NR==1{
                print;
                next
          }
          {
                split($3, A,":");
                if($4/$NF>=3){
                                if(site_id==$1){
                                                count++
                                               };
                                if(!previous)  {
                                                previous=A[1]
                                               };
                                if(A[1]-previous==1){
                                                        P=P?P ORS $0 OFS count:$0 OFS count;
                                                        Q++;
                                                        previous=A[1];
                                                        site_id=$1
                                                    }
                                else           {
                                                        previous=A[1];
                                                        site_id=$1
                                               }
                             }
                else         {
                                previous=A[1];
                                P=Q=""
                             };
                if(Q==3)     {
                                print P;
                                P=""
                             };
          }
   '    Input_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 04-16-2016 at 10:45 AM.. Reason: Added one more condition of count of site id as per user's requirement. EDIT2: Added a note for user now. added a non-one li

RavinderSingh13

04-16-2016

@RavinderSingh13, thank you so much for your help. However, I got the error "previous: Event not found." I tried searching for "awk keyword previous" but didn't find anything helpful. Would you please explain it further? I really appreciate it.

kathy wang

04-16-2016

Hello Kathy wang,

Sorry, I couldn't understand the error. Could you please describe it more clearly, with complete information on your requirement and on how you are getting the error? As for the explanation of the code, the following may help.

Code:

awk 'NR==1{                                                                                     ##### When awk is reading very first line of Input_file, then do following actions.
                print;                                                                          ##### print the complete very first line here.
                next                                                                            ##### next is a built in awk keyword, which tells control NOT to go further and skip all next written statements for current(which is very first line) now.
          }
          {
                split($3, A,":");                                                               ##### Now this statement will be executed apart from the first line, I am using split built in function of awk so split 3rd field of the line whose delimiter is ":" colon and storing it into an array named A.
                if($4/$NF>=3){                                                                  ##### Now as per your requirement, I am checking here whether the 4th field is at least 3 times $NF (which indicates the value of the LAST field of each LINE); if this condition is TRUE then do following actions.
                                if(site_id==$1){                                                ##### Here I am checking whether the variable named site_id has the same value as the first field of the current line; if it does, then execute following statement.
                                                count++                                         ##### Here increasing the value of variable named count one more now. 
                                               };
                                if(!previous)  {                                                ##### Here I am verifying the value of variable named previous, previous is a variable which will hold the value of your time's(3rd field) 1st value, so that we could make sure the difference between last line(whenever it was satisfying the condition where $4/$NF>=3 is TRUE) and current line's TIME have only 1 hour or min difference.
                                                previous=A[1]                                   ##### Setting up value of variable named previous to array A's 1st value here.
                                               };
                                if(A[1]-previous==1){                                           ##### Checking here time differences of the current time's value and the previous time's value, so difference should be one as per your requirement.
                                                        P=P?P ORS $0 OFS count:$0 OFS count;    ##### If above condition is TRUE then I am setting up the value of variable named P to current line's value with the site id's count. Moreover if P already has value then I am making sure P's value should be appended here successfully.
                                                        Q++;                                    ##### Increasing the value of variable named Q here by one, WHERE variable Q is meant for keeping track if 3 consecutive lines have come to satisfy all conditions then it should print the value of P.
                                                        previous=A[1];                          ##### Setting up variable named previous to the array A's 1st value of current line(time value, to do compare operation again for next line.).
                                                        site_id=$1                              ##### Setting up site_id value to $1(first field) of current line.
                                                    }
                                else           {                                                ##### In case difference condition of A[1]-previous is NOT TRUE then perform following actions please. 
                                                        previous=A[1];                          ##### I am setting value of previous variable to A's first value.
                                                        site_id=$1                              ##### Now setting up site_id's value to first field too.
                                               }
                             }
                else         {                                                                  ##### In case condition of $4/$NF>=3 is NOT TRUE then do following actions.
                                previous=A[1];                                                  ##### Setting up variable named previous's value to array A's 1st value for next line's comparisons.
                                P=Q=""                                                          ##### Nullifying the values of variables named P and Q. Because already condition is FALSE and we need 3 consecutive lines to be satisfied with the conditions so no need of variable named P and Q any value here.
                             };
                if(Q==3)     {                                                                  ##### When variable Q's value is equal to 3 then do following actions.
                                print P;                                                        ##### printing the value of P, which actually will have those 3 consecutive lines which are satisfying all the conditions successfully.
                                P=""                                                            ##### Nullifying the value of variable P, so that OLD values shouldn't print again while printing the new ones.
                             };
          }
   '    Input_file                                                                              ##### Mentioning the Input_file here.

Hope this helps you.

Thanks,
R. Singh

RavinderSingh13

04-16-2016

@RavinderSingh13, I got an error message when I tried to test your script:

Code:

% awk 'NR==1{print;next} {split($3, A,":");if($4/$NF>=3){if(site_id==$1){count++};if(!previous){previous=A[1]};if(A[1]-previous==1){P=P?P ORS $0 OFS count:$0 OFS count;Q++;previous=A[1];site_id=$1} else {previous=A[1];site_id=$1}} else {previous=A[1];P=Q=""};if(Q==3){print P;P=""};}'  Inputfile
previous: Event not found.

Thank you very much.

Last edited by Don Cragun; 04-17-2016 at 12:39 AM.. Reason: Add CODE tags again.

kathy wang

04-17-2016

It looks like Ravinder accidentally used a single quote in a comment inside a single-quoted script in post #4 in this thread. But, the diagnostic you have shown us doesn't seem to have come from any code Ravinder suggested.

Are you using csh again and getting errors from it mistakenly trying to use its history mechanism inside a single quoted script?

Don Cragun
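
If the shell really is csh or tcsh, one common way around the problem is to keep the awk program off the command line altogether: csh performs history substitution on ! even inside single quotes, so !previous is read as a history reference. Saving the program in a file and running it with awk -f avoids that. The sketch below is only a demonstration under a POSIX shell; the file names and the tiny program (which prints the first qualifying row per site) are made up for illustration. In csh you would create the program file with an editor rather than with a here-document.

```shell
dir=$(mktemp -d)

# Hypothetical program file; note the "!" operator, which csh would otherwise
# try to expand if the program were typed on the command line.
cat > "$dir/first_hit.awk" <<'EOF'
# Print the first row per site where column 4 exceeds 3 times column 5.
NR > 1 && $4 > 3 * $5 && !seen[$1]++
EOF

# Small made-up input in the format used in this thread.
cat > "$dir/sample.txt" <<'EOF'
site Date time value1 value2
0023 2014-01-01 04:00 37.0 23.8
0023 2014-01-01 05:00 80.0 20.3
0023 2014-01-01 06:00 90.0 20.0
EOF

# The shell never sees the "!" because awk reads the program from the file.
out=$(awk -f "$dir/first_hit.awk" "$dir/sample.txt")
printf '%s\n' "$out"
```

Writing each ! in the one-liner as \! should also work in csh, but a program file is easier to read and to edit.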

04-17-2016

Quote:

Originally Posted by kathy wang

.
.
.
For each site (column 1), if column 4 is more than 3 times of columns 5, and this pattern last for more than 3 hours continually, plus the maximum of them must be higher than 100, print all the lines that meet the standard an
.
.
.

None of your input meets those requirements ("more than"). The code below assumes that each line represents a one-hour increment and relies on the line number (NR). If that assumption is NOT correct, add some logic to account for the time, but take care of "crossing midnight", which makes the calculation more difficult. Making use of the NR assumption, try

Code:

awk '
function PRT()  {if (C3 && MR)  {++CNT
                                 for (i=1; i<=LC; i++) print LN[i], CNT
                                }
                }

$1 != SITE      {PRT()
                 SITE = $1
                 LC = CNT = MR = C3 = 0
                }

$4/$5 > 3       {LN[++LC] = $0
                 if (!ST)               ST = NR - 1
                 if (NR - ST > 3)       C3 = 1
                 if ($4 > 100)          MR = 1
                 next
                }
                {PRT()
                 ST = LC = MR = C3 = 0
                }
END             {PRT()
                }
' file

It does not produce any output as none of your requirements are met by the input sample.

If you replace if (NR - ST > 3) by if (NR - ST >= 3), the result

Code:

0023 2014-01-01 05:00 80.0 20.3 1
0023 2014-01-01 06:00 90.0 20.0 1
0023 2014-01-01 07:00 180.0 20.0 1
0023 2014-06-30 23:00 200.0 30.3 2
0023 2014-07-01 00:00 303.0 30.3 2
0023 2014-07-01 01:00 134.0 30.3 2
0025 2014-07-01 01:00 136.0 25.3 1
0025 2014-07-01 02:00 116.0 25.3 1
0025 2014-07-01 03:00 106.0 25.3 1

is what you requested in post #1 (with the sample input extended by some lines from your desired output; alternatively, keep the original test and add matching fourth and later lines).

RudiC
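
For data where rows can be missing, so that neighbouring lines are not guaranteed to be one hour apart, the time itself has to be compared. The sketch below is one possible approach and is not RudiC's code: it converts the date and hour into a running hour count using the standard Julian Day Number formula, so that 23:00 and 00:00 on the next day differ by exactly 1. It assumes dates in YYYY-MM-DD form, times on the full hour, the "3 hours or more" reading of the requirement, and it tests $4 > 3 * $5 so that a zero in column 5 cannot cause a division by zero. The names hour_index, flush_run, and runs.awk are made up for this example.

```shell
dir=$(mktemp -d)

cat > "$dir/runs.awk" <<'EOF'
# Hours since a fixed origin, via the Julian Day Number of a Gregorian date.
function hour_index(ymd, hm,    d, t, y, m, a, jdn) {
    split(ymd, d, "-"); split(hm, t, ":")
    a = int((14 - d[2]) / 12)
    y = d[1] + 4800 - a
    m = d[2] + 12 * a - 3
    jdn = d[3] + int((153 * m + 2) / 5) + 365 * y
    jdn = jdn + int(y / 4) - int(y / 100) + int(y / 400) - 32045
    return 24 * jdn + t[1]
}
# Print the stored run if it has 3+ rows and a column 4 maximum above 100.
function flush_run(    i) {
    if (n >= 3 && max > 100) {
        ++cases[site]
        for (i = 1; i <= n; i++) print row[i], cases[site]
    }
    n = 0; max = 0
}
$4 !~ /^[0-9.]+$/ { next }      # skip the header (assumed non-numeric)
{
    h  = hour_index($2, $3)
    ok = ($4 > 3 * $5)
    # A run ends on a failing row, a new site, or a gap in the hours.
    if (n > 0 && (!ok || $1 != site || h != last + 1)) flush_run()
    if (ok) {
        site = $1; row[++n] = $0; last = h
        if ($4 > max) max = $4
    }
}
END { flush_run() }
EOF

# Made-up input: rows taken from the desired output in post #1, plus one
# non-matching row, including a run that crosses midnight.
cat > "$dir/sample.txt" <<'EOF'
site Date time value1 value2
0023 2014-01-01 05:00 80.0 20.3
0023 2014-01-01 06:00 90.0 20.0
0023 2014-01-01 07:00 180.0 20.0
0023 2014-01-01 08:00 30.0 20.0
0023 2014-06-30 23:00 200.0 30.3
0023 2014-07-01 00:00 303.0 30.3
0023 2014-07-01 01:00 134.0 30.3
0025 2014-07-01 01:00 136.0 25.3
0025 2014-07-01 02:00 116.0 25.3
0025 2014-07-01 03:00 106.0 25.3
EOF

out=$(awk -f "$dir/runs.awk" "$dir/sample.txt")
printf '%s\n' "$out"
```

Because the run test uses the hour count rather than NR, a missing hour in the middle of a sequence splits it into two shorter runs instead of being silently bridged.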
