Compute value from more than three consecutive rows


 
# 1  
04-16-2016

Hello all, I am working on a file like the one below:

Code:
site Date time value1 value2
0023 2014-01-01 00:00 32.0 23.7
0023 2014-01-01 01:00 38.0 29.9
0023 2014-01-01 02:00 85.0 26.6
0023 2014-01-01 03:00 34.0 25.3
0023 2014-01-01 04:00 37.0 23.8
0023 2014-01-01 05:00 80.0 20.3
0023 2014-01-01 06:00 90.0 20.0
0023 2014-01-01 07:00 180.0 20.0
0023 2014-01-01 08:00 30.0 20.0

The first column is the site, the second column is the date (the whole year 2014), and the third represents the time (from 00:00 to 23:00 for each day); the fourth and fifth columns are values. I need to compare columns 4 and 5 based on the condition below:

For each site (column 1), if column 4 is more than 3 times column 5, and this pattern lasts for 3 or more consecutive hours, and the maximum value in that run is higher than 100, print all the lines that meet these criteria and count how many such cases exist for each site. There are around 150 sites in total, and each site has hourly data for every day. Here is the output I want:

Code:
0023 2014-01-01 05:00 80.0 20.3 1
0023 2014-01-01 06:00 90.0 20.0 1
0023 2014-01-01 07:00 180.0 20.0 1
0023 2014-06-30 23:00 200.0 30.3 2
0023 2014-07-01 00:00 303.0 30.3 2
0023 2014-07-01 01:00 134.0 30.3 2
0025 2014-07-01 01:00 136.0 25.3 1           
0025 2014-07-01 02:00 116.0 25.3 1
0025 2014-07-01 03:00 106.0 25.3 1

Any help is highly appreciated!

Last edited by kathy wang; 04-18-2016 at 03:53 PM. Reason: code tags
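For reference, the rule from post #1 can be sketched as a single awk pass. This is not any poster's code, only a minimal sketch under these assumptions: blank-separated columns as in the sample, rows sorted by site, then date and time, an optional header line starting with "site", "more than 3 times" read as a strict >, and "maximum higher than 100" applied to column 4 within the run. Instead of line numbers it converts date and hour into an absolute hour count (using the standard days-from-civil calendar formula), so a run may cross midnight while a missing hour breaks it. The helper name find_runs is hypothetical.

```shell
#!/bin/sh
# find_runs (hypothetical name): print every row that belongs to a qualifying
# run, followed by the case number of that run for its site.
# Assumed input: "site date time value1 value2" columns separated by blanks,
# an optional header line starting with "site", rows sorted by site, date, time.
find_runs() {
    awk '
    # Days since 1970-01-01 for a civil date (days-from-civil formula),
    # valid for years >= 0, which covers the 2014 data.
    function daynum(y, m, d,    era, yoe, doy, doe) {
        y -= (m <= 2)
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    # Print the buffered run if it lasted 3 or more hours and its peak
    # column 4 value is above 100, then clear the buffer.
    function flush(    i) {
        if (n >= 3 && peak > 100) {
            cases[site]++
            for (i = 1; i <= n; i++) print buf[i], cases[site]
        }
        n = 0
        peak = 0
    }
    $1 == "site" { next }                       # skip the header line
    {
        split($2, ymd, "-")
        split($3, hm, ":")
        # Absolute hour count, so 23:00 -> 00:00 next day is consecutive.
        hour = daynum(ymd[1] + 0, ymd[2] + 0, ymd[3] + 0) * 24 + hm[1]
        hit = ($4 > 3 * $5)                     # more than 3 times column 5
        # A run ends on a site change, a gap in time, or a failing row.
        if (n && ($1 != site || hour != last + 1 || !hit)) flush()
        if (hit) {
            buf[++n] = $0
            if ($4 > peak) peak = $4
        }
        site = $1
        last = hour
    }
    END { flush() }
    ' "$@"
}
```

Usage: find_runs Input_file (or pipe the data into it). Fed the sample rows above followed by the 2014-06-30/2014-07-01 rows of the desired output (without their last column), it prints exactly the desired output, including the case numbers 1 and 2 for site 0023.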
# 2  
04-16-2016
Hello Kathy wang,

Please use code tags, as required by the forum rules, for any commands, inputs, or code in your posts.
Could you please try the following and let me know if it helps.
Code:
awk 'NR==1{print;next} {split($3, A,":");if($4/$NF>=3){if(site_id==$1){count++};if(!previous){previous=A[1]};if(A[1]-previous==1){P=P?P ORS $0 OFS count:$0 OFS count;Q++;previous=A[1];site_id=$1} else {previous=A[1];site_id=$1}} else {previous=A[1];P=Q=""};if(Q==3){print P;P=""};}'  Input_file

Output will be as follows.
Code:
site Date time value1 value2
0023 2014-01-01 05:00 80.0 20.3 1
0023 2014-01-01 06:00 90.0 20.0 2
0023 2014-01-01 07:00 180.0 20.0 3

I have not tested it with many scenarios, only with your Input_file. If you have more conditions, please post them with a sample Input_file and the expected output in code tags.
EDIT: One more thing I wanted to know: what should happen when there are records whose site IDs are NOT the same but which otherwise fulfill the conditions? My code above does not handle that case.
If you want that condition handled differently, please let us know with more details on your requirement. Many permutations and combinations are possible here, so a clear requirement is a must.
EDIT2: Adding a non-one-liner form of the same solution.
Code:
awk 'NR==1{
                print;
                next
          }
          {
                split($3, A,":");
                if($4/$NF>=3){
                                if(site_id==$1){
                                                count++
                                               };
                                if(!previous)  {
                                                previous=A[1]
                                               };
                                if(A[1]-previous==1){
                                                        P=P?P ORS $0 OFS count:$0 OFS count;
                                                        Q++;
                                                        previous=A[1];
                                                        site_id=$1
                                                    }
                                else           {
                                                        previous=A[1];
                                                        site_id=$1
                                               }
                             }
                else         {
                                previous=A[1];
                                P=Q=""
                             };
                if(Q==3)     {
                                print P;
                                P=""
                             };
          }
   '    Input_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 04-16-2016 at 10:45 AM. Reason: Added one more condition (count of site id) as per user's requirement. EDIT2: Added a note for user now. added a non-one li
# 3  
04-16-2016
@RavinderSingh13, thank you so much for your help. However, I got the error "previous: Event not found." I tried searching for "awk keyword previous" but didn't find anything helpful. Would you please explain it further? I really appreciate it.
# 4  
04-16-2016
Hello Kathy wang,

Sorry, I couldn't understand the error. Could you please describe it more clearly, with complete information about your requirement and how you are getting the error? For an explanation of the code, the following may help.
Code:
awk 'NR==1{                                                                                     ##### When awk reads the very first line of Input_file, do the following actions.
                print;                                                                          ##### print the complete very first line here.
                next                                                                            ##### next is a built-in awk keyword that tells awk to skip all remaining statements for the current line (here, the very first line).
          }
          {
                split($3, A,":");                                                               ##### This statement runs for every line except the first. The built-in split function splits the 3rd field on ":" (colon) and stores the pieces in an array named A.
                if($4/$NF>=3){                                                                  ##### As per your requirement, check whether the 4th field is at least 3 times $NF (the value of the LAST field of each line); if this condition is TRUE, do the following actions.
                                if(site_id==$1){                                                ##### Check whether the variable named site_id has the same value as on the previous matching line; if so, execute the following statement.
                                                count++                                         ##### Increment the variable named count by one.
                                               };
                                if(!previous)  {                                                ##### Check whether the variable named previous is still empty. previous holds the hour (1st value of the time in the 3rd field) of the previous line, so that we can make sure consecutive lines differ by exactly 1 hour.
                                                previous=A[1]                                   ##### Set the variable named previous to the 1st value of array A.
                                               };
                                if(A[1]-previous==1){                                           ##### Check the difference between the current hour and the previous hour; as per your requirement it should be exactly one.
                                                        P=P?P ORS $0 OFS count:$0 OFS count;    ##### If the above condition is TRUE, set P to the current line followed by the site id count; if P already has a value, append the current line to it instead.
                                                        Q++;                                    ##### Increment Q by one. Q tracks how many consecutive lines have satisfied all conditions, so that P is printed once there are 3 of them.
                                                        previous=A[1];                          ##### Set the variable named previous to the 1st value of array A for the current line (the hour), to do the comparison again for the next line.
                                                        site_id=$1                              ##### Setting up site_id value to $1(first field) of current line.
                                                    }
                                else           {                                                ##### If the difference A[1]-previous is NOT 1, perform the following actions.
                                                        previous=A[1];                          ##### I am setting value of previous variable to A's first value.
                                                        site_id=$1                              ##### Now setting up site_id's value to first field too.
                                               }
                             }
                else         {                                                                  ##### If the condition $4/$NF>=3 is NOT TRUE, do the following actions.
                                previous=A[1];                                                  ##### Set the variable named previous to the 1st value of array A for the next line comparisons.
                                P=Q=""                                                          ##### Nullify the variables P and Q. The condition is already FALSE and we need 3 consecutive lines that satisfy the conditions, so P and Q have to start over.
                             };
                if(Q==3)     {                                                                  ##### When the value of Q is equal to 3, do the following actions.
                                print P;                                                        ##### Print the value of P, which holds the 3 consecutive lines that satisfy all the conditions.
                                P=""                                                            ##### Nullify the variable P so that OLD values are not printed again along with the new ones.
                             };
          }
   '    Input_file                                                                              ##### Mentioning the Input_file here.

Hope this helps you.

Thanks,
R. Singh
# 5  
04-16-2016
@RavinderSingh13, I got an error message when I tried to test your script:

Code:
% awk 'NR==1{print;next} {split($3, A,":");if($4/$NF>=3){if(site_id==$1){count++};if(!previous){previous=A[1]};if(A[1]-previous==1){P=P?P ORS $0 OFS count:$0 OFS count;Q++;previous=A[1];site_id=$1} else {previous=A[1];site_id=$1}} else {previous=A[1];P=Q=""};if(Q==3){print P;P=""};}'  Inputfile
previous: Event not found.


Thank you very much.

Last edited by Don Cragun; 04-17-2016 at 12:39 AM. Reason: Add CODE tags again.
# 6  
04-17-2016
It looks like Ravinder accidentally used a single quote in a comment inside a single-quoted script in post #4 in this thread. However, the diagnostic you have shown us doesn't seem to have come from any code Ravinder suggested.

Are you using csh again and getting errors from it mistakenly trying to use its history mechanism inside a single quoted script?
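csh applies history substitution to "!" even inside single quotes, so the "!previous" in the one-liner is read as a history reference, which matches the "previous: Event not found." message. One workaround, sketched below assuming a POSIX sh is available to create the file (a text editor works just as well), is to keep the awk program in its own file and run it with awk -f, so csh never parses the awk code. The file name ratio.awk is hypothetical; the program is the post #2 one-liner, unchanged.

```shell
#!/bin/sh
# Save the post #2 one-liner, unchanged, in a file (ratio.awk is a
# hypothetical name). The quoted EOF delimiter stops sh from expanding
# anything in the body, and csh never parses the file contents at all.
cat > ratio.awk <<'EOF'
NR==1{print;next} {split($3, A,":");if($4/$NF>=3){if(site_id==$1){count++};if(!previous){previous=A[1]};if(A[1]-previous==1){P=P?P ORS $0 OFS count:$0 OFS count;Q++;previous=A[1];site_id=$1} else {previous=A[1];site_id=$1}} else {previous=A[1];P=Q=""};if(Q==3){print P;P=""};}
EOF
```

From csh, or any other shell, the program then runs as: awk -f ratio.awk Input_file. That command line contains no "!", so csh has nothing to expand.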
# 7  
04-17-2016
Quote:
Originally Posted by kathy wang
.
.
.
For each site (column 1), if column 4 is more than 3 times of columns 5, and this pattern last for more than 3 hours continually, plus the maximum of them must be higher than 100, print all the lines that meet the standard an
.
.
.
None of your input meets those requirements ("more than"). Assuming each line represents a one-hour increment, the script relies on the line number (NR). If that is NOT the case, add some logic to account for the actual time, but take care of "crossing midnight", which makes the calculation more difficult. Using the NR assumption, try

Code:
awk '
function PRT()  {if (C3 && MR)  {++CNT
                                 for (i=1; i<=LC; i++) print LN[i], CNT
                                }
                }

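NR == 1 && $1 == "site" {next}          # skip the header; 0/0 is fatal in gawk

$1 != SITE      {PRT()
                 SITE = $1
                 ST = LC = CNT = MR = C3 = 0     # also reset ST so a run cannot span two sites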
                }

$4/$5 > 3       {LN[++LC] = $0
                 if (!ST)               ST = NR - 1
                 if (NR - ST > 3)       C3 = 1
                 if ($4 > 100)          MR = 1
                 next
                }
                {PRT()
                 ST = LC = MR = C3 = 0
                }
END             {PRT()
                }
' file

It does not produce any output, as no run in your input sample meets all of your requirements.

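Because ST is set to NR - 1 on the first matching line, NR - ST is the length of the current run, so > 3 only accepts runs of 4 or more hours. If you replace if (NR - ST > 3) with if (NR - ST >= 3), the result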
Code:
0023 2014-01-01 05:00 80.0 20.3 1
0023 2014-01-01 06:00 90.0 20.0 1
0023 2014-01-01 07:00 180.0 20.0 1
0023 2014-06-30 23:00 200.0 30.3 2
0023 2014-07-01 00:00 303.0 30.3 2
0023 2014-07-01 01:00 134.0 30.3 2
0025 2014-07-01 01:00 136.0 25.3 1
0025 2014-07-01 02:00 116.0 25.3 1
0025 2014-07-01 03:00 106.0 25.3 1

is what you requested in post #1 (with the input extended by some lines of your desired output, or, keeping > 3, with matching 4th and later lines added to each run).
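The NR-based script in post #7 depends on every row being exactly one hour after the previous row of the same site. A minimal sketch for checking that assumption before relying on it: check_hourly (a hypothetical helper name) prints every row whose time is not exactly one hour after the previous row of the same site, treating 23:00 followed by 00:00 on the next day as consecutive. It assumes the column layout from post #1, an optional header line starting with "site", and rows sorted by site, then date and time.

```shell
#!/bin/sh
# check_hourly (hypothetical name): report rows that break the
# "one line per hour per site" assumption behind the NR-based script.
check_hourly() {
    awk '
    # Days since 1970-01-01 for a civil date (days-from-civil formula),
    # valid for years >= 0.
    function daynum(y, m, d,    era, yoe, doy, doe) {
        y -= (m <= 2)
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    $1 == "site" { next }                       # skip the header line
    {
        split($2, ymd, "-")
        split($3, hm, ":")
        hour = daynum(ymd[1] + 0, ymd[2] + 0, ymd[3] + 0) * 24 + hm[1]
        # Same site as the previous row, but not exactly one hour later.
        if ($1 == site && hour != last + 1)
            printf "line %d: %s %s %s is not one hour after the previous row\n", NR, $1, $2, $3
        site = $1
        last = hour
    }
    ' "$@"
}
```

Usage: check_hourly Input_file. No output means the NR assumption holds for the file; any reported line marks a gap or duplicate that the NR-based script would silently treat as consecutive.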