awk Moving Average


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk Moving Average
# 1  
Old 08-04-2017
awk Moving Average

Hi, I'm using awk to try and get a moving average for the second column of numbers ($2) in the below example broken out by unique identifier in column 1 ($1) :

Code:
H1,1.2
H1,2.3
H1,5.5
H1,6.6
H1,8.7
H1,4.1
H1,6.4
H1,7.8
H1,9.6
H1,3.2
H5,50.1
H5,54.2
H5,58.8
H5,60.9
H5,54.3
H5,52.7
H5,53.8
H5,60.1
H5,59.9
H5,59.4
H5,57.1

My expected output is:

Code:
H1,1.2,3.0 
H1,2.3,3.9 
H1,5.5,4.9 
H1,6.6,5.4 
H1,8.7,6.3 
H1,4.1,6.7 
H1,6.4,7.3 
H1,7.8,6.2 
H1,9.6,6.8 
H1,3.2,6.9 
H5,50.1,54.4 
H5,54.2,56.0 
H5,58.8,55.7 
H5,60.9,56.2 
H5,54.3,56.1 
H5,52.7,56.4 
H5,53.8,56.2 
H5,60.1,57.2 
H5,59.9,58.1 
H5,59.4,59.1 
H5,57.1,58.8

The window for the moving average is 5, except for the first 2 row numbers and the last two row numbers for each unique identifier where the window will be 3 and 4.

I have tried modifying the below code. The first problem i'm having is setting up an associative array e.g. n[$1]=$2. The second problem is the changing window for the moving average. Any help would be appreciated Smilie

Code:
gawk -F"," 'BEGIN {N=5} {n[NR]=$2} NR>=N {x=0; delete n[NR-N]; for(i in n) x+=n[i]; print $1,n[NR-(N-1)],x/N}' OFS=, Test.csv

# 2  
Old 08-04-2017
Not too simple nor straightforward, really messy, but try
Code:
awk -F, '
function PRE()  {for (j=CNT-DT1+1; j<=CNT; j++) {SUM = 0
                                                 for (i=j-DT1; i<=CNT; i++) SUM += VAL[i]
                                                 printf "%s,%s,%.1f\n", LAST, VAL[j], SUM/(CNT-j+2*DT1-1)
                                                }
                }

NR == 1         {DT1 = int(TGTN/2)
                }

NR > 1 &&
$1 != LAST      {PRE()
                 CNT = 0
                }

                {VAL[++CNT] = $2
                 LAST       = $1
                 DT2        = CNT - TGTN
                 if (CNT > DT1) {ST = CNT - 2 * DT1 - (DT2<0?DT2:0)
                                 SUM = 0
                                 for (i=ST; i<=CNT; i++) SUM += VAL[i]
                                 printf "%s,%s,%.1f\n", LAST, VAL[CNT-DT1], SUM/(CNT-ST+1)
                                }
                }
END             {PRE()
                }
' OFS=,  TGTN=5 file
H1,1.2,3.0
H1,2.3,3.9
H1,5.5,4.9
H1,6.6,5.4
H1,8.7,6.3
H1,4.1,6.7
H1,6.4,7.3
H1,7.8,6.2
H1,9.6,6.7
H1,3.2,6.9
H5,50.1,54.4
H5,54.2,56.0
H5,58.8,55.7
H5,60.9,56.2
H5,54.3,56.1
H5,52.7,56.4
H5,53.8,56.2
H5,60.1,57.2
H5,59.9,58.1
H5,59.4,59.1
H5,57.1,58.8

There seems to be a rounding discrepancy in line 9.

Last edited by RudiC; 08-04-2017 at 05:27 PM..
This User Gave Thanks to RudiC For This Post:
# 3  
Old 08-04-2017
Wow, thanks so much RudiC. I'm still going over your reply trying to digest parts of your code. Much appreciated.
# 4  
Old 08-05-2017
Code:
$ 
$ cat data.txt
H1,1.2
H1,2.3
H1,5.5
H1,6.6
H1,8.7
H1,4.1
H1,6.4
H1,7.8
H1,9.6
H1,3.2
H5,50.1
H5,54.2
H5,58.8
H5,60.9
H5,54.3
H5,52.7
H5,53.8
H5,60.1
H5,59.9
H5,59.4
H5,57.1
$ 
$ awk -F, '
    BEGIN {window=5; half=1+int(window/2); iter=1; cnt=1; n=1; off=1; ptr=1}
    { if ($1 != prev) {
          while (cnt > half) {
              cnt--
              split(line[off], delta, ",")
              sum -= delta[2]
              off++
              avg = sum / cnt
              printf("%s,%.1f\n", line[ptr], avg)
              ptr++
          }
          delete line
          sum = 0; iter = 1; cnt = 1;
          off = 1; ptr = 1; n = 1
      }
      line[n] = $0
      n++
      sum += $2
      if (iter >= half) {
          if (iter == half) {
              cnt = half
          } else {
              cnt++
              if (cnt > window) {
                  cnt = window
                  split(line[off], delta, ",")
                  sum -= delta[2]
                  off++
              }
          }
          avg = sum / cnt
          printf("%s,%.1f\n", line[ptr], avg)
          ptr++
      }
      prev = $1
      iter++
    }
    END {
        while (cnt > half) {
            cnt--
            split(line[off], delta, ",")
            sum -= delta[2]
            off++
            avg = sum / cnt
            printf("%s,%.1f\n", line[ptr], avg)
            ptr++
        }
    }
' data.txt
H1,1.2,3.0
H1,2.3,3.9
H1,5.5,4.9
H1,6.6,5.4
H1,8.7,6.3
H1,4.1,6.7
H1,6.4,7.3
H1,7.8,6.2
H1,9.6,6.8
H1,3.2,6.9
H5,50.1,54.4
H5,54.2,56.0
H5,58.8,55.7
H5,60.9,56.2
H5,54.3,56.1
H5,52.7,56.4
H5,53.8,56.2
H5,60.1,57.2
H5,59.9,58.1
H5,59.4,59.1
H5,57.1,58.8
$ 
$

---------- Post updated at 03:34 AM ---------- Previous update was at 02:46 AM ----------

Calculating moving average appears to be pretty convoluted in imperative languages; however a declarative language like (modern) SQL provides syntactic sugar for solving this problem.

By any chance if you are extracting your data from a database, then you may want to check and leverage the power of SQL (specifically, "window" functions) to calculate moving average.
Here's an example with PostgreSQL 9.6, although window functions are available in Oracle, DB2, SQL Server, MySQL and MariaDB as well.

Code:
mydb=> 
mydb=> SELECT VERSION();
                                              version                                              
---------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18) 6.3.0 20170516, 64-bit
(1 row)

mydb=> 
mydb=> SELECT * FROM data;
 id | val  
----+------
 H1 |  1.2
 H1 |  2.3
 H1 |  5.5
 H1 |  6.6
 H1 |  8.7
 H1 |  4.1
 H1 |  6.4
 H1 |  7.8
 H1 |  9.6
 H1 |  3.2
 H5 | 50.1
 H5 | 54.2
 H5 | 58.8
 H5 | 60.9
 H5 | 54.3
 H5 | 52.7
 H5 | 53.8
 H5 | 60.1
 H5 | 59.9
 H5 | 59.4
 H5 | 57.1
(21 rows)

mydb=> 
mydb=> SELECT id,
mydb->        val,
mydb->        ROUND(
mydb(>            AVG(val) OVER (PARTITION BY id
mydb(>                           ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
mydb(>                          )::NUMERIC,
mydb(>            1
mydb(>        ) AS moving_avg
mydb->   FROM data
mydb-> ;
 id | val  | moving_avg 
----+------+------------
 H1 |  1.2 |        3.0
 H1 |  2.3 |        3.9
 H1 |  5.5 |        4.9
 H1 |  6.6 |        5.4
 H1 |  8.7 |        6.3
 H1 |  4.1 |        6.7
 H1 |  6.4 |        7.3
 H1 |  7.8 |        6.2
 H1 |  9.6 |        6.8
 H1 |  3.2 |        6.9
 H5 | 50.1 |       54.4
 H5 | 54.2 |       56.0
 H5 | 58.8 |       55.7
 H5 | 60.9 |       56.2
 H5 | 54.3 |       56.1
 H5 | 52.7 |       56.4
 H5 | 53.8 |       56.2
 H5 | 60.1 |       57.2
 H5 | 59.9 |       58.1
 H5 | 59.4 |       59.1
 H5 | 57.1 |       58.8
(21 rows)

mydb=> 
mydb=>

# 5  
Old 08-05-2017
Thanks Durden_Tyler. I am extracting the table from a database and do have access to SQL Server. I never tried using SQL Server but from what you tell me it will make my life easier and maybe i should give it a go. Thanks again for the code and extra tips.
Theflamingmoe.
# 6  
Old 08-07-2017
Quote:
Originally Posted by theflamingmoe
...
I am extracting the table from a database and do have access to SQL Server. I never tried using SQL Server ...
...
If you have the SQL Server client installed in your computer, then you will have the GUI tool "Microsoft SQL Server Management Studio", where you can write and execute SQL queries.
Window or analytic functions were introduced in SQL Server 2012.
You could determine the SQL Server version by issuing the following query:
Code:
select @@version
go

For SQL Server 2012 and higher, either of the following queries should work - assuming the columns "id", and "val" in the table "data" have the data you presented in your 1st post.

Code:
-- Query1 : Tested on MS SQL Server 2014 at sqlfiddle.com
select x.id,
       x.val,
       format(avg(x.val) over (partition by x.id
                               order by x.rn
                               rows between 2 preceding and 2 following
                              ),
              'N1'
             ) as mov_avg
  from (
            select id, val,
                   row_number() over (partition by 1 order by (select 1) asc) as rn
              from data
       ) x
go
  
-- Query2 : Tested on MS SQL Server 2014 at sqlfiddle.com
with x (id, val, rn) as (
    select id, val,
           row_number() over (partition by 1 order by (select 1) asc) as rn
      from data
)
select x.id,
       x.val,
       format(avg(x.val) over (partition by x.id
                               order by x.rn
                               rows between 2 preceding and 2 following
                              ),
              'N1'
             ) as mov_avg
  from x
go

For versions lower than 2012 that do not have analytic functions, you could join the "data" table by itself and determine 5-day moving averages:

Code:
with x (id, val, rn) as (
    select id, val,
           row_number() over (partition by id order by (select 1) asc) as rn
      from data
)
select x1.id as id,
       x1.val as val,
       convert(decimal(6,1), sum(x2.val)/count(x2.id)) as moving_avg
  from x as x1
       join x as x2
       on (x2.id = x1.id and
           x1.rn >= x2.rn - 2 and
           x1.rn <= x2.rn + 2
          )
group by x1.id, x1.rn, x1.val
go

Here's a log of my execution of the final query in my very old MS SQL Server 2008.
I have used the command-line utility "sqlcmd" instead of SQL Server Management Studio.

Code:
C:\data>
C:\data>sqlcmd -S <server_name> -U <login_id>
Password: <entered_password_here>
1>
2> select @@version
3> go
 ------------------------------------------------------------------------------------------
Microsoft SQL Server 2008 R2 (SP3-OD) (KB3144114) - 10.50.6542.0 (X64)
        Feb 22 2016 18:07:23
        Copyright (c) Microsoft Corporation
        Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)
 
(1 rows affected)
1>
2> --
3> with data (id, val) as (
4>     select 'H1', 1.2  union all
5>     select 'H1', 2.3  union all
6>     select 'H1', 5.5  union all
7>     select 'H1', 6.6  union all
8>     select 'H1', 8.7  union all
9>     select 'H1', 4.1  union all
10>     select 'H1', 6.4  union all
11>     select 'H1', 7.8  union all
12>     select 'H1', 9.6  union all
13>     select 'H1', 3.2  union all
14>     select 'H5', 50.1 union all
15>     select 'H5', 54.2 union all
16>     select 'H5', 58.8 union all
17>     select 'H5', 60.9 union all
18>     select 'H5', 54.3 union all
19>     select 'H5', 52.7 union all
20>     select 'H5', 53.8 union all
21>     select 'H5', 60.1 union all
22>     select 'H5', 59.9 union all
23>     select 'H5', 59.4 union all
24>     select 'H5', 57.1
25> ),
26> x (id, val, rn) as (
27>     select id, val,
28>            row_number() over (partition by id order by (select 1) asc) as rn
29>       from data
30> )
31> select x1.id as id,
32>        x1.val as val,
33>        convert(decimal(6,1), sum(x2.val)/count(x2.id)) as moving_avg
34>   from x as x1
35>        join x as x2
36>        on (x2.id = x1.id and
37>            x1.rn >= x2.rn - 2 and
38>            x1.rn <= x2.rn + 2
39>           )
40> group by x1.id, x1.rn, x1.val
41> go
id val   moving_avg
-- ----- ----------
H1   1.2        3.0
H1   2.3        3.9
H1   5.5        4.9
H1   6.6        5.4
H1   8.7        6.3
H1   4.1        6.7
H1   6.4        7.3
H1   7.8        6.2
H1   9.6        6.8
H1   3.2        6.9
H5  50.1       54.4
H5  54.2       56.0
H5  58.8       55.7
H5  60.9       56.2
H5  54.3       56.1
H5  52.7       56.4
H5  53.8       56.2
H5  60.1       57.2
H5  59.9       58.1
H5  59.4       59.1
H5  57.1       58.8
  
(21 rows affected)
1>
2> exit
  
C:\data>
C:\data>

# 7  
Old 08-07-2017
Thanks again Durden_Tyler, your post was very helpfull.
Regards,
Theflamingmoe
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

SMA (Single Moving Average) and Standard Deviation

Hello Team, I am using the following awk script to calculate the SMA (Single Moving Average) for an specific period but now I would like to include the standard deviation output. Could you please help me to modify this awk shell script awk -F, -v points=5 ' { a = $2; ... (4 Replies)
Discussion started by: csierra
4 Replies

2. Shell Programming and Scripting

Average across multiple columns - awk

Hi forum members, I'm trying to get an average of multiple columns in a csv file using awk. A small example of my input data is as follows: cu,u3o8,au,ag -9,20,-9,3.6 0.005,30,-9,-9 0.005,50,10,3.44 0.021,-9,8,3.35 The following code seems to do most of what I want gawk -F","... (6 Replies)
Discussion started by: theflamingmoe
6 Replies

3. Shell Programming and Scripting

awk to combine by field and average by another

In the below awk I am trying to combine all matching $4 into a single $5 (up to the -), and count the lines in $6 and average all values in $7. The awk is close but it seems to only be using the last line in the file and skipping all others. The posted input is a sample of the file that is over... (3 Replies)
Discussion started by: cmccabe
3 Replies

4. Shell Programming and Scripting

awk or Bash: Cumulative average

For the data I would like to parse down and for each parsing I want a cumulative averaging, stored in an array that can be output. I.e. 546/NR = 546 (546+344)/NR=(546+344)/2 = etc. For N record input I want N values of the average (a block averaging effectively) Any... (3 Replies)
Discussion started by: chrisjorg
3 Replies

5. Shell Programming and Scripting

Calculating average with awk

I need to find the average from a file like: data => BW:123 M:30 RTD:0 1 0 1 0 0 1 1 1 1 0 0 1 1 0' data => BW:123 N:30 RTD:0 1 0 1 0 0 1 1 1 1 0 0 1 1 0' data => BW:123 N:30 RTD:0 1 0 1 0 0 1 1 1 1 0 0 1 1 0' data => BW:123 N:30 RTD:0 1 0 1 0 0 1 1 1 1 0 0 1 1 0' data => BW:123 N:30 RTD:0 1... (4 Replies)
Discussion started by: Slagle
4 Replies

6. Shell Programming and Scripting

Calculate Average AWK

I want to calculate the average line by line of some files with several lines on them, the files are identical, just want to average the 3rd columns of those files.:wall: Example file: File 1 001 0.046 0.667267 001 0.047 0.672028 001 0.048 0.656025 001 0.049 ... (2 Replies)
Discussion started by: AriasFco
2 Replies

7. Shell Programming and Scripting

AWK novice - calculate the average

Hi, I have the following data in a file for example: P1 XXXXXXX.1 YYYYYYY.1 ZZZ.1 P1 XXXXXXX.2 YYYYYYY.2 ZZZ.2 P1 XXXXXXX.3 YYYYYYY.3 ZZZ.3 P1 XXXXXXX.4 YYYYYYY.4 ZZZ.4 P1 XXXXXXX.5 YYYYYYY.5 ZZZ.5 P1 XXXXXXX.6 YYYYYYY.6 ZZZ.6 P1 XXXXXXX.7 YYYYYYY.7 ZZZ.7 P1 XXXXXXX.8 YYYYYYY.8 ZZZ.8 P2... (6 Replies)
Discussion started by: alex2005
6 Replies

8. Shell Programming and Scripting

Partial average of a column with awk

Hello, Let's assume I have 100 files FILE_${m} (0<m<101). Each of them contains 100 lines and 10 columns. I'd like to get in a file called "result" the average value of column 3, ONLY between lines 11 and 17, in order to plot that average as a function of the parameter m. So far I can compute... (6 Replies)
Discussion started by: DMini
6 Replies

9. Shell Programming and Scripting

Average in awk

Hi I am looking for an awk script which can compute average of all the fields every 5th line. The file looks: A B C D E F G H I J K L M 1 18 13 14 12 14 13 11 12 12 15 15 15 2 17 17 13 13 13 12 12 11 12 14 15 14 3 16 16 12 12 12 11 11 12 11 16 14 13 4 15 15 11 11 11 12 11 12 11... (6 Replies)
Discussion started by: saint2006
6 Replies

10. Shell Programming and Scripting

how to average in awk

Hi, I have the data like this $1 $2 1 12 2 13 3 14 4 12 5 12 6 12 7 13 8 14 9 12 10 12 i want to compute average of $1 and $2 every 5th line (1-5 and 6-10) Please help me with awk Thank you (4 Replies)
Discussion started by: saint2006
4 Replies
Login or Register to Ask a Question