Find gaps in time data and replace missing time value and column 2 value by interpolation in awk


 
Thread Tools Search this Thread
Top Forums Programming Find gaps in time data and replace missing time value and column 2 value by interpolation in awk
# 1  
Old 01-15-2015
Find gaps in time data and replace missing time value and column 2 value by interpolation in awk

Dear all,

I am kindly seeking assistance on the following issue.

I am working with data that is sampled every 0.05 hours (that is 3 minutes intervals) here is a sample data from the file
Code:
5.00000   15.5030
5.05000   15.6680
5.10000   16.0100
5.15000   16.3450
5.20000   16.7120
5.25000   17.1020
5.30000   17.4900
5.35000   17.8870
5.40000   18.2850
5.45000   18.6970
5.50000   19.1170
5.55000   19.5400
5.60000   19.9730
5.65000   20.3950
5.70000   20.8100
5.75000   21.2700
5.80000   21.7180
5.85000   22.1780
5.90000   22.6270
5.95000   23.0570
6.25000   24.0850
6.30000   24.5270
6.35000   24.9830
6.40000   25.4480
6.45000   25.9000
6.50000   26.3520
6.55000   27.3630
6.60000   27.7830
6.65000   28.2700
6.70000   28.8780
6.75000   29.3120
6.80000   29.7130
6.85000   30.0780
6.90000   30.4630
6.95000   30.9540
7.00000   31.4240

As you can see my data has a big gap here
Code:
5.95000   23.0570
6.25000   24.0850

I am trying to find a way to locate such a gap and replace the missing time values on column 1 and create corresponding values of column 2 for each missing time value by interpolation. Something like
Code:
5.95000   23.0570
6.00000
6.05000
6.10000                <- column 2 must be filled with interpolated values
6.15000
6.20000
6.25000   24.0850

I am processing my data with AWK, and so an awk solution would be easier to integrate into my code.

PLEASE NOTE: that I am creating the data by resampling some other data that looks like this
Code:
 5.00000    46.51        3   15.503
 5.00833    46.71        3   15.570
 5.01667    46.91        3   15.637
 5.02500    47.12        3   15.707
 5.03333    47.32        3   15.773
 5.04167    62.44        4   15.610
 5.05000    62.67        4   15.668
 5.05833    62.90        4   15.725
 5.06667    63.15        4   15.787
 5.07500    63.38        4   15.845
 5.08333    63.58        4   15.895
 5.09167    63.80        4   15.950
 5.10000    64.04        4   16.010
 5.10833    64.27        4   16.067
 5.11667    64.47        4   16.117
 5.12500    64.69        4   16.172
 5.13333    64.89        4   16.223
 5.14167    65.14        4   16.285
 5.15000    65.38        4   16.345
 5.15833    65.60        4   16.400
 5.16667    65.85        4   16.463
 5.17500    66.08        4   16.520
 5.18333    66.35        4   16.587
 5.19167    66.61        4   16.652
 5.20000    66.85        4   16.712
 5.20833    67.11        4   16.777
 5.21667    67.39        4   16.848
 5.22500    67.63        4   16.907
 5.23333    67.88        4   16.970
 5.24167    68.14        4   17.035
 5.25000    68.41        4   17.102
 5.25833    68.68        4   17.170
 5.26667    68.90        4   17.225
 5.27500    69.18        4   17.295
 5.28333    69.43        4   17.358
 5.29167    69.69        4   17.422
 5.30000    69.96        4   17.490
 5.30833    70.23        4   17.558
 5.31667    70.50        4   17.625
 5.32500    70.75        4   17.688
 5.33333    71.00        4   17.750
 5.34167    71.28        4   17.820
 5.35000    71.55        4   17.887
 5.35833    71.80        4   17.950
 5.36667    72.07        4   18.018
 5.37500    72.34        4   18.085
 5.38333    72.60        4   18.150
 5.39167    72.89        4   18.223
 5.40000    73.14        4   18.285
 5.40833    73.42        4   18.355
 5.41667    73.70        4   18.425
 5.42500    73.98        4   18.495
 5.43333    74.25        4   18.562
 5.44167    74.52        4   18.630
 5.45000    74.79        4   18.697
 5.45833    75.07        4   18.768
 5.46667    75.36        4   18.840
 5.47500    75.63        4   18.907
 5.48333    75.91        4   18.977
 5.49167    76.20        4   19.050
 5.50000    76.47        4   19.117
 5.50833    76.74        4   19.185
 5.51667    77.04        4   19.260
 5.52500    77.30        4   19.325
 5.53333    77.60        4   19.400
 5.54167    77.88        4   19.470
 5.55000    78.16        4   19.540
 5.55833    78.47        4   19.617
 5.56667    78.77        4   19.692
 5.57500    79.06        4   19.765
 5.58333    79.36        4   19.840
 5.59167    79.62        4   19.905
 5.60000    79.89        4   19.973
 5.60833    80.19        4   20.047
 5.61667    80.48        4   20.120
 5.62500    80.73        4   20.183
 5.63333    81.05        4   20.262
 5.64167    81.29        4   20.323
 5.65000    81.58        4   20.395
 5.65833    81.81        4   20.453
 5.66667    82.12        4   20.530
 5.67500    82.39        4   20.597
 5.68333    82.68        4   20.670
 5.69167    82.94        4   20.735
 5.70000    83.24        4   20.810
 5.70833    83.57        4   20.892
 5.71667    83.87        4   20.968
 5.72500    84.20        4   21.050
 5.73333    84.50        4   21.125
 5.74167    84.79        4   21.198
 5.75000    85.08        4   21.270
 5.75833    85.38        4   21.345
 5.76667    85.65        4   21.413
 5.77500    85.95        4   21.488
 5.78333    86.25        4   21.562
 5.79167    86.55        4   21.638
 5.80000    86.87        4   21.718
 5.80833    87.20        4   21.800
 5.81667    87.48        4   21.870
 5.82500    87.78        4   21.945
 5.83333    88.08        4   22.020
 5.84167    88.39        4   22.098
 5.85000    88.71        4   22.178
 5.85833    88.99        4   22.247
 5.86667    89.29        4   22.323
 5.87500    89.59        4   22.398
 5.88333    89.88        4   22.470
 5.89167    90.18        4   22.545
 5.90000    90.51        4   22.627
 5.90833    90.77        4   22.693
 5.91667    91.06        4   22.765
 5.92500    91.35        4   22.837
 5.93333    91.63        4   22.908
 5.94167    91.91        4   22.977
 5.95000    92.23        4   23.057
 5.95833    92.54        4   23.135
 5.96667    92.88        4   23.220
 5.97500    93.19        4   23.297
 5.98333    93.50        4   23.375
 5.99167    93.80        4   23.450
 6.25000    96.34        4   24.085
 6.25833    96.62        4   24.155
 6.26667    96.93        4   24.233
 6.27500    97.20        4   24.300
 6.28333    97.52        4   24.380
 6.29167    97.80        4   24.450
 6.30000    98.11        4   24.527
 6.30833    98.39        4   24.598
 6.31667    98.69        4   24.672
 6.32500    98.99        4   24.747
 6.33333    99.31        4   24.828
 6.34167    99.61        4   24.902
 6.35000    99.93        4   24.983
 6.35833   100.23        4   25.058
 6.36667   100.54        4   25.135
 6.37500   100.85        4   25.213
 6.38333   101.17        4   25.292
 6.39167   101.49        4   25.373
 6.40000   101.79        4   25.448
 6.40833   102.10        4   25.525
 6.41667   102.40        4   25.600
 6.42500   102.71        4   25.677
 6.43333   102.99        4   25.748
 6.44167   103.29        4   25.823
 6.45000   103.60        4   25.900
 6.45833   103.91        4   25.977
 6.46667   104.22        4   26.055
 6.47500   104.50        4   26.125
 6.48333   104.81        4   26.203
 6.49167   105.10        4   26.275
 6.50000   105.41        4   26.352
 6.50833   105.71        4   26.428
 6.51667   105.99        4   26.497
 6.52500   106.29        4   26.572
 6.53333   106.59        4   26.648
 6.54167   106.87        4   26.718
 6.55000    82.09        3   27.363
 6.55833    82.28        3   27.427
 6.56667    82.49        3   27.497
 6.57500    82.70        3   27.567
 6.58333    82.91        3   27.637
 6.59167    83.13        3   27.710
 6.60000    83.35        3   27.783
 6.60833    83.60        3   27.867
 6.61667    83.85        3   27.950
 6.62500    84.06        3   28.020
 6.63333    84.32        3   28.107
 6.64167    84.56        3   28.187
 6.65000    84.81        3   28.270
 6.65833    85.05        3   28.350
 6.66667    85.27        3   28.423
 6.67500   114.58        4   28.645
 6.68333   114.92        4   28.730
 6.69167   115.22        4   28.805
 6.70000   115.51        4   28.878
 6.70833   115.82        4   28.955
 6.71667   116.12        4   29.030
 6.72500   116.40        4   29.100
 6.73333   116.69        4   29.172
 6.74167   116.99        4   29.248
 6.75000   117.25        4   29.312
 6.75833   117.54        4   29.385
 6.76667   117.80        4   29.450
 6.77500   118.07        4   29.518
 6.78333   118.34        4   29.585
 6.79167   118.60        4   29.650
 6.80000   118.85        4   29.713
 6.80833   119.13        4   29.782
 6.81667   119.36        4   29.840
 6.82500   119.60        4   29.900
 6.83333   119.83        4   29.957
 6.84167   120.05        4   30.012
 6.85000   120.31        4   30.078
 6.85833   120.52        4   30.130
 6.86667   120.77        4   30.192
 6.87500   121.03        4   30.258
 6.88333   121.30        4   30.325
 6.89167   121.54        4   30.385
 6.90000   121.85        4   30.463
 6.90833   122.17        4   30.543
 6.91667   122.47        4   30.617
 6.92500   122.75        4   30.688
 6.93333   153.97        5   30.794
 6.94167   154.39        5   30.878
 6.95000   154.77        5   30.954
 6.95833   155.18        5   31.036
 6.96667   155.58        5   31.116
 6.97500   155.97        5   31.194
 6.98333   156.34        5   31.268
 6.99167   156.74        5   31.348
 7.00000   157.12        5   31.424
 7.00833   157.47        5   31.494

using the following code
Code:
awk 'BEGIN{n = 500}
  {if((($1 * 10000) % n) == 0) {printf "%7.5f   %-6.4f\n", $1, $4; p1 = 0}
  else if((($1 + p1) * 10000) % 500 == 0)
    {printf "%7.5f   %-6.4f\n", ($1 + p1)/2, ($4 + p4)/2; p1 = 0}
  else {p1 = $1; p4 = $4}}' inFile > outFile

this code handles small gaps but not big ones like in this situation.

I am putting this here so maybe some one can just help me modify the code to handle this gap problem
# 2  
Old 01-15-2015
This seems very similar to several earlier requests in these forums. A good one to look at for ideas would be XY interpolation by time in awk. If that doesn't give you what you need, the last post in that thread provides pointers to several similar threads.

Please look at those first, and if you can't find anything in them that helps, explain how this case is different from the problem solved in the above thread and we'll try to help you get a working solution.
# 3  
Old 01-15-2015
Dear Don,

Thank you for your reply. I have read and seen several "almost similar issues" including the "XY interpolation by time in awk" you have referred to. I have spent several days trying to see how I can adapt the solution there to my problem and am still trying but being relatively new to awk programming, I have not managed to get anywhere. My problem here is relatively different from that in that, that data is quite different from mine here. In addition, my time here is some floating point values, I have seen some posts in "Expand & Interpolation" which I thought could help by since for them they needed integer values, the issue is simpler as it can be handled by integer in the for loop. I have spent several days going through several examples which I thought could be similar to my situation but each of them seems different

I will really appreciate any help given

Last edited by malandisa; 01-15-2015 at 11:30 PM..
# 4  
Old 01-16-2015
Please be aware that your data are not a linear function of time; on top of some noise it has a small curvature, and the six data points right after the gap are somewhat lower than they should be (there's a jump in values but not in time delta at point 7).
This is a quick and dirty approximation to exactly your problem and data; no error checking etc. is done. It's sort of a linear interpolation between the given boundaries although we know the boundary to the right is questionable. On top, my mawk has a problem with the D1 > D0 comparison, sometimes the delta is -1E-16, sometimes it's +71E-16, so a few extra lines are being "interpolated". I don't have a good solution at hand; either increase the to be compared value slightly (yuck!) or use sort -u on the result (yuck!)...
However, try
Code:
awk     'NR==1  {L1=$1; L2=$2; D0=0.05}
                {D1=$1-L1; D2=$2-L2
                 if (D1 > D0)   { n=D1/D0                   # D1 sometimes is sliiightly larger than D0
                                 ST=D2/n
                                 for (i=1; i<=int(n); i++) printf "%7.5f   %6.4f <---\n", L1+D0*i, L2+ST*i}
                 L1=$1; L2=$2
                 print $0}
        ' file

This User Gave Thanks to RudiC For This Post:
# 5  
Old 01-16-2015
Dear RudiC and everyone,

Thank you. this solution works and does exactly what I really needed. The interpolation is just alright. As you can see this is time series data for some quantity that depends on the sun, which exhibits higher values during the day that at night. this is morning and the variation of the quantity more complex due to too many factors. So this interpolation does just fine. I have just increased the precision of the D0/D1 to include more significant figures and that solves the problem you indicated with the
Code:
D1 > D0

.

Again may I say thank you and am sure this will help others as well.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Awk: time intervals based on epoch time

I have a list of epoch times delimited by "-" as follows: 1335078000 - 1335176700 1335340800 - 1335527400 1335771300 - 1335945600 1336201200 - 1336218000 The corresponding dates are: 20120422 1000 - 20120423 1325 20120425 1100 - 20120427 1450 20120430 1035 - 20120502 1100 ... (3 Replies)
Discussion started by: alex2005
3 Replies

2. Shell Programming and Scripting

XY interpolation by time in awk

Hi I am a newbie in awk scripting. I'm working with a file with xy coordinates that were acquired with a time stamp. All the time stamps were recorded but not the XY coordinates. Let see an example: FFID X Y UNIX TIME TIMEGAP... (8 Replies)
Discussion started by: Joćo Noiva
8 Replies

3. Programming

Resample time series data with replacement any way to do this in awk or just bash script

I have some time series data that I need to resample or downsample at some specific time intervals. The firs column is time in decimal hours. I am tryiong to resample this data every 3 minutse. So I need a data value ever 0.05. Here is the example data and as you can see, there time slot for 0.1500... (3 Replies)
Discussion started by: malandisa
3 Replies

4. Shell Programming and Scripting

awk : collecting all data between two time frame

Hi Experts , I need your help to collect the complete data between two time frame from the log files, when I try awk it's collecting the data only which is printed with time stamp for example, awk works well from "16:00 to 17:30" but its not collecting <line*> "from 17:30 to 18:00" ... (8 Replies)
Discussion started by: zenkarthi
8 Replies

5. UNIX for Dummies Questions & Answers

Converting string date time to unix time in AWK

I'd like to convert a date string in the form of sun aug 19 09:03:10 EDT 2012, to unixtime timestamp using awk. I tried This is how each line of the file looks like, different date and time in this format Sun Aug 19 08:33:45 EDT 2012, user1(108.6.217.236) all: test on the 17th ... (2 Replies)
Discussion started by: bkkid
2 Replies

6. Shell Programming and Scripting

awk/sed to search & replace data in first column

Hi All, I need help in manipulating the data in first column in a file. The sample data looks like below, Mon Jul 18 00:32:52 EDT 2011,NULL,UAT Jul 19 2011,NULL,UAT 1] All field in the file are separated by "," 2] File is having weekly data extracted from database 3] For eg.... (8 Replies)
Discussion started by: gr8_usk
8 Replies

7. Shell Programming and Scripting

Find and Convert UTC Time to PST Time

Hello All - I have a script that grabs data from the net and outputs the following data 46029 46.144 -124.510 2010 07 26 22 50 320 4.0 6.0 2.2 9 6.8 311 1012.1 -0.9 13.3 13.5 13.3 - - 46041 47.353 -124.731 2010 07 26 22 50 250 2.0 3.0 1.6 8 6.4 - 1011.6 - ... (0 Replies)
Discussion started by: drexnefex
0 Replies

8. Shell Programming and Scripting

Searching for Gaps in Time

I am very new to shell scripting. We use C-Shell here and I know the issues that surround it. I hope a solution can be created using awk, sed, etc... instead of having to write a program. I have an input file that is sorted by date and time in ascending order ... (2 Replies)
Discussion started by: jclanc8
2 Replies

9. Shell Programming and Scripting

How to get data between the start time and end time?

Hi, Can anyone help me how can I get the line that between the start time and end time. file1.txt 15/03/2009 20:45:03 Request: - Data of this line 15/03/2009 20:45:12 Response: - Data of this line 15/03/2009 22:10:40 Request: - Data of this line 15/03/2009 22:10:42 Response: - Data of... (1 Reply)
Discussion started by: tanit
1 Replies

10. Shell Programming and Scripting

calculate the average of time series data using AWK

Hi, I have two time series data (below) merged into a file. t1 and t2 are in unit of second I want to calculate the average of V1 every second and count how many times "1" in V2 is occur within a second Input File: t1 V1 t2 V2 10.000000... (5 Replies)
Discussion started by: nica
5 Replies
Login or Register to Ask a Question