Fill in missing Data


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Fill in missing Data
# 15  
Old 03-20-2012
The numbers are going to be wrong in any case. Garbage in, garbage out. Server 2 having a value of 0.1 and server 4 having a value of 0.3 doesn't really imply server 3 has a value of 0.2... I've been hoping we'd just badly misinterpreting what you wanted, but if you really want interpolation between columns, fine... but I can't imagine why it'd be pretty or even desirable.

Code:
$ cat fill.awk

BEGIN { FS=","; OFS="," }

{
        S=NF+1

        for(N=2; N<=NF; N++)
        {
                # What number to start on when interpolating
                if($N && ($N != " "))
                {       S=N;    continue        }

                # Find blank range, and the next good number
                for(M=N; (!$M) && ($M != " ") && (M<=NF); M++);
                # If it doesn't end on a number, can't interpolate
                if(!$M) break;
                # Calculate the slope of the line
                SLOPE=($M - $S)/(M-S);
                # Fill in all the empty points between start and end
                for(Q=S+1; Q<M; Q++)
                {
                        X=Q-S;
                        # Adjust the number of decimal places to taste.
                        $Q = sprintf("%.2f", (SLOPE*X)+$S);
                }

                # Don't skip the ending number next loop.
                N=(M-1);
        }

        print
}

$ awk -f fill.awk data

Wed Feb 01 09:00:02 EST 2012,,,0.4,0.3,0.30,0.3,0.30,0.3,0.40,0.5,0.40,0.3,0.33,0.37,0.4,0.3,
Wed Feb 01 09:00:11 EST 2012,,,,,,,0.2,,,,,,,,,,
Wed Feb 01 09:00:22 EST 2012,0.7,0.4,0.5,0.4,0.3,0.4,0.40,0.4,0.5,0.6,0.3,0.4,0.25,0.1,0.5,0.4,
Wed Feb 01 09:00:42 EST 2012,0.9,1.0,0.6,0.5,0.50,0.50,0.50,0.50,0.50,0.50,0.50,0.5,,,,,
Wed Feb 01 09:01:02 EST 2012,1.0,0.8,0.75,0.70,0.65,0.6,0.60,0.60,0.60,0.60,0.60,0.6,,,,,
Wed Feb 01 09:01:22 EST 2012,,0.7,0.7,0.6,0.5,0.50,0.50,0.5,,,,,,,,,
Wed Feb 01 09:01:42 EST 2012,0.9,0.6,0.6,0.5,0.47,0.45,0.43,0.4,0.7,0.7,0.4,0.5,0.53,0.57,0.6,0.5,
Wed Feb 01 09:02:02 EST 2012,0.8,0.65,0.5,0.4,0.58,0.75,0.93,1.1,0.83,0.57,0.3,0.4,,,,,
Wed Feb 01 09:02:22 EST 2012,0.9,0.87,0.84,0.81,0.79,0.76,0.73,0.7,0.65,0.6,,,,,,,

$

# 16  
Old 03-20-2012
Quote:
Originally Posted by Corona688
The numbers are going to be wrong in any case. Garbage in, garbage out. Server 2 having a value of 0.1 and server 4 having a value of 0.3 doesn't really imply server 3 has a value of 0.2... I've been hoping we'd just badly misinterpreting what you wanted, but if you really want interpolation between columns, fine... but I can't imagine why it'd be pretty or even desirable.

Code:
$ cat fill.awk
 
BEGIN { FS=","; OFS="," }
 
{
        S=NF+1
 
        for(N=2; N<=NF; N++)
        {
                # What number to start on when interpolating
                if($N && ($N != " "))
                {       S=N;    continue        }
 
                # Find blank range, and the next good number
                for(M=N; (!$M) && ($M != " ") && (M<=NF); M++);
                # If it doesn't end on a number, can't interpolate
                if(!$M) break;
                # Calculate the slope of the line
                SLOPE=($M - $S)/(M-S);
                # Fill in all the empty points between start and end
                for(Q=S+1; Q<M; Q++)
                {
                        X=Q-S;
                        # Adjust the number of decimal places to taste.
                        $Q = sprintf("%.2f", (SLOPE*X)+$S);
                }
 
                # Don't skip the ending number next loop.
                N=(M-1);
        }
 
        print
}
 
$ awk -f fill.awk data
 
Wed Feb 01 09:00:02 EST 2012,,,0.4,0.3,0.30,0.3,0.30,0.3,0.40,0.5,0.40,0.3,0.33,0.37,0.4,0.3,
Wed Feb 01 09:00:11 EST 2012,,,,,,,0.2,,,,,,,,,,
Wed Feb 01 09:00:22 EST 2012,0.7,0.4,0.5,0.4,0.3,0.4,0.40,0.4,0.5,0.6,0.3,0.4,0.25,0.1,0.5,0.4,
Wed Feb 01 09:00:42 EST 2012,0.9,1.0,0.6,0.5,0.50,0.50,0.50,0.50,0.50,0.50,0.50,0.5,,,,,
Wed Feb 01 09:01:02 EST 2012,1.0,0.8,0.75,0.70,0.65,0.6,0.60,0.60,0.60,0.60,0.60,0.6,,,,,
Wed Feb 01 09:01:22 EST 2012,,0.7,0.7,0.6,0.5,0.50,0.50,0.5,,,,,,,,,
Wed Feb 01 09:01:42 EST 2012,0.9,0.6,0.6,0.5,0.47,0.45,0.43,0.4,0.7,0.7,0.4,0.5,0.53,0.57,0.6,0.5,
Wed Feb 01 09:02:02 EST 2012,0.8,0.65,0.5,0.4,0.58,0.75,0.93,1.1,0.83,0.57,0.3,0.4,,,,,
Wed Feb 01 09:02:22 EST 2012,0.9,0.87,0.84,0.81,0.79,0.76,0.73,0.7,0.65,0.6,,,,,,,
 
$


I know its weird but thats what the boss wants... lol. is there a way to go down the column instead of across the columns?
# 17  
Old 03-20-2012
So it is the same column now? Smilie This is why I requested the output data you wanted in the first place, that would make it a lot more obvious...

How much data are we talking about, here? The easy way to do it wouldn't work for gigabytes of it...
# 18  
Old 03-20-2012
Quote:
Originally Posted by Corona688
So it is the same column now? Smilie This is why I requested the output data you wanted in the first place, that would make it a lot more obvious...

How much data are we talking about, here? The easy way to do it wouldn't work for gigabytes of it...
i thought i said columns at the start. sorry about that. theres about 40,000+ lines. i attached a sample csv file to this post. What you see in there is what you see in shell with all the ,,,,,'s. i need to fill all those empty cells for each server with interpolated data.
# 19  
Old 03-20-2012
Again, that's the input. Showing the output you wanted would be very helpful.
# 20  
Old 03-20-2012
The output would be the same... just inputting the missing data is all thats changing. not sure how to show you that exactly..

The way you coded it works it just has to go down the column not across the row.
# 21  
Old 03-20-2012
What threw me is you showed what you wanted like "a,,c", filling in b inbetween. That had me thinking you wanted it filled in between the same rows, which of course makes no sense -- server b's data has nothing to do with a's or c's. But if you wanted extrapolation based on past data for that column, that makes more sense.

Here's some interpolation across rows. It has to load the whole dataset into memory though.

Code:
$ cat fill2.awk

BEGIN { FS="," ; OFS="," }

{       COLS=NF
        for(N=1; N<=NF; N++)    D[NR,N]=$N      }

END {
        for(N=2; N<=COLS; N++)
        {
#               printf("column %d\n", N);

                S=1
                for(L=1; L<=NR; L++)
                {
                        if(!D[L,N]) continue;

                        if(((L-S) < 2) || (!D[S,N]))
                        {
                                S=L;
                                continue;
                        }

#                       printf("[%s]", D[S,N]);

                        SLOPE=(D[L,N]-D[S,N])/(L-S);

                        for(M=S+1; M<L; M++)
                        {
                                X=M-S
                                D[M,N]=D[S,N]+(SLOPE*X);
#                               printf("\t'%s'", D[M,N]);
                        }

#                       printf("\t[%s]\n", D[L,N]);

                        S=L;
                }
        }

        for(L=1; L<=NR; L++)
        {
                printf("%s", D[L, 1]);

                for(N=2; N<=COLS; N++)  printf(OFS "%s", D[L,N]);

                printf("\n");
        }
}

$ awk -f fill2.awk data

Wed Feb 01 09:00:02 EST 2012,,,0.4,0.3,,0.3,,0.3,,0.5,,0.3,,,0.4,0.3,
Wed Feb 01 09:00:11 EST 2012,,,0.45,0.35,,0.35,0.2,0.35,,0.55,,0.35,,,0.45,0.35,
Wed Feb 01 09:00:22 EST 2012,0.7,0.4,0.5,0.4,0.3,0.4,,0.4,0.5,0.6,0.3,0.4,,0.1,0.5,0.4,
Wed Feb 01 09:00:42 EST 2012,0.9,1.0,0.6,0.5,0.366667,0.5,,0.433333,0.55,0.625,0.325,0.5,,,0.525,0.425,
Wed Feb 01 09:01:02 EST 2012,1.0,0.8,0.65,0.55,0.433333,0.6,,0.466667,0.6,0.65,0.35,0.6,,,0.55,0.45,
Wed Feb 01 09:01:22 EST 2012,0.95,0.7,0.7,0.6,0.5,,,0.5,0.65,0.675,0.375,0.55,,,0.575,0.475,
Wed Feb 01 09:01:42 EST 2012,0.9,0.6,0.6,0.5,,,,0.4,0.7,0.7,0.4,0.5,,,0.6,0.5,
Wed Feb 01 09:02:02 EST 2012,0.8,,0.5,0.4,,,,1.1,,0.65,0.3,0.4,,,,,
Wed Feb 01 09:02:22 EST 2012,0.9,,,,,,,0.7,,0.6,,,,,,,

$

It's still full of holes like swiss cheese wherever there was only one end to the data...
Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Fill in missing values

Hi, I have a data sample as shown below. I want to fill in the left column so that the line will be continuous. For example, between 1 and 5 should be 2,3,4. And corresponding values in the right column will be 0. Thus the expected data should look like that: 1 1 1 10 1 2 1 3 1 5 1 6 2 0... (6 Replies)
Discussion started by: theanh0508
6 Replies

2. UNIX for Beginners Questions & Answers

Fill in missing hours and interpolate values using awk.

I have a time series data like this 40754,35.6931,51.3092,201610160700,21.0 40754,35.6931,51.3092,201610160800,23.0 40754,35.6931,51.3092,201610160900,24.0 40754,35.6931,51.3092,201610161000,24.0 40754,35.6931,51.3092,201610161300,25.0 40754,35.6931,51.3092,201610161400,23.0... (6 Replies)
Discussion started by: emirzaei
6 Replies

3. Shell Programming and Scripting

Fill in missing rows with zero to have uniform table

Hello, I have two files of same structure except some rows are missing randomly in each file. How do I fill the missing rows to have the exact ID column (S01 ~ S96) and rest columns filled with "0" with awk? The purpose of this step is to join the two files side by side. The closest thread is... (17 Replies)
Discussion started by: yifangt
17 Replies

4. Shell Programming and Scripting

Fill data if number range is given

Hi I want to get all numbers if number range is given as input. Eg: INPUT FILE 100-105 107 108-112 OUTPUT REQUIRED: 100 101 102 103 104 105 107 108 109 110 111 112 How can I do it using shell? :confused: Thanks in advance. (11 Replies)
Discussion started by: dashing201
11 Replies

5. Shell Programming and Scripting

fill in last column of data

Hello, I am fairly new to awk, and I have the following problem. My file has missing data in the last column, and the program I am pre-processing this file for cannot interpret correctly shortened rows (it just wraps the data around). Is there a way to force awk to create the same... (6 Replies)
Discussion started by: timert34
6 Replies

6. Shell Programming and Scripting

Fill missing values with 2

Hi All, I have 100 .txt files which look like this: 3 4 5 6 7 Now, some files have some numbers missing in them and they look like this: 4 5 6 (6 Replies)
Discussion started by: shoaibjameel123
6 Replies

7. Shell Programming and Scripting

Fill missing numbers in second column with zeros

Hi All, I have 100 files with names like this: 1.dat, 2.dat, 3.dat until 100.dat. My dat files look like this: 42323 0 438939 1 434 0 0.9383 3434 120.23 3 234 As you can see in the second column, some numbers are missing. I want to fill those missing places with 0's in all... (3 Replies)
Discussion started by: shoaibjameel123
3 Replies

8. Shell Programming and Scripting

fill in missing columns

It can't be that hard, but I just can't figure it out: I have file like: File Sub-brick M_1 S_1 M_2 S_2 M_4 S_4 ... xxx 1 214 731 228 621 132 578 ... and would like to get 0 0 where M_3 S_3 is missing xxx 1 214 731 228 621 0 0 132 578 ... I wrote following script, but can't figure out... (3 Replies)
Discussion started by: avvk
3 Replies

9. Shell Programming and Scripting

Fill in missing numbers in range

I need to edit a list of numbers on the following form: 1 1.0 2 1.4 5 2.1 7 1.9 I want: 1 1.0 2 1.4 3 0.0 4 0.0 5 2.1 6 0.0 7 1.9 (i want to add the missing number in column 1 together with 0.0 in column 2). I guess it is rather trivial but i didn't even manage to read column... (5 Replies)
Discussion started by: bistru
5 Replies
Login or Register to Ask a Question