Clean values in range of duplicate records


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Clean values in range of duplicate records
# 1  
Old 06-14-2014
Clean values in range of duplicate records

Dear Gents,

Could you please help me to solve this problem.

I am getting the average of a column for same duplicate records in this case locate in column 1. The average is computed from column 5 from all duplicate records in column 1. Normallly the values in column 5 is constant so the average is correct, but sometimes this value change to 9999 in this case if for some records the value is for example from 1 - 100 and there is values like 9999 the average is wrong, But in same cases also all records have the value 9999 in column 5 in this case it is ok.

Then. I would like to delete the rows were there is values with 9999 in column 5, but only if this value does not repetead in all same records ( col 1). thus the average computed with the script will be correct.

The code that I am using to get the average is:

Code:
awk     '                       {A = substr ($0,1,10); B = substr ($0, 45, 9)}\
         A != OA && NR > 1      {printf "%s%8.1f\n", substr(D,1,43), SUM1/CNT, SUM2/CNT;  CNT=0; SUM1=SUM2=""}\
         END                    {printf "%s%8.1f\n", substr(D,1,43), SUM1/CNT, SUM2/CNT;  CNT=0; SUM1=SUM2=""}\
                                {OA = A; D=$0; SUM1+=B; CNT++}\
        ' tmp1 > res-db1

The input data is.
Input raw data:
Code:
2194421564    909561  1861638.9       38.7    2237.87        5.0        5.0       0.69 
2194421564    909561  1861638.9       38.7     2240.0        6.5        5.0       0.66 
2194421564    909561  1861638.9       38.7    2245.62       3.67        5.0       0.74 
2194421564    909561  1861638.9       38.7    2266.11      91.11        5.0      -4.33 
2194421564    909561  1861638.9       38.7     2285.4      78.99        5.0       0.75 
2194421564    909561  1861638.9       38.7    2355.64      33.11        5.0      -3.36 
2194421564    909561  1861638.9       38.7    2355.64      33.11        5.0      -3.36 
2194421564    909561  1861638.9       38.7    2369.16       85.2        5.0       1.78 
2194421564    909561  1861638.9       38.7    2378.69        0.0        5.0       1.86 
2194421564    909561  1861638.9       38.7    2388.86        0.0        5.0      -3.19 
2194421564    909561  1861638.9       38.7    2393.33        0.0        5.0      -3.17 
2194421564    909561  1861638.9       38.7    2412.32      15.04        5.0       1.81 
2194421564    909561  1861638.9       38.7    2413.99       4.96        5.0       1.88 
2194421564    909561  1861638.9       38.7    2413.99       4.96        5.0       1.88 
2194421566    909604  1861663.9       39.3     2225.4       7.84        5.0      -2.19 
2194421566    909604  1861663.9       39.3    2229.73       8.03        5.0       -2.2 
2194421566    909604  1861663.9       39.3    2234.45       5.72        5.0      -2.13 
2194421566    909604  1861663.9       39.3    2239.97     262.41        5.0      -2.23 
2194421566    909604  1861663.9       39.3    2271.63      94.72        5.0      -2.12 
2194421566    909604  1861663.9       39.3    2367.32       83.3        5.0       -1.0 
2194421566    909604  1861663.9       39.3    2375.55        0.0        5.0      -0.94 
2194421566    909604  1861663.9       39.3    2393.25        0.0        5.0      -0.83 
2194421566    909604  1861663.9       39.3    2395.07        0.0        5.0      -0.85 
2194421566    909604  1861663.9       39.3    2404.03       16.5        5.0      -1.01 
2194421566    909604  1861663.9       39.3    2404.54      10.85        5.0      -0.95 
2194421566    909604  1861663.9       39.3    2404.54      10.85        5.0      -0.95 
2194421566    909604  1861663.9       39.3     9999.0       0.91        5.0      100.0 
2194421566    909604  1861663.9       39.3     9999.0       0.91        5.0      100.0 
2194421568    909648  1861688.8       39.9    2225.09        0.0        5.0       0.45 
2194421568    909648  1861688.8       39.9    2229.62      53.83        5.0       0.43 
2194421568    909648  1861688.8       39.9    2232.86     117.04        5.0     -35.38 
2194421568    909648  1861688.8       39.9    2233.17       52.1        5.0       0.51 
2194421568    909648  1861688.8       39.9    2264.54     169.88        5.0       0.51 
2194421568    909648  1861688.8       39.9    2360.05     111.67        5.0       1.55 
2194421568    909648  1861688.8       39.9    2367.58       40.6        5.0     -33.58 
2194421568    909648  1861688.8       39.9    2367.58       40.6        5.0     -33.58 
2194421568    909648  1861688.8       39.9     9999.0        0.0        5.0       1.62 
2194421568    909648  1861688.8       39.9    2388.63        0.0        5.0     -33.52 
2194421568    909648  1861688.8       39.9    2389.81        0.0        5.0     -33.51 
2194421568    909648  1861688.8       39.9    2391.63       52.1        5.0       1.55 
2194421568    909648  1861688.8       39.9    2392.28      59.25        5.0       1.59 
2194421568    909648  1861688.8       39.9    2392.28      59.25        5.0       1.59 
2195620870    894386  1853223.6        0.0       1.58        0.0        0.0        0.0 
2195620870    894386  1853223.6        0.0       1.58        0.0        5.0     -99.96 
2195620870    894386  1853223.6        0.0       1.59       0.47        5.0     -99.96 
2195620870    894386  1853223.6        0.0        1.6        0.0        5.0        0.0 
2195620870    894386  1853223.6        0.0        1.6       0.47        5.0     -99.96 
2195620870    894386  1853223.6        0.0       1.66       0.47        5.0     -99.96 
2195620870    894386  1853223.6        0.0       1.67        0.0        0.0        0.0 
2195620870    894386  1853223.6        0.0       1.67        0.0        0.0     -99.96 
2195620870    894386  1853223.6        0.0       1.67       0.47        5.0     -99.96 
2195620870    894386  1853223.6        0.0       1.67       0.48        5.0     -99.96 
2195620870    894386  1853223.6        0.0       1.67       0.49        0.0     -99.96 
2195620870    894386  1853223.6        0.0       1.68        0.0        0.0     -99.96 
2195620872    894429  1853248.8        4.8       1.59        0.0        0.0        0.0 
2195620872    894429  1853248.8        4.8       1.59        0.0        0.5        0.0 
2195620872    894429  1853248.8        4.8       1.59        0.0        5.0     -99.96 
2195620872    894429  1853248.8        4.8       1.59       0.48       3.27     -99.96 
2195620872    894429  1853248.8        4.8        1.6       0.45        5.0     -99.96 
2195620872    894429  1853248.8        4.8       1.67        0.0        0.0        0.0 
2195620872    894429  1853248.8        4.8       1.67       0.48       3.27     -99.96 
2195620872    894429  1853248.8        4.8       1.68        0.0        0.0     -99.96 
2195620872    894429  1853248.8        4.8       1.68        0.0        0.0     -99.96 
2195620872    894429  1853248.8        4.8       1.68       0.47        0.0     -99.96 
2195620872    894429  1853248.8        4.8       1.68       0.49        5.0     -99.96 
2195620872    894429  1853248.8        4.8       1.68       0.49        5.0     -99.96 
2195620874    894472  1853273.6        2.8       1.59        0.0        0.0        0.0 
2195620874    894472  1853273.6        2.8       1.59        0.0        5.0     -99.96 
2195620874    894472  1853273.6        2.8        1.6       0.42        5.0     -99.96 
2195620874    894472  1853273.6        2.8       1.61       0.42        5.0     -99.96 
2195620874    894472  1853273.6        2.8       1.62        0.0        5.0        0.0 
2195620874    894472  1853273.6        2.8       1.67        0.0        0.0     -99.96 
2195620874    894472  1853273.6        2.8       1.67       0.42        5.0     -99.96 
2195620874    894472  1853273.6        2.8       1.67       0.42        5.0     -99.96 
2195620874    894472  1853273.6        2.8       1.68        0.0        0.0        0.0 
2195620874    894472  1853273.6        2.8       1.68        0.0        0.0     -99.96 
2195620874    894472  1853273.6        2.8       1.68       0.43        0.0     -99.96 
2195620874    894472  1853273.6        2.8     9999.0        0.4        5.0     -99.96 
2195620876    894516  1853298.8        1.6       1.57        0.0        0.0        0.0 
2195620876    894516  1853298.8        1.6       1.57        0.0        5.0     -99.97 
2195620876    894516  1853298.8        1.6       1.58       0.41        5.0     -99.96 
2195620876    894516  1853298.8        1.6       1.58       0.42        5.0     -99.97 
2195620876    894516  1853298.8        1.6       1.59        0.0        5.0        0.0 
2195620876    894516  1853298.8        1.6       1.64        0.0        0.0        0.0 
2195620876    894516  1853298.8        1.6       1.64        0.0        0.0     -99.96 
2195620876    894516  1853298.8        1.6       1.65        0.0        0.0     -99.96 
2195620876    894516  1853298.8        1.6       1.65       0.41        5.0     -99.96 
2195620876    894516  1853298.8        1.6       1.65       0.43        0.0     -99.96 
2195620876    894516  1853298.8        1.6     9999.0       0.44        5.0     -99.96 
2195620876    894516  1853298.8        1.6       1.65       0.44        5.0     -99.96 
2195620878    894559  1853323.6        1.6       1.58        0.0        0.0        0.0 
2195620878    894559  1853323.6        1.6       1.58        0.0        5.0     -99.97 
2195620878    894559  1853323.6        1.6       1.59        0.0        5.0        0.0 
2195620878    894559  1853323.6        1.6       1.59       0.44        5.0     -99.96 
2195620878    894559  1853323.6        1.6       1.59       0.45        5.0     -99.97 
2195620878    894559  1853323.6        1.6       1.63       0.44        5.0     -99.96 
2195620878    894559  1853323.6        1.6       1.63       0.45        5.0     -99.97 
2195620878    894559  1853323.6        1.6       1.64        0.0        0.0        0.0 
2195620878    894559  1853323.6        1.6       1.64        0.0        0.0     -99.97 
2195620878    894559  1853323.6        1.6       1.64       0.43        5.0     -99.97 
2195620878    894559  1853323.6        1.6       1.64       0.45        0.0     -99.97 
2195620878    894559  1853323.6        1.6       1.65        0.0        0.0     -99.97 
2204021558    908231  1863642.2       34.1     9999.0       3.88        5.0      -2.11 
2204021558    908231  1863642.2       34.1     9999.0       4.75        5.0       -1.2 
2204021558    908231  1863642.2       34.1     9999.0       9.14        5.0       -0.9 
2204021558    908231  1863642.2       34.1     9999.0       9.14        5.0       -0.9 
2204021560    908275  1863667.2       34.1    2220.51       3.72        5.0      -1.96 
2204021560    908275  1863667.2       34.1    2221.44       4.87        5.0      -2.01 
2204021560    908275  1863667.2       34.1     2230.5      28.83        5.0      -1.89 
2204021560    908275  1863667.2       34.1    2259.07      34.13        5.0      -1.89 
2204021560    908275  1863667.2       34.1    2259.53       3.71        5.0      -1.87 
2204021560    908275  1863667.2       34.1    2370.47       12.6        5.0      -0.72 
2204021560    908275  1863667.2       34.1     9999.0      24.23        5.0      -0.63 
2204021560    908275  1863667.2       34.1     9999.0      24.23        5.0      -0.63 
2204021562    908318  1863692.2       34.1    2236.32       3.15        5.0      -4.99 
2204021562    908318  1863692.2       34.1    2238.04       5.17        5.0      -5.03 
2204021562    908318  1863692.2       34.1    2245.29      212.1        5.0      -4.94 
2204021562    908318  1863692.2       34.1    2281.13      45.87        5.0      -4.92 
2204021562    908318  1863692.2       34.1     2281.3        3.5        5.0      -4.91 
2204021562    908318  1863692.2       34.1     2369.2      79.74        5.0      -3.83

output from raw data

Code:
2194421564    909561  1861638.9       38.7   2339.8
2194421566    909604  1861663.9       39.3   3424.5
2194421568    909648  1861688.8       39.9   2873.9
2195620870    894386  1853223.6        0.0      1.6
2195620872    894429  1853248.8        4.8      1.6
2195620874    894472  1853273.6        2.8    834.8
2195620876    894516  1853298.8        1.6    834.7
2195620878    894559  1853323.6        1.6      1.6
2204021558    908231  1863642.2       34.1   9999.0
2204021560    908275  1863667.2       34.1   4194.9
2204021562    908318  1863692.2       34.1   2275.2

I would like to get

Code:
2194421564    909561  1861638.9       38.7   2339.8
2194421566    909604  1861663.9       39.3   2328.8
2194421568    909648  1861688.8       39.9   2325.8
2195620870    894386  1853223.6        0.0      1.6
2195620872    894429  1853248.8        4.8      1.6
2195620874    894472  1853273.6        2.8      1.6
2195620876    894516  1853298.8        1.6      1.6
2195620878    894559  1853323.6        1.6      1.6
2204021558    908231  1863642.2       34.1   9999.0
2204021560    908275  1863667.2       34.1   2260.3
2204021562    908318  1863692.2       34.1   2275.2

I try to upload the files but I was not able.

Thanks for your support
# 2  
Old 06-14-2014
Using awk, try this:

Code:
awk -vp=9999 '
($1==b && $5%p == "0") {c++ ; $5=""}
(($1!=b && NR>1)) {
    c==(f-1)&&sum=(p*f)
    printf ("%s\t%8.1f\n", a,(c==(f-1))?sum/f:sum/(f-c))
    f="";sum="";c="";
}
{a=$1OFS$2OFS$3OFS$4; b=$1; b=$1 ; f++ ; sum+=$5}
END{
c==(f-1)&&sum=(p*f)
printf ("%s\t%8.1f\n", a,(c==(f-1))?sum/f:sum/(f-c))
}' OFS="\t" tmp1 > res-db1


Last edited by pilnet101; 06-14-2014 at 10:41 AM..
This User Gave Thanks to pilnet101 For This Post:
# 3  
Old 06-14-2014
Try also
Code:
awk     'OA != $1       {if (NR>1) printf "%6.1f\n", CNT?SUM/CNT:EXC
                         OA=$1; printf "%s\t%s\t%s\t%s\t", $1, $2, $3, $4; SUM=0; CNT=0}
         $5==EXC        {next}
                        {SUM+=$5; CNT++}
         END            {printf "%6.1f\n", CNT?SUM/CNT:EXC}
        ' EXC=9999 file

These 2 Users Gave Thanks to RudiC For This Post:
# 4  
Old 06-14-2014
Dear RudiC and Pilnet101, both codes works perfec. Thanks a lot for your help and support
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Duplicate records

Gents, Please give a help file --BAD STATUS NOT RESHOOTED-- *** VP 41255/51341 in sw 2973 *** VP 41679/51521 in sw 2973 *** VP 41687/51653 in sw 2973 *** VP 41719/51629 in sw 2976 --BAD COG NOT RESHOOTED-- *** VP 41689/51497 in sw 2974 *** VP 41699/51677 in sw 2974 *** VP... (18 Replies)
Discussion started by: jiam912
18 Replies

2. Shell Programming and Scripting

Create range of values and count values.

Gents, It is possible to generate a range of values according to column 1 and count the total of rows in the range. example input 15.3 15.5 15.8 15.9 16.0 16.1 16.8 17.0 17.5 18.0 output desired 15.0 - 15.9 = 4 (10 Replies)
Discussion started by: jiam912
10 Replies

3. Shell Programming and Scripting

Find duplicate values in specific column and delete all the duplicate values

Dear folks I have a map file of around 54K lines and some of the values in the second column have the same value and I want to find them and delete all of the same values. I looked over duplicate commands but my case is not to keep one of the duplicate values. I want to remove all of the same... (4 Replies)
Discussion started by: sajmar
4 Replies

4. Shell Programming and Scripting

Duplicate records

Gents, I have a file which contends duplicate records in column 1, but the values in column 2 are different. 3099753489 3 3099753489 5 3101954341 12 3101954341 14 3102153285 3 3102153285 5 3102153297 3 3102153297 5 I will like to get something like this: output desired... (16 Replies)
Discussion started by: jiam912
16 Replies

5. Shell Programming and Scripting

[bash] wanted: function with a clean way for multiple return values

Hi, I have a small part of a project which is done as a bash script. bash was selected as an portability issue that works out of the box. In this script I have an exec shell-function, a wrapper around arbitrary commands. I want to have STDOUT, as an addon STDERR and the EXIT-CODE of a specified... (5 Replies)
Discussion started by: stomp
5 Replies

6. Shell Programming and Scripting

Convert Column Values to a Range of Values

I have a list of columns with values that I need to transform into a row containing the range of each column. For example: "Column A" 1 2 3 4 10 12 14 15 16 17 18 "Column B" 1 4 5 6 (4 Replies)
Discussion started by: newbio
4 Replies

7. Shell Programming and Scripting

Deleting duplicate records from file 1 if records from file 2 match

I have 2 files "File 1" is delimited by ";" and "File 2" is delimited by "|". File 1 below (3 record shown): Doc1;03/01/2012;New York;6 Main Street;Mr. Smith 1;Mr. Jones Doc2;03/01/2012;Syracuse;876 Broadway;John Davis;Barbara Lull Doc3;03/01/2012;Buffalo;779 Old Windy Road;Charles... (2 Replies)
Discussion started by: vestport
2 Replies

8. UNIX for Dummies Questions & Answers

Need to keep duplicate records

Consider my input is 10 10 20 then, uniq -u will give 20 and uniq -dwill return 10. But i need the output as , 10 10 How we can achieve this? Thanks (4 Replies)
Discussion started by: pandeesh
4 Replies

9. UNIX for Dummies Questions & Answers

Getting non-duplicate records

Hi, I have a file with these records abc xyz xyz pqr uvw cde cde In my o/p file , I want all the non duplicate rows to be shown. o/p abc pqr uvw Any suggestions how to do this? Thanks for the help. rs (2 Replies)
Discussion started by: rs123
2 Replies

10. Shell Programming and Scripting

Range of records using comparision \awk

Hi Gurus, I have to fetch the records from a logs as per the time stamp . I am comfortable to use awk and sed in the script . But the logic to fetch the records as per comparison is the problem. $cat my_log.log <Jul 30, 2010 7:01:12 AM EEST> <Error> <WebLogicServer> <Jul 30, 2010 8:04:12 AM... (3 Replies)
Discussion started by: posix
3 Replies
Login or Register to Ask a Question