Get the average from column, and eliminate the duplicate values.

02-23-2014

Registered User

411, 1

Join Date: Aug 2010

Last Activity: 23 May 2020, 10:33 PM EDT

Location: EEUU

Posts: 411

Thanks Given: 317

Thanked 1 Time in 1 Post

Dear Experts,

Kindly help me please.
I have a big file with duplicate values in columns 11 through 23. A new value appears every 2 rows, but each row has different x and y coordinates in columns 57 through 74.
I would like to get a single row for each value, with the average of the x and y coordinates.
Example:
Input file

Code:

A         25235.0 21449.012 7 75   1  -3162771 77 43 23 865933.2 1931450.7  22.5   897 102 1                   1T N/A 54000038 0.81383
A         25235.0 21449.012 6 75   1   4163171 79 37 21 865925.2 1931462.8  23.1   897 102 1              P    1T N/A 54000038 0.81383
A         25015.0 20921.01311 75   1  -4153571 75 58 23 857254.8 1920083.9 -22.2   188 103 1              P    1T N/A 54000056 0.81382
A         25015.0 20921.01310 75   2  -4163868 76 36 19 857246.4 1920096.1 -22.2   188 103 1              P    1T N/A 54000056 0.81382
A         25233.0 21449.012 7 75   1   2142770 77 36 25 865970.9 1931408.5  22.9   896 102 1                   1T N/A 54000135 0.81383
A         25233.0 21449.012 6 75   1   3122671 78 44 28 865963.9 1931420.0  23.0   896 102 1                   1T N/A 54000135 0.81383
A         25013.0 20921.01311 75   1  -4132772 76 61 23 857279.7 1920040.5 -22.0   187 103 1                   1T N/A 54000153 0.81382
A         25013.0 20921.01310 75   2  -4122770 77 42 20 857272.1 1920051.7 -22.2   187 103 1              P    1T N/A 54000153 0.81382
A         25011.0 20921.01311 75   1   3195471 76 53 22 857305.0 1919996.0 -21.9   186 103 1              P    1T N/A 54000235 0.81382
A         25011.0 20921.01310 75   2  -4132669 75 38 21 857297.0 1920007.7 -22.1   186 103 1              P    1T N/A 54000235 0.81382
A         25231.0 21449.012 7 75   1  -3122671 78 37 30 865983.2 1931352.7  22.4   964 102 1                   1T N/A 54000253 0.81382
A         25231.0 21449.012 6 75   1  -3132571 80 40 26 865977.8 1931367.7  23.0   964 102 1                   1T N/A 54000253 0.81382

Desired output

Code:

A         25235.0 21449.012 7 75   1  -3162771 77 43 23 865933.2 1931456.6  22.5   897 102 1                   1T N/A 54000038 0.81383
A         25015.0 20921.01311 75   1  -4153571 75 58 23 857250.6 1920090.0 -22.2   188 103 1              P    1T N/A 54000056 0.81382
A         25233.0 21449.012 7 75   1   2142770 77 36 25 865967.4 1931414.3  22.9   896 102 1                   1T N/A 54000135 0.81383
A         25013.0 20921.01311 75   1  -4132772 76 61 23 857275.9 1920046.1 -22.0   187 103 1                   1T N/A 54000153 0.81382
A         25011.0 20921.01311 75   1   3195471 76 53 22 857301.0 1920001.9 -21.9   186 103 1              P    1T N/A 54000235 0.81382

Thanks in advance

Attached file

input_file.txt (1.6 KB)

output_file.txt (678 Bytes)

jiam912

02-23-2014

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

With over a hundred posts in these forums, we would hope that you are learning how to do things like this.

What have you tried?

Don Cragun

02-23-2014

Nothing yet?

jiam912

02-23-2014

Your input and output files are in DOS format (with lines terminated by CR and NL) rather than the normal UNIX format (with lines terminated by NL). Furthermore, the last line in both files is incomplete (with no line terminator). You have 11 complete lines and 1 partial line in your input file and 4 complete lines and 1 partial line in your output file.

Do you want to ignore the last two input lines because the last input line is incomplete? Or is there some other reason why there is nothing in your sample output file corresponding to the last two input lines?

Will all of your input files be in DOS format?

Will the last line in all of your input files be missing the line terminator?

Do you want the output file to be in DOS format or UNIX format?

Do you really want the last line of your output file to be missing the line terminator?

You said that there are new values in columns 11-23 every two lines, and that X and Y coordinates for rows with the same value should be averaged. Do we need to verify that columns 11-23 (or is it columns 11-25 or 11-27) match, or can we just get the average coordinates for every pair of lines? (If columns 11-23, 11-25, or 11-27 have the same values on more than two lines, should the coordinates on all consecutive lines with the same values in those columns be averaged, or do you just want to average pairs of lines?)
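
If the files do turn out to be in DOS format, one possible way to normalize them before any further processing is sketched below. The file names and sample text are placeholders made up for illustration, not the attached files.

```shell
# Work in a scratch directory (placeholder names throughout).
tmp=$(mktemp -d)

# Build a tiny DOS-format sample: CR+NL line endings, last line unterminated.
printf 'line one\r\nline two\r\nline three' > "$tmp/dos_sample.txt"

# Remove a trailing CR from every line. awk prints each record followed by a
# newline, so the incomplete last line gets a terminator as well.
awk '{ sub(/\r$/, ""); print }' "$tmp/dos_sample.txt" > "$tmp/unix_sample.txt"

# Count the complete lines in the converted file.
wc -l < "$tmp/unix_sample.txt"
```

Converting up front keeps the averaging script itself free of any line-ending handling.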

Don Cragun

02-23-2014

Dear Don Cragun,
Thanks for your support.
I use all my files in Linux only. I edited the files in Windows to send them to the forum, so maybe that is the problem.
Yes please, we need to verify that columns 11-27 match and then get the average x and y; there can be 2 or more matching rows. As I said before, the output should contain a single row for each value of columns 11-27, with the average of x and y.
Thanks again

jiam912

02-23-2014

Registered User

1,910, 488

Join Date: Sep 2008

Last Activity: 22 December 2019, 2:31 AM EST

Location: San Jose, CA

Posts: 1,910

Thanks Given: 54

Thanked 488 Times in 481 Posts

[Updated Version]

Try this...

Code:

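awk '{
    k = substr($0, 11, 13)              # key: columns 11-23 (use length 17 for columns 11-27)
    split(substr($0, 57, 18), aa)       # x and y coordinates from columns 57-74
    x[k] += aa[1]; y[k] += aa[2]; s[k]++
    if (k in key) next
    key[k] = $0                         # keep the first line seen for each key
  }
  END {
    for (k in key) {
      # rebuild the line by position; sub() would treat the old coordinates as a regex
      printf "%s%8.1f %9.1f%s\n", substr(key[k], 1, 56), x[k]/s[k], y[k]/s[k], substr(key[k], 75)
    }
  } ' infile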

For each key, the first line seen is printed with its x and y replaced by the average over all records that share that key. The output will not necessarily be in the same order as the input.
If you feel there is an issue with rounding, implement the round function from here: https://www.gnu.org/software/gawk/ma...-Function.html
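
If the input order matters, one possible variation is to remember the order in which keys first appear and walk that list in the END block. This is only a sketch: it assumes the key is columns 11-27 (as confirmed earlier in the thread), x is columns 57-64 and y is columns 66-74. The `mkline` helper and the file names are hypothetical and exist only to build a small fixed-width sample.

```shell
tmp=$(mktemp -d)

# Hypothetical helper: build one fixed-width record with the key in columns
# 11-27, x in columns 57-64 and y in columns 66-74.
mkline() { printf '%-10s%-17s%29s%8s %9s%s\n' 'A' "$1" '' "$2" "$3" '  22.5   897 102 1'; }

{
  mkline '25235.0 21449.012' 865933.2 1931450.7
  mkline '25235.0 21449.012' 865925.2 1931462.8
  mkline '25015.0 20921.013' 857254.8 1920083.9
  mkline '25015.0 20921.013' 857246.4 1920096.1
} > "$tmp/sample.txt"

awk '{
    k = substr($0, 11, 17)              # key: columns 11-27
    if (!(k in first)) {                # first time this key is seen
      first[k] = $0
      order[++n] = k                    # remember the order of appearance
    }
    sx[k] += substr($0, 57, 8)          # x: columns 57-64
    sy[k] += substr($0, 66, 9)          # y: columns 66-74
    cnt[k]++
  }
  END {
    for (i = 1; i <= n; i++) {          # walk the keys in input order
      k = order[i]
      printf "%s%8.1f %9.1f%s\n", substr(first[k], 1, 56), sx[k]/cnt[k], sy[k]/cnt[k], substr(first[k], 75)
    }
  }' "$tmp/sample.txt" > "$tmp/averaged.txt"

cat "$tmp/averaged.txt"
```

Because it groups by key across the whole file, this also handles keys that repeat on more than two lines or on non-adjacent lines, at the cost of holding one line per key in memory.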

--ahamed

Last edited by ahamed101; 02-23-2014 at 05:31 PM..

ahamed101

02-23-2014

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

You did not answer Don Cragun's question about what you had tried, and your desired output does not match your specification (the first line's average x is wrong, and the last line is missing). However, you could also try:

Code:

awk     '                       {A = substr ($0, 11, 17); B = substr ($0, 57, 8); C = substr ($0, 66, 9)}   # key, x, y
         A != OA && NR > 1      {printf "%s%8.1f %9.1f%s\n", substr(D,1,56), SUM1/CNT, SUM2/CNT, substr(D,75); CNT = SUM1 = SUM2 = 0}   # key changed: print previous group
         END                    {if (CNT) printf "%s%8.1f %9.1f%s\n", substr(D,1,56), SUM1/CNT, SUM2/CNT, substr(D,75)}   # print the final group
                                {OA = A; D = $0; SUM1 += B; SUM2 += C; CNT++}   # accumulate the current line
        ' file
A         25235.0 21449.012 6 75   1   4163171 79 37 21 865929.2 1931456.8  23.1   897 102 1              P    1T N/A 54000038 0.81383
A         25015.0 20921.01310 75   2  -4163868 76 36 19 857250.6 1920090.0 -22.2   188 103 1              P    1T N/A 54000056 0.81382
A         25233.0 21449.012 6 75   1   3122671 78 44 28 865967.4 1931414.2  23.0   896 102 1                   1T N/A 54000135 0.81383
A         25013.0 20921.01310 75   2  -4122770 77 42 20 857275.9 1920046.1 -22.2   187 103 1              P    1T N/A 54000153 0.81382
A         25011.0 20921.01310 75   2  -4132669 75 38 21 857301.0 1920001.9 -22.1   186 103 1              P    1T N/A 54000235 0.81382
A         25231.0 21449.012 6 75   1  -3132571 80 40 26 865980.5 1931360.2  23.0   964 102 1                   1T N/A 54000253 0.81382
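
The mismatch in the first line of the desired output can be checked by hand. The two x values in the first pair of input lines are 865933.2 and 865925.2, so their average is 865929.2, whereas the desired output shows 865933.2 (the first line's x, unchanged). A quick check, as a sketch:

```shell
# Average of the two x values in the first pair of input lines.
avg_x=$(awk 'BEGIN { printf "%.1f", (865933.2 + 865925.2) / 2 }')
echo "$avg_x"
```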

Last edited by RudiC; 02-23-2014 at 05:45 PM..

RudiC
