awk solution for taking bins

02-26-2013


Hi all, I'm looking for an awk solution for binning a data set.
For example, if I have two columns of data for a scatter plot and the file contains 5 million lines, how can I take the average of every 100 points, 1,000 points, 10,000 points, etc.?
The idea is to bin the 5,000,000 points to reduce the density.

Code:

$ cat largefile.txt
x        y
1       45
2       46
3       87
4       34
5       36
6       36
7       23
...     ...
5mil    228

How can I bin y every "n" points?

Thanks in advance.

torchij


02-26-2013


I could interpret your request several different ways. Please show us an example of the output you're trying to produce.

Don Cragun


02-28-2013


Thanks for the response. Here is an example.

Input:

Code:

$ cat file.txt
1      3
2      2
3      1
4      10
5      10
6      10
7      25
8      30
9      60

Output example 1 (bins of 3: average each group of three consecutive points)

Code:

2      2
5      10
8      38.3

Output example 2 (bins of 2: average each group of two consecutive points; the leftover ninth point is dropped)

Code:

1.5    2.5
3.5    5.5
5.5    10
7.5    27.5

Not sure what the best tool is!

Many thanks,
Torch

torchij


02-28-2013


I am not sure I understand your question here.
What is a "bin"?
Can you please post your example with a clear explanation?

grep_me


02-28-2013


Hi, I'll try to be clearer with the example. To start, I'll focus on just the second column.

Input:

Code:

$ cat file.2.txt
3
2
1
10
10
10
25
30
60

The purpose is to reduce the data for an x,y scatter plot, because the file is millions of lines long. Instead of plotting every point, I want to take the average of every "n" points and plot that one number. "Bin" might not be the correct word; perhaps "rolling average"? For example, a bin of 3 would break the data down like so:

Code:

$ cat file.2.txt
#bin A
3
2         #average all three = 2
1

#bin B
10
10     #average all three = 10
10

#bin C
25
30     # average all 3 = 38.3
60

Output would then be:

Code:

2
10
38.3

For the case where the bin size is 2:

Code:

$ cat file.2.txt
#bin A
3    # average = 2.5
2

#bin B
1    # average = 5.5
10

#bin C
10    # average = 10
10

#bin D
25    # average = 27.5
30

#bin E - ignored because only one value
60
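
The single-column case above can be sketched in awk roughly as follows. This is only a minimal sketch under the rules described in this post: `bin_means` is a hypothetical helper name, the data is assumed to be in column 1, and a trailing incomplete bin is dropped. The `%.1f` format is also an assumption, chosen to match the one-decimal values shown in the examples.

```shell
# bin_means N [FILE...]: hypothetical helper name, not an existing tool.
# Prints the mean of column 1 for each complete group of N consecutive lines.
bin_means() {
    n=$1; shift
    awk -v n="$n" '
        { sum += $1 }                                      # accumulate the current bin
        NR % n == 0 { printf "%.1f\n", sum / n; sum = 0 }  # bin full: print mean, reset
        # no END block, so a trailing incomplete bin is silently dropped
    ' "$@"
}

# Usage with the nine sample values and a bin size of 2
printf '%s\n' 3 2 1 10 10 10 25 30 60 | bin_means 2
```

With the sample values and a bin size of 2 this should print 2.5, 5.5, 10.0 and 27.5, one per line; the ninth value (60) never completes a bin and so produces no output.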

Finally, doing this for both the x and y columns (the original file) with a bin size of 3:

Input:

Code:

$ cat file.txt
1      3
2      2
3      1
4      10
5      10
6      10
7      25
8      30
9      60

Output:
2      2
5      10
8      38.3

Many thanks. I hope this is clearer.
Torch

torchij


02-28-2013


Code:

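awk -v bin=3 ' BEGIN {
                c = 1                   # position within the current bin (1-based)
} c <= bin {
                ++c
                f += $1                 # running sum of column 1 (x)
                s += $2                 # running sum of column 2 (y)
} c > bin {
                # bin is full: print both means, then reset for the next bin
                # %.1f rather than %d, so an x mean such as 1.5 is not truncated
                printf "%.1f %.1f\n", f / bin, s / bin
                c = 1
                f = 0
                s = 0
} ' file.txt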

Yoda


02-28-2013


Code:

awk     '               {Xsum+=$1; Ysum+=$2}
         !(NR%bin)      {print Xsum/bin, Ysum/bin; Xsum=Ysum=0}
        ' bin=3 file.txt
2 2
5 10
8 38.3333

This does not handle residual lines at the end, e.g. one or two orphan lines when bin=3 and the line count is not a multiple of 3. Please specify what should happen to them.
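
One possible way to deal with leftover lines, sketched here only as an illustration, is to average them as a smaller final bin in an `END` block. Whether that is wanted is an assumption, since an earlier post in this thread says an incomplete bin should be ignored, in which case the `END` line can simply be removed. `bin_xy` is a hypothetical helper name. The sketch also assumes that lines with a non-numeric first field (such as the `x y` header in largefile.txt) should be skipped, which is why it keeps its own counter instead of relying on `NR`.

```shell
# bin_xy N [FILE...]: hypothetical helper name, not an existing tool.
# Averages columns 1 and 2 over groups of N lines; a final partial group
# is averaged over however many lines it actually contains.
bin_xy() {
    n=$1; shift
    awk -v n="$n" '
        NF < 2 || $1 + 0 != $1 { next }     # assumption: skip blank lines and an "x y" style header
        { xs += $1; ys += $2; cnt++ }       # accumulate the current bin
        cnt == n { print xs / cnt, ys / cnt; xs = ys = cnt = 0 }
        END { if (cnt > 0) print xs / cnt, ys / cnt }   # leftover lines: divide by the actual count
    ' "$@"
}

# Usage: nine data lines plus a header, bin size 4 (one orphan line at the end)
printf '%s\n' 'x y' '1 3' '2 2' '3 1' '4 10' '5 10' '6 10' '7 25' '8 30' '9 60' | bin_xy 4
```

For this input the two full bins should give `2.5 4` and `6.5 18.75`, and the orphan line is reported on its own as `9 60`. With a bin size of 3 the output matches the three lines shown above.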


RudiC
