awk solution for taking bins


 
# 1  

Hi all, I'm looking for an awk solution for binning a data set.
For example, if I have two columns of data that I wish to use for a scatter plot, and the file contains 5 million lines, how can I take the average of every 100 points, 1000 points, 10000 points, etc.?
The idea is to bin the 5,000,000 points and so reduce the density.

Code:
$ cat largefile.txt
x        y
1       45
2       46
3       87
4       34
5       36
6       36
7       23
...     ...
5mil    228

How can I take bins of every "n" points of y?

Thanks in advance.
# 2  
I could interpret your request several different ways. Please show us an example of the output you're trying to produce.
# 3  
Thanks for the response, here is an example:

input
Code:
$ cat file.txt
1      3
2      2
3      1
4      10
5      10
6      10
7      25
8      30
9      60

output example 1 (using bins of 3 - average every three points)
Code:
2      2
5      10
8      38.3

output example 2 (using bins of 2 - average every two points)
Code:
1.5    2.5
3.5    5.5
5.5    10
7.5    27.5

Not sure what the best tool is!

Many thanks,
Torch
# 4  
I am not sure I understand your question here.
What is a "bin"?
Can you please post your example with a clear explanation?
# 5  
Hi, I'll try to be clearer with the example.

Thanks for the response. Here is an example; I'll focus on just the second column.

input
Code:
$ cat file.2.txt
3
2
1
10
10
10
25
30
60

The purpose is to reduce the data for an x,y scatter plot, because the file is millions of lines long. Instead of plotting every point, I want to take the average of every "n" points and plot that one number. "Bin" might not be the correct word; perhaps a "rolling average"? For example, a bin of 3 would break the data down like so:

Code:
$ cat file.2.txt
#bin A
3
2         #average all three = 2
1

#bin B
10
10     #average all three = 10
10

#bin C
25
30     # average all 3 = 38.3
60

Output would then be:

Code:
2
10
38.3

For the case where bin is 2

Code:
$ cat file.2.txt
#bin A
3    # average = 2.5
2

#bin B
1    # average = 5.5
10

#bin C
10    # average = 10
10

#bin D
25    # average = 27.5
30

#bin E - ignored because only one value
60
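
A minimal sketch of the single-column case described above, assuming a POSIX awk and the file.2.txt sample from this post. The function name bin_avg is made up for illustration and is not part of any tool. A bin is printed only once it is full, so the trailing 60 is dropped when the bin size is 2.

```shell
# Hypothetical helper: bin_avg N FILE
# Averages every N lines of the first column. An incomplete final bin is
# never printed, because output only happens when the line number is a
# multiple of N.
bin_avg() {
    awk -v n="$1" '
        { sum += $1 }                                       # accumulate the current bin
        NR % n == 0 { printf "%.1f\n", sum / n; sum = 0 }   # bin is full: print and reset
    ' "$2"
}

# Recreate the sample data from the post and try both bin sizes.
printf '%s\n' 3 2 1 10 10 10 25 30 60 > file.2.txt
bin_avg 3 file.2.txt
bin_avg 2 file.2.txt
```

Because of the "%.1f" format, whole-number averages come out with one decimal place (2.0 rather than 2); plain print could be used instead if that matters.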

Finally, doing this for both the x and y columns (the original file), with a bin of 3:

Input:
Code:
$ cat file.txt
1      3
2      2
3      1
4      10
5      10
6      10
7      25
8      30
9      60

Output:
2      2
5      10
8      38.3

Many thanks, I hope this is clearer.
Torch
# 6  
Code:
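awk -v bin=3 ' BEGIN {
                c = 1           # 1-based position within the current bin
} c <= bin {
                ++c
                f += $1         # running sum of column 1 (x)
                s += $2         # running sum of column 2 (y)
} c > bin {
                # bin is full: print both averages, then reset
                # (%g rather than %d so x averages like 1.5 are not truncated)
                printf "%g %.1f\n", f / bin, s / bin
                c = 1
                f = 0
                s = 0
} ' file.txt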

# 7  
Code:
awk     '               {Xsum+=$1; Ysum+=$2}
         !(NR%bin)      {print Xsum/bin, Ysum/bin; Xsum=Ysum=0}
        ' bin=3 file.txt
2 2
5 10
8 38.3333

Nothing is foreseen for residual lines at the end, e.g. up to two orphan lines when bin=3. Please specify how those should be handled.
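
One possible way to deal with the residual lines, as a sketch only: the helper below either drops a short final bin (which is what the bin=2 example in post #5 asks for) or averages it over the rows it actually contains. The name bin_xy and its KEEP argument are assumptions made for this illustration, and the input is assumed to have no header line.

```shell
# Hypothetical helper: bin_xy BIN KEEP FILE
# KEEP=1 averages a short final bin over the rows it actually has;
# KEEP=0 drops it.
bin_xy() {
    awk -v bin="$1" -v keep="$2" '
        { xsum += $1; ysum += $2; n++ }     # accumulate the current bin
        n == bin {                          # bin is full: print averages, reset
            print xsum / n, ysum / n
            xsum = ysum = n = 0
        }
        END {                               # leftover rows, if any
            if (keep && n > 0) print xsum / n, ysum / n
        }
    ' "$3"
}

# Recreate the two-column sample from the thread.
printf '%s %s\n' 1 3 2 2 3 1 4 10 5 10 6 10 7 25 8 30 9 60 > file.txt
bin_xy 2 0 file.txt    # last row (9 60) is dropped
bin_xy 2 1 file.txt    # last row is kept as its own one-point bin
```

For a file like largefile.txt that starts with an "x y" header, the header would need to be skipped first (for example with tail -n +2), since this sketch treats every line as data.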