awk solution for taking bins


 
# 1  

Hi all, I'm looking for an awk solution for binning a data set.
For example, I have two columns of data that I want to use for a scatter plot, about 5 million lines long. How can I take the average of every 100 points, or every 1,000, 10,000, and so on?
The idea is to bin the 5,000,000 points and so reduce the density of the plot.

Code:
$ cat largefile.txt
x        y
1       45
2       46
3       87
4       34
5       36
6       36
7       23
...     ...
5mil    228

How can I take bins of every "n" points of y?

Thanks in advance.
# 2  
I could interpret your request several different ways. Please show us an example of the output you're trying to produce.
# 3  
Thanks for the response, here is an example:

input
Code:
$ cat file.txt
1      3
2      2
3      1
4      10
5      10
6      10
7      25
8      30
9      60

output example 1 (using bins of 3 - average each group of three points)
Code:
2      2
5      10
8      38.3

output example 2 (using bins of 2 - average each group of two points)
Code:
1.5    2.5
3.5    5.5
5.5    10
7.5    27.5

Not sure what the best tool is!

Many thanks,
Torch
# 4  
I am not sure I understand your question here.
What is a "bin"?
Can you please post your example with a clear explanation?
# 5  
Hi, I'll try to be clearer with the example. I'll focus on just the second column.

input
Code:
$ cat file.2.txt
3
2
1
10
10
10
25
30
60

The purpose is to reduce the data for an x,y scatter plot, because the file is millions of lines long. Instead of plotting every point, I want to take the average of every "n" points and plot that one number. "Bin" might not be the correct word; perhaps a "rolling average"? For example, a bin of 3 would break the data down like so:

Code:
$ cat file.2.txt
#bin A
3
2         #average all three = 2
1

#bin B
10
10     #average all three = 10
10

#bin C
25
30     # average all 3 = 38.3
60

Output would then be:

Code:
2
10
38.3

For the case where bin is 2

Code:
$ cat file.2.txt
#bin A
3    # average = 2.5
2

#bin B
1    # average = 5.5
10

#bin C
10    # average = 10
10

#bin D
25    # average = 27.5
30

#bin E - ignored because only one value
60

Finally, doing this for both the x and y columns (the original file), with a bin of 3:

Input:
Code:
$ cat file.txt
1      3
2      2
3      1
4      10
5      10
6      10
7      25
8      30
9      60

Output:
Code:
2      2
5      10
8      38.3

Many thanks, I hope this is clearer.
Torch
# 6  
Code:
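awk -v bin=3 ' BEGIN {
                c = 1                  # position of the next row within the bin
} c <= bin {
                ++c
                f += $1                # running sum of x
                s += $2                # running sum of y
} c > bin {
                # %g keeps fractional x means such as 1.5 (with bin=2); %d truncated them
                printf "%g %.1f\n", f / bin, s / bin
                c = 1
                f = 0
                s = 0
} ' file.txt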
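
The same counting idea carries over to the one-column file.2.txt from post #5. What follows is only a sketch under a few assumptions: the function name bin_avg_col is made up for illustration, the one-decimal output format is a guess at what is wanted, and a short final bin is silently dropped, as in the "bin E" example.

```shell
# Hypothetical helper (name is an assumption): average every n values
# of a one-column file. Usage: bin_avg_col N FILE
bin_avg_col() {
    awk -v n="$1" '
        { sum += $1 }                 # accumulate the current bin
        NR % n == 0 {                 # every n-th line closes a bin
            printf "%.1f\n", sum / n
            sum = 0
        }
    ' "$2"
    # Rows left over after the last full bin are never printed.
}
```

With the nine sample values, `bin_avg_col 3 file.2.txt` prints 2.0, 10.0 and 38.3, and `bin_avg_col 2 file.2.txt` prints 2.5, 5.5, 10.0 and 27.5, ignoring the lone 60.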

# 7  
Code:
awk     '               {Xsum+=$1; Ysum+=$2}
         !(NR%bin)      {print Xsum/bin, Ysum/bin; Xsum=Ysum=0}
        ' bin=3 file.txt
2 2
5 10
8 38.3333

Nothing is done about residual lines at the end, e.g. up to two orphan lines when bin=3. Please specify how those should be handled.
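
One possible way to deal with the residual lines is to count the rows in the current bin and decide in an END block what to do with a short final bin. This is a sketch, not the poster's method: the function name bin_avg_xy and the optional "keep" argument are assumptions made for illustration. By default the leftover rows are dropped, which matches the "bin E - ignored" behaviour described in post #5; with "keep" they are averaged over the rows actually present.

```shell
# Hypothetical helper (name and "keep" flag are assumptions): average every
# `bin` rows of a two-column file. Usage: bin_avg_xy BIN FILE [keep]
bin_avg_xy() {
    awk -v bin="$1" -v keep="${3:-}" '
        { xsum += $1; ysum += $2; n++ }       # n = rows in the current bin
        n == bin {                            # full bin: print means, reset
            print xsum / n, ysum / n
            xsum = ysum = n = 0
        }
        END {
            # A short final bin is printed only when "keep" was requested,
            # and is divided by the number of rows it really contains.
            if (keep == "keep" && n > 0)
                print xsum / n, ysum / n
        }
    ' "$2"
}
```

For the nine-line file.txt, `bin_avg_xy 2 file.txt` prints four rows and drops the last line, while `bin_avg_xy 2 file.txt keep` adds a fifth row, `9 60`, for the single leftover point.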