awk for histogram


 
# 1  
Old 11-01-2012

I have a single file that looks like this:
Code:
1.62816
1.62816
0.86941
0.86941
0.731465
0.731465
1.03174
1.03174
0.769444
0.769444
0.981181
0.981181
1.14681
1.14681
1.00511
1.00511
1.20385
1.20385
0.0340752
0.0340752

I am trying to plot a probability distribution/histogram, so I'd like to divide this data into bins of equal width based on the maximum and minimum values, and output the number of items in each bin, so that my final output would look like:

Code:
Range      Midpoint  No_of_data_in_range
0-0.02     0.01      2
0.02-0.04  0.03      7

I have tried using something like:
Code:
awk '{if($1>= 0.0 && $1 <=0.1) {print }}' out.dat

but it doesn't work well because it is too manual: I would have to edit the limits by hand for every bin. Could someone help me out with this?
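
For the "equal width based on the maximum and minimum values" part, here is a rough sketch of one way it could be done. It is not from the thread: `hist_auto` is a made-up helper name, and the number of bins is assumed to be passed as the first argument. Because the width is only known once the minimum and maximum have been seen, the values are held in memory and binned in the END block.

```shell
# hist_auto is a hypothetical helper name, not something from the thread.
# Usage: hist_auto NBINS [FILE...]   (reads standard input if no file is given)
hist_auto() {
    nbins=$1
    shift
    awk -v n="$nbins" '
        NF == 0 { next }                        # ignore blank lines
        {
            v[++cnt] = $1 + 0                   # keep every value for the second pass
            if (cnt == 1 || v[cnt] < min) min = v[cnt]
            if (cnt == 1 || v[cnt] > max) max = v[cnt]
        }
        END {
            if (cnt == 0 || n < 1) exit         # nothing to bin
            w = (max - min) / n                 # equal bin width from the data range
            for (i = 1; i <= cnt; i++) {
                # if all values are equal (w is 0) everything goes to bin 0
                b = (w > 0) ? int((v[i] - min) / w) : 0
                if (b >= n) b = n - 1           # the maximum belongs to the last bin
                count[b]++
            }
            print "Range\tMidpoint\tCount"
            for (b = 0; b < n; b++)
                printf "%g-%g\t%g\t%d\n", min + b * w, min + (b + 1) * w, min + (b + 0.5) * w, count[b]
        }
    ' "$@"
}
```

It could be called as `hist_auto 20 out.dat`. Holding all values in an array is fine for small files; for very large inputs, reading the file twice would avoid the memory cost.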
# 2  
Old 11-01-2012
[edit] working on it.

Code:
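$ awk '{
        # Bin index: the value divided by BINSIZE, truncated to an integer
        BIN=sprintf("%d", $1*(1/BINSIZE))+0;
        DATA[BIN]++;
        # Track the lowest and highest bin; NR==1 seeds both, so bin 0 is a valid MIN or MAX
        if((NR==1)||(MIN>BIN)) MIN=BIN;
        if((NR==1)||(MAX<BIN)) MAX=BIN;
 }
END {
        # Print every bin from MIN to MAX, including empty ones (count 0)
        for(BIN=MIN; BIN<=MAX; BIN++)
                printf("%+2.5f-%+2.5f\t%d\n", (BIN*BINSIZE), (BIN*BINSIZE)+(BINSIZE-0.00001), DATA[BIN]);
}' BINSIZE=0.02 datafile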

+0.02000-+0.03999       2
+0.04000-+0.05999       0
+0.06000-+0.07999       0
+0.08000-+0.09999       0
+0.10000-+0.11999       0
+0.12000-+0.13999       0
+0.14000-+0.15999       0
+0.16000-+0.17999       0
+0.18000-+0.19999       0
+0.20000-+0.21999       0
+0.22000-+0.23999       0
+0.24000-+0.25999       0
+0.26000-+0.27999       0
+0.28000-+0.29999       0
+0.30000-+0.31999       0
+0.32000-+0.33999       0
+0.34000-+0.35999       0
+0.36000-+0.37999       0
+0.38000-+0.39999       0
+0.40000-+0.41999       0
+0.42000-+0.43999       0
+0.44000-+0.45999       0
+0.46000-+0.47999       0
+0.48000-+0.49999       0
+0.50000-+0.51999       0
+0.52000-+0.53999       0
+0.54000-+0.55999       0
+0.56000-+0.57999       0
+0.58000-+0.59999       0
+0.60000-+0.61999       0
+0.62000-+0.63999       0
+0.64000-+0.65999       0
+0.66000-+0.67999       0
+0.68000-+0.69999       0
+0.70000-+0.71999       0
+0.72000-+0.73999       2
+0.74000-+0.75999       0
+0.76000-+0.77999       2
+0.78000-+0.79999       0
+0.80000-+0.81999       0
+0.82000-+0.83999       0
+0.84000-+0.85999       0
+0.86000-+0.87999       2
+0.88000-+0.89999       0
+0.90000-+0.91999       0
+0.92000-+0.93999       0
+0.94000-+0.95999       0
+0.96000-+0.97999       0
+0.98000-+0.99999       2
+1.00000-+1.01999       2
+1.02000-+1.03999       2
+1.04000-+1.05999       0
+1.06000-+1.07999       0
+1.08000-+1.09999       0
+1.10000-+1.11999       0
+1.12000-+1.13999       0
+1.14000-+1.15999       2
+1.16000-+1.17999       0
+1.18000-+1.19999       0
+1.20000-+1.21999       2
+1.22000-+1.23999       0
+1.24000-+1.25999       0
+1.26000-+1.27999       0
+1.28000-+1.29999       0
+1.30000-+1.31999       0
+1.32000-+1.33999       0
+1.34000-+1.35999       0
+1.36000-+1.37999       0
+1.38000-+1.39999       0
+1.40000-+1.41999       0
+1.42000-+1.43999       0
+1.44000-+1.45999       0
+1.46000-+1.47999       0
+1.48000-+1.49999       0
+1.50000-+1.51999       0
+1.52000-+1.53999       0
+1.54000-+1.55999       0
+1.56000-+1.57999       0
+1.58000-+1.59999       0
+1.60000-+1.61999       0
+1.62000-+1.63999       2

$
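
The output above has the range and the count but not the midpoint column asked for in the first post. A possible variant that adds it is sketched below. This is only a sketch, not part of the original reply: `hist_mid` is a made-up helper name, the bin width is assumed to be passed as the first argument, and the range is printed with `%g` instead of the fixed five decimals used above.

```shell
# hist_mid is a hypothetical helper name, not something from the thread.
# Usage: hist_mid BINSIZE [FILE...]   (reads standard input if no file is given)
hist_mid() {
    binsize=$1
    shift
    awk -v w="$binsize" '
        NF == 0 { next }                  # ignore blank lines
        {
            q = $1 / w
            b = int(q)                    # int() truncates toward zero
            if (b > q) b--                # step down for negative values (floor)
            count[b]++
            if (!seen || b < min) min = b
            if (!seen || b > max) max = b
            seen = 1
        }
        END {
            if (!seen) exit               # no data, print nothing
            print "Range\tMidpoint\tCount"
            # every bin from lowest to highest, empty bins print a count of 0
            for (b = min; b <= max; b++)
                printf "%g-%g\t%g\t%d\n", b * w, (b + 1) * w, (b + 0.5) * w, count[b]
        }
    ' "$@"
}
```

It could be called as `hist_mid 0.02 datafile`. The upper limit printed for each bin is exclusive, so a value sitting exactly on a boundary is counted in the higher bin.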


Last edited by Corona688; 11-01-2012 at 03:35 PM..
# 3  
Old 11-02-2012
The 0.01 was just an example, not taken from the data. I'll try your solution now.