awk - calculation of probability density


 
# 1  
Old 07-20-2010

Hi all!

I have the following problem: I would like to use awk to calculate the probability that a given pair of numbers x and y appears in my data. In other words, how frequently does each pair occur?

For a single integer x ranging, for example, from 1 to 100, the awk one-liner has the form:
Code:
awk 'BEGIN{for(i=1;i<=100;i++) h[i]=0}{h[$1]+=1}END{for(i=1;i<=100;i++) print i, h[i]/NR}' datafile

where datafile contains the values of x:
Code:
#x
2
65
100
...

My question is how to extend the above awk one-liner to a pair of numbers x and y. In this case the datafile looks as follows:
Code:
#x   #y
23     15
35     1
23     15
...



Thanks in advance.

Last edited by Franklin52; 07-20-2010 at 01:00 PM.. Reason: Please use code tags
# 2  
Old 07-20-2010
Something like this:

Code:
#  cat infile
23 15
35 1
23 15

#  awk '{h[$1" "$2]++}END{for (i in h){print i,h[i]/NR}}' infile
35 1 0.333333
23 15 0.666667

HTH
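
A small variation, in case the datafile keeps the "#x   #y" header line shown in post #1: with the one-liner above that line would be counted as a pair and included in NR. The sketch below is only one way to handle it, under the assumption that every line starting with "#" is a comment. The counter n, the helper array p, the temporary file, and the sample data are all illustrative and not part of the original post.

```shell
# Sample data with a header line, as in post #1 (written to a temporary file).
infile=$(mktemp)
printf '#x   #y\n23     15\n35     1\n23     15\n' > "$infile"

# Count each (x, y) pair and divide by the number of data lines (n),
# not by NR, because NR would also include the skipped header line.
out=$(awk '
    /^#/ { next }                      # assumption: lines starting with # are comments
    { h[$1, $2]++; n++ }               # h[x, y] joins the two subscripts with SUBSEP
    END {
        for (k in h) {
            split(k, p, SUBSEP)        # recover x and y from the combined key
            printf "%s %s %.6f\n", p[1], p[2], h[k] / n
        }
    }' "$infile" | sort -n)            # for-in order is unspecified, so sort the result

echo "$out"
rm -f "$infile"
```

The pipe through sort -n is there only because awk does not guarantee the order in which "for (k in h)" visits the keys.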
This User Gave Thanks to Tytalus For This Post:
# 3  
Old 07-21-2010
How can your one-liner be extended to the case where the infile contains non-integer numbers?

I tried this:

Code:
awk '{h[int($1/10)" "int($2/10)]++}END{for (i in h){print i*10,h[i]/NR}}' infile

but it does not work.

Last edited by jarowit; 07-21-2010 at 12:11 PM..
# 4  
Old 07-21-2010
It would work as is if you want to treat values like 2.31313 and 2.31314 as separate groups, which may not be very useful; it depends on the analysis you need to do. Otherwise you probably want to round to a fixed number of decimals, e.g., 2.31313 -> 2.31:
Code:
awk '{h[sprintf("%.2f",$1) " " sprintf("%.2f",$2)]++}END{for (i in h){print i,h[i]/NR}}' infile

sprintf("%.2f", number) rounds a real number to 2 decimals.
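
If the aim is instead to group values into bins of width 10, as the attempt in post #3 suggests, the likely reason it "does not work" is the expression i*10: the key i is a string such as "2 1", and awk converts it to a number using only its leading numeric part, so the second value is lost. A minimal sketch of one way around this is below. It assumes non-negative input (int() truncates toward zero, so negative values would land in the wrong bin), and the bin width w, the array b, the counter n, and the sample data are made up for illustration.

```shell
# Illustrative non-integer sample data (not from the thread).
infile=$(mktemp)
printf '23.7 15.2\n35.1 1.9\n28.4 12.6\n' > "$infile"

out=$(awk -v w=10 '
    { h[int($1 / w), int($2 / w)]++; n++ }   # key is the pair of bin indices
    END {
        for (k in h) {
            split(k, b, SUBSEP)              # split the key back into two indices
            # print the lower edge of each bin and the fraction of lines in it
            printf "%d %d %.6f\n", b[1] * w, b[2] * w, h[k] / n
        }
    }' "$infile" | sort -n)

echo "$out"
rm -f "$infile"
```

Here 23.7 and 28.4 both fall into the x bin starting at 20, and 15.2 and 12.6 into the y bin starting at 10, so that bin pair gets two of the three lines.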
# 5  
Old 08-03-2010
How can this formula be changed so that it returns not only the probability h[i] but also the pair of numbers to which h[i] corresponds?
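
For what it is worth, the one-liners above already print the pair, because the loop variable i is the array key, i.e. the string "x y" itself. If x and y are needed as separate values, for example to label or reorder the columns, the key can be split again inside the END block. The following is only a sketch using the sample data from post #2; the array p and the output format are arbitrary choices, not something from the thread.

```shell
# Same sample data as in post #2 (written to a temporary file).
infile=$(mktemp)
printf '23 15\n35 1\n23 15\n' > "$infile"

out=$(awk '
    { h[$1 " " $2]++ }                 # same key format as in post #2
    END {
        for (i in h) {
            split(i, p, " ")           # p[1] is x, p[2] is y
            printf "x=%s y=%s count=%d p=%.6f\n", p[1], p[2], h[i], h[i] / NR
        }
    }' "$infile" | sort)

echo "$out"
rm -f "$infile"
```

Each output line then carries the pair, the raw count, and the probability side by side.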