Post 302438615 by jarowit on Tuesday 20th of July 2010 07:40:21 AM
awk - calculation of probability density

Hi all!

I have the following problem: I would like to use awk to calculate the probability that a given pair of numbers x and y appears in my data. In other words, how frequently does each pair occur?

For a single integer x ranging, for example, from 1 to 100, the awk one-liner has the form:
Code:
awk 'BEGIN{for(i=1;i<=100;i++) h[i]=0} !/^#/{h[$1]+=1; n+=1} END{if(n) for(i=1;i<=100;i++) print i, h[i]/n}' datafile

where datafile contains the values of x:
Code:
#x
2
65
100
...

My question is how to extend the above awk one-liner to a pair of numbers x and y. In this case the datafile looks as follows:
Code:
#x   #y
23     15
35     1
23     15
...



Thanks in advance.

Last edited by Franklin52; 07-20-2010 at 01:00 PM. Reason: Please use code tags
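
One possible way to extend the one-liner to pairs, offered as a sketch rather than a definitive answer: use the pair itself as the key of an associative array and divide each count by the number of data rows. The sample rows, the temporary file, and the names `count`, `n`, and `result` are illustrative assumptions, not taken from the original post. Lines starting with `#` are assumed to be headers and are skipped.

```shell
# Hypothetical sample data in a temporary file; the '#' header line
# mirrors the layout shown in the question.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
#x   #y
23     15
35     1
23     15
40     7
EOF

# Count each (x, y) pair, then divide by the number of data rows to get
# the relative frequency of the pair. The output is sorted by x, then y.
result=$(awk '
    /^#/ || NF < 2 { next }          # skip header lines and lines without two fields
    { count[$1 " " $2]++; n++ }      # key is "x y"; n counts data rows only
    END {
        if (n == 0) exit             # no data rows: avoid dividing by zero
        for (pair in count) print pair, count[pair] / n
    }
' "$datafile" | sort -n -k1,1 -k2,2)

printf '%s\n' "$result"
rm -f "$datafile"
```

Each output line holds x, y, and the fraction of data rows in which that pair occurs. Only pairs that actually occur are listed; if every pair in a fixed range should be printed, zeros included, two nested loops over x and y in the END block could look up the same keys instead of iterating over the array.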
 

STDA(1)                          User Commands                          STDA(1)

NAME
        stda - Simple Tools for Data Analysis (STDA)

DESCRIPTION
        STDA includes some primary tools for data analysis. You can evaluate sums, averages, integrals, derivatives, histograms or probability distribution functions of 1-d data, and eventually plot the results. The programs are stand-alone tools (supporting the standard UNIX input and output pipelines) intended for data processing from the command line. It should be noted that all but one of the scripts use awk and core system utilities. For plotting you have to install Gnuplot (see http://gnuplot.info) since 'muplot' is a wrapper around it.

        In summary, the package provides utilities for straightforward analysis of data series where a complex analytical approach is not needed and where an ultimate numerical precision with floating-point numbers is not critical. Some general examples of application cases include evaluating usage statistics from server logfiles, determining a response time distribution from a series of queries to a [remote] service, producing a plot from multiple data files, etc.

        This software should be considered as an open project to be extended with new command-line driven utilities helpful for performing common data analysis tasks. Any contributions and suggestions are welcome.

        Following programs are included in the distribution:

        * maphimbu - histogram builder for 1-d numerical and text data
        * mintegrate - average/sum/integral/derivative of 1-d numerical data
        * mmval - find minimum and maximum value in a data set
        * muplot - plot a multi-curve figure from multiple data by using Gnuplot
        * nnum - produce a series of equally separated integers or floats
        * prefield - prepare input file for 'muplot' to plot 2-d fields by arrows

EXAMPLES
        - Evaluate the current apache2 logfile and make an unique list of the hostnames (respectively ip-addresses) sorted by the total number of their http requests:

                maphimbu -rs2 /var/log/apache2/access.log

        - On a X terminal plot the probability function and the cumulative distribution function of a sin(x) data sample:

                nnum -3.14159 3.14159 0.00001 %.6g |awk '{ print $1, sin($1) }' | maphimbu -d0.01 -x2 -ns1 |mintegrate -d0.01 -x1 -y3 -S |muplot lp - 1:3,4

COPYRIGHT
        Copyright (C) 2009, 2011-2012 Dimitar Ivanov <dimitar.ivanov@mirendom.net>

        License: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
        This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

stda 1.1.1                        February 2012                         STDA(1)