AWK counting interval / histogram data

02-15-2012

My data looks like this, with the columns frame, phi and psi:

Code:

0 68.466774 -58.170494
1  75.128593 -51.646816
2 76.083946 -64.300102
3 77.578056  -76.464218
4 63.180199 -76.067680
5 77.203979 -58.560757
6  66.574913 -60.000214
7 73.218269 -70.978203
8 70.956879 -76.096558
9  65.538872 -76.716568
10 57.107117 -67.572067
11 63.389595  -49.936893
12 83.935219 -65.073227
13 78.492310 -69.225609
14  58.567463 -77.028725
15 60.258656 -85.608917
16 80.604012  -68.479416
17 79.839516 -58.189476
18 68.693405 -66.911407
19  48.195873 -56.744625
20 75.479187 -48.657692
21 80.180649  -69.976234
22 71.216110 -70.213730
23 67.672768 -50.655262
24  55.870106 -63.952560
25 65.091850 -59.066532
26 64.395363  -40.585659
27 80.011673 -56.789768
28 74.003281 -69.651680
29  65.848534 -60.928204
30 65.260933 -78.133301
...

I would like to bin this data in two dimensions, by phi and by psi.

With a bin width of 10, my desired output would be of the form:

Code:

phi   psi   count
-180  -180   464
-170 -170   324
-160 -160   133

...

So, for an AWK script, I need a test that checks which range $2 and $3 fall in, e.g. for a bin width of 10:
$2<=-170&&$2>=-180&&$3<=-170&&$3>=-180
$2<=-160&&$2>=-170&&$3<=-160&&$3>=-170
$2<=-150&&$2>=-160&&$3<=-150&&$3>=-160
$2<=-140&&$2>=-150&&$3<=-140&&$3>=-150
$2<=-130&&$2>=-140&&$3<=-130&&$3>=-140
$2<=-120&&$2>=-130&&$3<=-120&&$3>=-130
$2<=-110&&$2>=-120&&$3<=-110&&$3>=-120
$2<=-100&&$2>=-110&&$3<=-100&&$3>=-110
$2<=-90&&$2>=-100&&$3<=-90&&$3>=-100
...

For each of these ranges, I wish to count the number of data points that fall within the interval. Any help here?
I can sort of see how to count in AWK, but how do you discretise the count into intervals of this kind? E.g. how do you loop with 10 units of change between each iteration?
Thanks

chrisjorg

02-15-2012

I see no values in your output that have anything to do with your input, so I'm left a bit confused.

What about values that don't match, e.g. phi between 80 and 90 and psi between 70 and 80? Should they be ignored?

---------- Post updated at 11:32 AM ---------- Previous update was at 11:29 AM ----------

Based on what I'm guessing you want:

Code:

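awk 'BEGIN { MIN=99999999; MAX=-MIN }

{        # %d truncates toward zero, so A and B are bin indices in units of 10
          A=sprintf("%d", $2/10);
          B=sprintf("%d", $3/10);
          if(A == B)   # count only rows where phi and psi share a bin index
          {
                  BIN[A]++;
                  # A is a string, so add 0 to force numeric comparison;
                  # test both ends so a single bin still prints
                  if(A+0<MIN) MIN=A+0;
                  if(A+0>MAX) MAX=A+0;
          }
}
END { for(N=MIN; N<=MAX; N++) print N*10, N*10, BIN[N]+0; }' inputfile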

Corona688

02-15-2012

No, they should not be ignored.

Maybe it would be easier to bin the data as in this Perl script, which only works for binning a single column such as phi on its own. Here @list denotes the input array containing phi, and $bin_width is 10.

Code:

use POSIX qw(ceil);   # ceil() is not a Perl built-in

sub histogram
{
   my($bin_width, @list) = @_;
   my %histogram;
   # Hash key = bin index, hash value = number of points in that bin
   $histogram{ceil(($_ + 1) / $bin_width) -1}++ for @list;
   my $max;
   my $min;

   while (my ($key, $value) = each(%histogram))
   {
     $max = $key if !defined($max) || $key > $max;
     $min = $key if !defined($min) || $key < $min;
   }
   for (my $i = $min; $i <= $max; $i++)
   {
     my $bin = sprintf("% 10d", ($i)*$bin_width);
     my $frequency = $histogram{$i} || 0;

     print $bin." ".$frequency."\n";
   }
   print "    Width: ".$bin_width."\n";
   print "    Range: ".$min."-".$max."\n\n";
}

In this Perl script, each input value is mapped to a hash key (the bin index), and the hash is later iterated using two variables called $key and $value. Consider a bin width of 10. For an input data value of -173 we perform the ceiling calculation
ceil((-173+1)/10) - 1 = -18
So the input number -173 is located in bin -18, i.e. $key=-18, which then has $value=1. The next time the script finds a value in bin -18, the ++ increments $value to 2, and so on, so the data is binned without requiring any explicit range tests. I would like to extend this script to a hash keyed on 2 columns (one for phi, one for psi).
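
The arithmetic in the worked example above can be checked in awk, which has no built-in ceil(). This is only a minimal sketch: bin_index and ceil are hypothetical helper names, and the width defaults to the 10 used in the thread. It evaluates the same expression as the Perl hash key and also prints the lower edge of the resulting bin.

```shell
# bin_index is a hypothetical helper name; it reads one value per line on stdin.
bin_index() {
    awk -v width="${1:-10}" '
    # awk has no ceil(); int() truncates toward zero, so step up when needed.
    function ceil(x,    i) { i = int(x); return (x > i) ? i + 1 : i }
    {
        key = ceil(($1 + 1) / width) - 1   # same expression as the Perl hash key
        print $1, key, key * width         # value, bin key, lower bin edge
    }'
}

# -173 is the value from the worked example; 68.466774 is phi of frame 0.
printf '%s\n' -173 68.466774 | bin_index
```

Note that with the + 1 in the formula each bin covers the interval (key*width - 1, (key+1)*width - 1], which coincides with [key*width, (key+1)*width) only for integer input. For real-valued angles like the ones in this thread, a plain floor(x / width) may be the better fit.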

Anyway, maybe this helps?

The output in my first post is just an example of how it could look.

Code:

phi   psi   count
-180  -180   464
-170 -170   324
-160 -160   133

chrisjorg

02-15-2012

Okay then:

Code:

awk 'BEGIN { MIN=99999999; MAX=-MIN; OFS="\t"; BINSIZE=10; }

{        A=sprintf("%d", $2/BINSIZE);
          B=sprintf("%d", $3/BINSIZE);
          BIN[A OFS B]++;
}
END { for(X in A) print X, BIN[X]; }' inputfile

Corona688

02-15-2012

awk: can't assign to A; it's an array name.
input record number 1, file angles_merge.dat
source line number 3

chrisjorg

02-15-2012

Typo.

Code:

awk 'BEGIN { MIN=99999999; MAX=-MIN; OFS="\t"; BINSIZE=10; }

{        A=sprintf("%d", $2/BINSIZE);
          B=sprintf("%d", $3/BINSIZE);
          BIN[A OFS B]++;
}
END { for(X in BIN) print X, BIN[X]; }' inputfile
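
One caveat with the sprintf("%d", ...) approach above: %d truncates toward zero, so values in (-10, 0) and [0, 10) end up in the same bin 0. A minimal sketch of a floor-based variant follows, assuming whitespace-separated frame/phi/psi columns as in the thread. The names bin2d, floor and w are hypothetical. It prints the lower edge of each phi and psi bin followed by the count, sorted numerically.

```shell
# bin2d is a hypothetical wrapper name; the bin width defaults to 10.
bin2d() {
    awk -v w="${1:-10}" '
    # floor(): round toward negative infinity (int() alone truncates toward zero).
    function floor(x,    i) { i = int(x); return (x < i) ? i - 1 : i }
    NF >= 3 {
        # Key each count by the lower edges of the phi and psi bins.
        count[floor($2 / w) * w "\t" floor($3 / w) * w]++
    }
    END { for (k in count) print k "\t" count[k] }' | sort -k1,1n -k2,2n
}

# Three rows from the sample data plus one made-up row with a small negative phi.
printf '%s\n' '0 68.466774 -58.170494' '5 77.203979 -58.560757' \
              '6 66.574913 -60.000214' '99 -5.0 5.0' | bin2d
```

The sort at the end is there because awk's for (k in array) visits keys in no particular order.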

Corona688

02-16-2012

Fantastic, thanks.
What if I also wanted to output bins that were not visited?
Currently I only get the bins that contain data, but
I would like to include the empty bins as well. Is this possible?

In Perl, if no data was accrued for a certain bin during the loop, the
bin is still printed with a frequency of 0.

Code:

   my $frequency = $histogram{$i} || 0;

---------- Post updated at 01:31 PM ---------- Previous update was at 01:28 PM ----------

Currently my data is sort of useless when plotting:

Code:

-170 -140 4
-170 -150 14
-170 -160 46
-170 -170 122
-170 -30 1
-170 -40 7
-170 -50 3
-170 -60 3
-170 120 9
-170 130 83
-170 140 258
-170 150 366
-170 160 384
-170 170 246
-160 -130 4
-160 -140 9
-160 -150 38
-160 -160 164
-160 -170 587
-160 -30 3
-160 -40 4
-160 -50 8
-160 -60 1
-160 100 2
-160 110 13
-160 120 35
-160 130 339
-160 140 1135
-160 150 1903
-160 160 1975
-160 170 1414
-150 -110 3
-150 -120 1
-150 -130 6
-150 -140 26
-150 -150 95
-150 -160 453
-150 -170 1771
-150 -20 3
-150 -30 8
-150 -40 4
-150 -50 10
-150 -60 9

chrisjorg
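
For the unvisited bins, one option is to stop iterating over the array keys and instead walk the full grid in the END block, stepping by the bin width. This is also how a loop with 10 units of change per iteration can be written in awk. What follows is a minimal sketch rather than a tested solution for the real data set: it assumes the angles lie in [-180, 180) and an integer bin width, and the names bin2d_full, lo, hi and w are hypothetical. The count[x, y] + 0 expression plays the role of the Perl || 0.

```shell
# bin2d_full is a hypothetical wrapper; arguments are lo, hi and bin width.
bin2d_full() {
    awk -v lo="${1:--180}" -v hi="${2:-180}" -v w="${3:-10}" '
    # floor(): round toward negative infinity (int() alone truncates toward zero).
    function floor(x,    i) { i = int(x); return (x < i) ? i - 1 : i }
    NF >= 3 { count[floor($2 / w) * w, floor($3 / w) * w]++ }
    END {
        # Walk every bin of the grid, visited or not, in ascending order.
        for (x = lo + 0; x < hi; x += w)
            for (y = lo + 0; y < hi; y += w)
                print x, y, count[x, y] + 0   # + 0 turns an unset entry into 0
    }'
}

# Made-up rows on a small 2 x 2 grid (lo=-10, hi=10, width=10) to keep the output short.
printf '%s\n' '0 -5.0 5.0' '1 -3.0 2.5' '2 4.0 -1.0' | bin2d_full -10 10 10
```

Because the loops run in ascending order, the output is already sorted, which tends to suit plotting tools that expect a regular grid. Values outside [lo, hi) are counted internally but never printed.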
