AWK counting interval / histogram data

02-17-2012


AWK's arrays aren't actually multidimensional. The index comma notation simply concatenates its arguments using the string value in the variable SUBSEP (the default value can vary from one AWK implementation to another). ar[A,B] is equivalent to ar[A SUBSEP B]. So if you only want the member whose index is the value of A, simply use ar[A].
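Here is a quick way to see the concatenation described above. This is a minimal sketch: the array name `ar`, the keys "x"/"y" and the helper name `subsep_demo` are made up. It checks membership with `in`, so the test itself does not create any elements. It also shows that `ar["x"]` is a separate one-dimensional key, not a slice of `ar["x","y"]`.

```shell
#!/bin/sh
# subsep_demo: hypothetical helper showing that ar[A,B] is stored under
# the single string key A SUBSEP B.
subsep_demo() {
    awk 'BEGIN {
        ar["x", "y"] = 42                 # "multidimensional" assignment
        key = "x" SUBSEP "y"              # build the same key by hand
        if (key in ar)
            print "same element:", ar[key]
        if (!("x" in ar))                 # ar["x"] is a separate key
            print "ar[\"x\"] is not set"
    }'
}

subsep_demo
```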

Regards,
Alister

alister


02-17-2012


You don't.

You can split it apart into A and B by calling split() on SUBSEP, like this:

Code:

for(X in ARR)
{
        split(X, Y, SUBSEP);    # Y[1] is the first index, Y[2] the second
        A=Y[1];
        B=Y[2];
        # ... use A and B here ...
}
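Here is the same loop inside a complete program, as a sketch. The array contents and the helper name `split_keys_demo` are invented for illustration. The output is piped through `sort -n` because `for (X in ARR)` visits keys in no guaranteed order.

```shell
#!/bin/sh
# split_keys_demo: hypothetical helper that recovers both halves of each
# ARR[A,B] key by splitting on SUBSEP.
split_keys_demo() {
    awk 'BEGIN {
        ARR[-17, 13] = 4                  # made-up sample counts
        ARR[17, -17] = 2
        for (X in ARR) {
            split(X, Y, SUBSEP)           # Y[1] = A, Y[2] = B
            A = Y[1]
            B = Y[2]
            print A, B, ARR[X]
        }
    }' | sort -n                          # for-in order is unspecified
}

split_keys_demo
```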

---------- Post updated at 01:31 PM ---------- Previous update was at 01:29 PM ----------

If this is related to your other thread involving awk multidimensional arrays, please post there. I just made a reply to you there in fact.

Corona688


02-17-2012


You know, it's entirely possible I still don't understand what you really want. You never posted any matching output data for your input data, and the logic you posted wasn't what you wanted either, so I'm still guessing.

Please post a sample of output data that matches your input data so I can properly see the correlation.

Quote:

Originally Posted by chrisjorg

Ok, but why not do it like in the first script:

Code:

for (X in BIN) print X, BIN[X]

Because you don't get your zeroes that way. If you set A[1]=5 and A[9]=6, for(X in A) visits only 1 and 9 (in no guaranteed order), not 1 2 3 4 5 6 7 8 9. awk arrays are sparse.
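Here is a small demonstration of that point, using the same A[1]=5, A[9]=6 example. It is a sketch: `sparse_demo` is a made-up name. The counting loop tests `(i in A)` so that checking a missing index does not create it.

```shell
#!/bin/sh
# sparse_demo: hypothetical helper contrasting for-in with a counting loop.
sparse_demo() {
    awk 'BEGIN {
        A[1] = 5
        A[9] = 6
        n = 0
        for (X in A) n++                  # only indexes that exist
        print "for-in visited", n, "elements"
        line = ""
        for (i = 1; i <= 9; i++)          # every index in the range
            line = line ((i in A) ? A[i] : 0) (i < 9 ? " " : "")
        print line
    }'
}

sparse_demo
```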

Quote:

Would you not need to correlate your variables A, B to the array BIN?

They are correlated to the array:

Code:

awk 'BEGIN { AMIN=99999999; AMAX=-AMIN; BMIN=AMIN; BMAX=AMAX; OFS="\t"; BINSIZE=10; }

{        A=sprintf("%d", $2/BINSIZE);
          B=sprintf("%d", $3/BINSIZE);
          if(A<AMIN) AMIN=A; if(A>AMAX) AMAX=A;
          if(B<BMIN) BMIN=B; if(B>BMAX) BMAX=B;
          BIN[A,B]++;
}
END { for(A=AMIN; A<=AMAX; A++)
         for(B=BMIN; B<=BMAX; B++)
         print A*BINSIZE, B*BINSIZE, BIN[A,B]; }' inputfile

But because you want zero values, I have to loop through everything that might be in the array, not just everything that is. So I record the lowest and highest values of the array indexes, and print everything between them. Everything completely absent becomes a zero. This also puts the output in exactly the order you want, whereas for(X in BIN) visits the indexes in no guaranteed order.
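Here is a sketch of the same min/max idea with two changes. Both are my own assumptions and not part of the script in this post. First, the bin numbers are kept numeric instead of coming from sprintf("%d", ...). sprintf returns a string, so awk then compares `A<AMIN` as strings, under which "17" sorts before "5". Second, a hypothetical helper `bin_of()` rounds down (floor) instead of truncating toward zero, so bin 0 is not twice as wide as the others. The sketch reads the first two columns, as in the sample data posted later in the thread; adjust `$1`/`$2` for another layout. `dense_hist` is a made-up name.

```shell
#!/bin/sh
# dense_hist: hypothetical helper; prints every (A, B) bin between the
# observed minimum and maximum, with 0 for empty bins.
dense_hist() {
    awk -v BINSIZE=10 '
    # floor(x / s) as a number; int() alone truncates toward zero
    function bin_of(x, s,    q) {
        q = int(x / s)
        if (q * s > x) q--
        return q
    }
    BEGIN { OFS = "\t" }
    NF < 2 { next }                       # skip blank or short lines
    {
        A = bin_of($1, BINSIZE)
        B = bin_of($2, BINSIZE)
        if (!seen++) { AMIN = AMAX = A; BMIN = BMAX = B }
        if (A < AMIN) AMIN = A
        if (A > AMAX) AMAX = A
        if (B < BMIN) BMIN = B
        if (B > BMAX) BMAX = B
        BIN[A, B]++
    }
    END {
        if (!seen) exit                   # no data at all
        for (A = AMIN; A <= AMAX; A++)
            for (B = BMIN; B <= BMAX; B++) {
                n = ((A, B) in BIN) ? BIN[A, B] : 0
                print A * BINSIZE, B * BINSIZE, n
            }
    }' "$@"
}

printf '%s\n' '-15 25' '-12 21' '3 -8' | dense_hist
```

With floor binning, -15 and -12 both fall in the bin labelled -20 (covering -20 up to -10). The usage line prints a 3 x 4 grid in which only (-20, 20) and (0, -10) are non-zero.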

Hmm. Another thing: it might be printing blanks instead of zeroes for empty bins. Adding zero to the value should force it to be a number no matter what.

Code:

awk 'BEGIN { AMIN=99999999; AMAX=-AMIN; BMIN=AMIN; BMAX=AMAX; BINSIZE=10; OFS="\t" }

{        A=sprintf("%d", $2/BINSIZE);
          B=sprintf("%d", $3/BINSIZE);
          if(A<AMIN) AMIN=A; if(A>AMAX) AMAX=A;
          if(B<BMIN) BMIN=B; if(B>BMAX) BMAX=B;
          BIN[A,B]++;
}
END { for(A=AMIN; A<=AMAX; A++)
         for(B=BMIN; B<=BMAX; B++)
         print A*BINSIZE, B*BINSIZE, BIN[A,B]+0 }' inputfile
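Here is why the +0 matters. A missing array element is an empty, uninitialized value, and print shows it as an empty string; adding zero turns it into the number 0. This is a minimal sketch; the keys and the helper name `plus_zero_demo` are made up.

```shell
#!/bin/sh
# plus_zero_demo: hypothetical helper showing print of a missing element
# with and without +0.
plus_zero_demo() {
    awk 'BEGIN {
        BIN["a"] = 3
        print "[" BIN["b"] "]"            # never counted: prints []
        print "[" (BIN["b"] + 0) "]"      # forced numeric: prints [0]
    }'
}

plus_zero_demo
```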

---------- Post updated at 01:33 PM ---------- Previous update was at 01:13 PM ----------

Okay, from your question in your other thread, I'm guessing that for each rho-range, you want phi values between the minimum and maximum for that range, rather than the global minimum and maximum?

Please confirm or deny.

Or better yet, show example output!!

---------- Post updated at 01:43 PM ---------- Previous update was at 01:33 PM ----------

This should only print zeroes between the lowest and highest B value within each A row, not extra ones at the 'edges'. I test explicitly for "" (a value that was never set) to avoid the 99999999 nonsense...

Code:

awk 'BEGIN { BINSIZE=10; OFS="\t" }

function MIN(A, D)
{
        if(A == "") return(D);
        else if(A > D) return(D);
        else return(A);
}

function MAX(A, D)
{
        if(A == "") return(D);
        else if(A > D) return(A);
        else return(D);
}

{        A=sprintf("%d", $2/BINSIZE);
          B=sprintf("%d", $3/BINSIZE);
          AMIN=MIN(AMIN, A);  AMAX=MAX(AMAX, A);

          MINB[A]=MIN(MINB[A], B);
          MAXB[A]=MAX(MAXB[A], B);
          BIN[A,B]++;
}
END { for(A=AMIN;  A<=AMAX; A++)
          {
                  if(MINB[A] == "") continue; # No data
                  for(B=MINB[A]; A<=MAXB[A]; B++)
                          print A*BINSIZE, B*BINSIZE, BIN[A,B]+0;
          }
 }' inputfile
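As posted, the inner loop tests A<=MAXB[A], but only B changes inside that loop. The condition can never change, so once it is true the loop never ends. Below is a sketch with the test changed to B<=MAXB[A]. It also replaces sprintf("%d", ...) with int(), so the minimum/maximum comparisons are numeric rather than string comparisons, and it uses the `in` operator instead of comparing with "". It keeps the thread's truncating binning and assumes the two-column layout of the sample data; `row_hist` is a made-up name.

```shell
#!/bin/sh
# row_hist: hypothetical helper; for each A bin, prints B bins from that
# row's own minimum to its own maximum, with 0 for empty bins.
row_hist() {
    awk -v BINSIZE=10 '
    BEGIN { OFS = "\t" }
    NF < 2 { next }                           # skip blank or short lines
    {
        A = int($1 / BINSIZE)                 # numeric, truncates toward 0
        B = int($2 / BINSIZE)
        if (!seen++) {
            AMIN = AMAX = A
        } else {
            if (A < AMIN) AMIN = A
            if (A > AMAX) AMAX = A
        }
        if (!(A in MINB) || B < MINB[A]) MINB[A] = B
        if (!(A in MAXB) || B > MAXB[A]) MAXB[A] = B
        BIN[A, B]++
    }
    END {
        if (!seen) exit                       # no data at all
        for (A = AMIN; A <= AMAX; A++) {
            if (!(A in MINB)) continue        # no data in this row
            for (B = MINB[A]; B <= MAXB[A]; B++)   # test B, not A
                print A * BINSIZE, B * BINSIZE, BIN[A, B] + 0
        }
    }' "$@"
}

printf '%s\n' '-15 25' '-12 41' '15 -28' '15 -12' | row_hist
```

For this made-up input, the row labelled -10 runs from 20 to 40 (with a 0 at 30), the empty row 0 is skipped, and the row labelled 10 runs from -20 to -10.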

Last edited by Corona688; 02-17-2012 at 03:55 PM.

Corona688


02-17-2012


Ok, sorry, this was in no way a criticism. I think you have done a marvelous job at trying to decipher my poor description of the problem.

Currently, with the first script you posted and this input data:

Code:

-179.995483     132.155258
-179.995483     132.155258
-179.986374     153.868210
-179.986374     153.868210
-179.925522     149.994141
-179.894913     -176.379990
-179.894913     -176.379990
-179.888428     159.134262
-179.790649     158.782471
-179.790649     158.782471
-179.768814     146.420975
-179.768814     146.420975
-179.701813     148.886353
-179.685852     177.829773
-179.685852     177.829773
-179.670364     161.292084
-179.634399     161.466721
-179.634399     161.466721
-179.631607     164.796097
-179.631607     164.796097
-179.595261     143.675720
-179.595261     143.675720
-179.549637     161.132858
-179.549637     161.132858
-179.504288     -40.797535
179.801575      -172.905792
179.801575      -172.905792
179.881226      133.914032
179.881226      133.914032
179.910248      159.141998
179.910248      159.141998
179.942413      130.512344
179.942413      130.512344
179.969635      164.739243

The data output is:

Code:

-17     15      5
-17     16      7
-17     17      2
17      13      4
0       0       1
17      15      2
17      16      1
-17     -17     2
-17     -4      1
17      -17     2
-17     13      2
-17     14      6

With the second script you posted using the same input data:

Output

Code:

-170    -170    2
-170    -160
-170    -150
-170    -140
-170    -130
-170    -120
-170    -110
-170    -100
-170    -90
-170    -80
-170    -70
-170    -60
-170    -50
-170    -40     1
-170    -30
-170    -20
-170    -10
-170    0
-170    10
-160    -170
-160    -160
-160    -150
-160    -140
-160    -130
-160    -120
-160    -110
-160    -100
-160    -90
-160    -80
-160    -70
-160    -60
-160    -50
-160    -40
-160    -30
-160    -20
-160    -10
-160    0
-160    10
-150    -170
-150    -160
-150    -150
-150    -140
-150    -130
-150    -120
-150    -110
-150    -100
-150    -90
-150    -80
-150    -70
-150    -60
-150    -50
-150    -40
-150    -30
-150    -20
-150    -10
-150    0
-150    10
-140    -170
-140    -160
-140    -150
-140    -140
-140    -130
-140    -120
-140    -110
-140    -100
-140    -90
-140    -80
-140    -70
-140    -60
-140    -50
-140    -40
-140    -30
-140    -20
-140    -10
-140    0
-140    10
-130    -170
-130    -160
-130    -150
-130    -140
-130    -130
-130    -120
-130    -110
-130    -100
-130    -90
-130    -80
-130    -70
-130    -60
-130    -50
-130    -40
-130    -30
-130    -20
-130    -10
-130    0
-130    10
-120    -170
-120    -160
-120    -150
-120    -140
-120    -130
-120    -120
-120    -110
-120    -100
-120    -90
-120    -80
-120    -70
-120    -60
-120    -50
-120    -40
-120    -30
-120    -20
-120    -10
-120    0
-120    10
-110    -170
-110    -160
-110    -150
-110    -140
-110    -130
-110    -120
-110    -110
-110    -100
-110    -90
-110    -80
-110    -70
-110    -60
-110    -50
-110    -40
-110    -30
-110    -20
-110    -10
-110    0
-110    10
-100    -170
-100    -160
-100    -150
-100    -140
-100    -130
-100    -120
-100    -110
-100    -100
-100    -90
-100    -80
-100    -70
-100    -60
-100    -50
-100    -40
-100    -30
-100    -20
-100    -10
-100    0
-100    10
-90     -170
-90     -160
-90     -150
-90     -140
-90     -130
-90     -120
-90     -110
-90     -100
-90     -90
-90     -80
-90     -70
-90     -60
-90     -50
-90     -40
-90     -30
-90     -20
-90     -10
-90     0
-90     10
-80     -170
-80     -160
-80     -150
-80     -140
-80     -130
-80     -120
-80     -110
-80     -100
-80     -90
-80     -80
-80     -70
-80     -60
-80     -50
-80     -40
-80     -30
-80     -20
-80     -10
-80     0
-80     10
-70     -170
-70     -160
-70     -150
-70     -140
-70     -130
-70     -120
-70     -110
-70     -100
-70     -90
-70     -80
-70     -70
-70     -60
-70     -50
-70     -40
-70     -30
-70     -20
-70     -10
-70     0
-70     10
-60     -170
-60     -160
-60     -150
-60     -140
-60     -130
-60     -120
-60     -110
-60     -100
-60     -90
-60     -80
-60     -70
-60     -60
-60     -50
-60     -40
-60     -30
-60     -20
-60     -10
-60     0
-60     10
-50     -170
-50     -160
-50     -150
-50     -140
-50     -130
-50     -120
-50     -110
-50     -100
-50     -90
-50     -80
-50     -70
-50     -60
-50     -50
-50     -40
-50     -30
-50     -20
-50     -10
-50     0
-50     10
-40     -170
-40     -160
-40     -150
-40     -140
-40     -130
-40     -120
-40     -110
-40     -100
-30     -150
-30     -140
-30     -130
-30     -120
-30     -110
-30     -100
-30     -90
-30     -80
-30     -70
-30     -60
-30     -50
-30     -40
-30     -30
-30     -20
-30     -10
-30     0
-30     10
-20     -170
-20     -160
-20     -150
-20     -140
-20     -130
-20     -120
-20     -110
-20     -100
-20     -90
-20     -80
-20     -70
-20     -60
-20     -50
-20     -40
-20     -30
-20     -20
-20     -10
-20     0
-20     10
-10     -170
-10     -160
-10     -150
-10     -140
-10     -130
-10     -120
-10     -110
-10     -100
-10     -90
-10     -80
-10     -70
-10     -60
-10     -50
-10     -40
-10     -30
-10     -20
-10     -10
-10     0
-10     10
0       -170
0       -160
0       -150
0       -140
0       -130
0       -120
0       -110
0       -100
0       -90
0       -80
0       -70
0       -60
0       -50
0       -40
0       -30
0       -20
0       -10
0       0       1
0       10
10      -170
10      -160
10      -150
10      -140
10      -130
10      -120
10      -110
10      -100
10      -90
10      -80
10      -70
10      -60
10      -50
10      -40
10      -30
10      -20
10      -10
10      0
10      10

---------- Post updated at 03:21 PM ---------- Previous update was at 03:17 PM ----------

With the +0 addition to script two:

Code:

-170    -170    2
-170    -160    0
-170    -150    0
-170    -140    0
-170    -130    0
-170    -120    0
-170    -110    0
-170    -100    0
-170    -90     0
-170    -80     0
-170    -70     0
-170    -60     0
-170    -50     0
-170    -40     1
-170    -30     0
-170    -20     0
-170    -10     0
-170    0       0
-170    10      0
-160    -170    0
-160    -160    0
-160    -150    0
-160    -140    0
-160    -130    0
-160    -120    0
-160    -110    0
-160    -100    0
-160    -90     0
-160    -80     0
-160    -70     0
-160    -60     0
-160    -50     0
-160    -40     0
-160    -30     0
-160    -20     0
-160    -10     0
-160    0       0
-160    10      0
-150    -170    0
-150    -160    0
-150    -150    0
-150    -140    0
-150    -130    0
-150    -120    0
-150    -110    0
-150    -100    0
-150    -90     0
-150    -80     0
-150    -70     0
-150    -60     0
-150    -50     0
-150    -40     0
-150    -30     0
-150    -20     0
-150    -10     0
-150    0       0
-150    10      0
-140    -170    0
-140    -160    0
-140    -150    0
-140    -140    0
-140    -130    0
-140    -120    0
-140    -110    0
-140    -100    0
-140    -90     0
-140    -80     0
-140    -70     0
-140    -60     0
-140    -50     0
-140    -40     0
-140    -30     0
-140    -20     0
-140    -10     0
-140    0       0
-140    10      0
-130    -170    0
-130    -160    0
-130    -150    0
-130    -140    0
-130    -130    0
-130    -120    0
-130    -110    0
-130    -100    0
-130    -90     0
-130    -80     0
-130    -70     0
-130    -60     0
-130    -50     0
-130    -40     0
-130    -30     0

chrisjorg


02-17-2012


I'm not upset about any criticism. My program did have a lot of bugs, but that's unfortunately to be expected when I have no way to test them -- I can run them, but so what, they work to my understanding of your question, not yours. Without a good example of what it should be outputting I've been grasping at straws. Three bad guesses and counting.

The example output you originally posted had nothing to do with the example input, so wasn't much help; all that demonstrates is that you want it printing rows of three numbers, not which three numbers.

Please save us both days of trouble by picking (or inventing) a few lines of input that clearly illustrate the logic you want, work them out in your head, and show what they'd output. Please.

Last edited by Corona688; 02-17-2012 at 04:36 PM.

Corona688


02-17-2012


I mean, what I posted before is exactly what I want.
But no worries, I am quite satisfied with the first script you wrote.
Let us just leave it here.

Thanks for all the help!

The last modified script you posted is good, except that it doesn't stop looping over B:

Code:

awk 'BEGIN { BINSIZE=10; OFS="\t" }

function MIN(A, D)
{
        if(A == "") return(D);
        else if(A > D) return(D);
        else return(A);
}

function MAX(A, D)
{
        if(A == "") return(D);
        else if(A > D) return(A);
        else return(D);
}

{        A=sprintf("%d", $2/BINSIZE);
          B=sprintf("%d", $3/BINSIZE);
          AMIN=MIN(AMIN, A);  AMAX=MAX(AMAX, A);

          MINB[A]=MIN(MINB[A], B);
          MAXB[A]=MAX(MAXB[A], B);
          BIN[A,B]++;
}
END { for(A=AMIN;  A<=AMAX; A++)
          {
                  if(MINB[A] == "") continue; # No data
                  for(B=MINB[A]; A<=MAXB[A]; B++)
                          print A*BINSIZE, B*BINSIZE, BIN[A,B]+0;
          }
 }' inputfile

Input

Code:

-179.995483     132.155258
-179.995483     132.155258
-179.986374     153.868210
-179.986374     153.868210
-179.925522     149.994141
-179.894913     -176.379990
-179.894913     -176.379990
-179.888428     159.134262
-179.790649     158.782471
-179.790649     158.782471
-179.768814     146.420975
-179.768814     146.420975
-179.701813     148.886353
-179.685852     177.829773
-179.685852     177.829773
-179.670364     161.292084
-179.634399     161.466721
-179.634399     161.466721
-179.631607     164.796097
-179.631607     164.796097
-179.595261     143.675720
-179.595261     143.675720
-179.549637     161.132858
-179.549637     161.132858
-179.504288     -40.797535
179.801575      -172.905792
179.801575      -172.905792
179.881226      133.914032
179.881226      133.914032
179.910248      159.141998
179.910248      159.141998
179.942413      130.512344
179.942413      130.512344
179.969635      164.739243

Output:

Code:

-170    -170    2
-170    -160    0
-170    -150    0
-170    -140    0
-170    -130    0
-170    -120    0
-170    -110    0
-170    -100    0
-170    -90     0
-170    -80     0
-170    -70     0
-170    -60     0
-170    -50     0
-170    -40     1
-170    -30     0
-170    -20     0
-170    -10     0
-170    0       0
-170    10      0
-170    20      0
-170    30      0
-170    40      0
-170    50      0
-170    60      0
-170    70      0
-170    80      0
-170    90      0
-170    100     0
-170    110     0
-170    120     0
-170    130     2
-170    140     6
-170    150     5
-170    160     7
-170    170     2
-170    180     0
-170    190     0
-170    200     0
-170    210     0
-170    220     0
-170    230     0
-170    240     0
-170    250     0
-170    260     0
-170    270     0
-170    280     0
-170    290     0
-170    300     0
-170    310     0
-170    320     0
-170    330     0
-170    340     0
-170    350     0
-170    360     0
-170    370     0
-170    380     0
-170    390     0
-170    400     0
-170    410     0
-170    420     0
-170    430     0
-170    440     0
-170    450     0
-170    460     0
-170    470     0
-170    480     0
-170    490     0
-170    500     0
-170    510     0
-170    520     0
-170    530     0
-170    540     0
-170    550     0
-170    560     0
-170    570     0
-170    580     0
-170    590     0
-170    600     0
-170    610     0
-170    620     0
-170    630     0
-170    640     0
-170    650     0
-170    660     0
-170    670     0
-170    680     0
-170    690     0
-170    700     0
and so on; it never stops, it just keeps counting B upward.


---------- Post updated at 05:16 PM ---------- Previous update was at 05:15 PM ----------

Going back to script.awk

Code:

BEGIN { MIN=99999999; MAX=-MIN; OFS="\t"; BINSIZE=10; }

{         A=sprintf("%d", $1/BINSIZE);
          B=sprintf("%d", $2/BINSIZE);
          BIN[A OFS B]++;
}
END { for(X in BIN) print X, BIN[X]; }

I would like my output to have an empty line each time the bin in the first column changes. Is this possible?
Example output:

Code:

-170 -140 4
-170 -150 14
-170 -160 46
-170 -170 122
-170 -30 1
-170 -40 7
-170 -50 3
-170 -60 3
-170 120 9
-170 130 83
-170 140 258
-170 150 366
-170 160 384
-170 170 246

-160 -130 4
-160 -140 9
-160 -150 38
-160 -160 164
-160 -170 587
-160 -30 3
-160 -40 4
-160 -50 8
-160 -60 1
-160 100 2
-160 110 13
-160 120 35
-160 130 339
-160 140 1135
-160 150 1903
-160 160 1975
-160 170 1414

-150 -110 3
-150 -120 1
-150 -130 6
-150 -140 26
-150 -150 95
-150 -160 453
-150 -170 1771
-150 -20 3
-150 -30 8
-150 -40 4
-150 -50 10
-150 -60 9
-150 -80 2
-150 100 6
-150 110 35
-150 120 184
-150 130 795
-150 140 2649
-150 150 5267
-150 160 5897
-150 170 4198

-140 -10 3
-140 -100 6
-140 -110 5
-140 -120 7
-140 -130 26
-140 -140 56
-140 -150 203
-140 -160 874
-140 -170 3168
-140 -20 4
-140 -30 7
-140 -40 15
-140 -50 11
-140 -60 9
-140 -70 1
-140 -80 4
-140 -90 10
-140 10 2
-140 100 31
-140 110 90
-140 120 408
-140 130 1434
-140 140 4305
-140 150 7987
-140 160 10015
-140 170 7310
-140 60 2
-140 80 2
-140 90 13

-130 -10 7
-130 -100 2

---------- Post updated at 05:15 PM ---------- Previous update was at 05:15 PM ----------

If this is too complicated, then don't worry about it!
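Here is one way to get the empty line. The sketch assumes the output lines look like "A B count", as script.awk prints them. Because `for (X in BIN)` returns keys in no particular order, it first sorts the lines numerically on the first two fields. A second awk then prints an empty line whenever the first field changes. `group_blocks` is a made-up name. In practice you would run something like `awk -f script.awk inputfile | group_blocks`.

```shell
#!/bin/sh
# group_blocks: hypothetical helper; sorts "A B count" lines numerically
# and inserts an empty line each time A changes.
group_blocks() {
    sort -k1,1n -k2,2n "$@" |
    awk 'NR > 1 && $1 != prev { print "" }   # new A bin: blank line
         { print; prev = $1 }'
}

printf '%s\n' '-160 -170 587' '-170 130 83' '-170 -140 4' '-160 100 2' |
    group_blocks
```

Sorting before grouping is a design choice: it keeps script.awk unchanged. The alternative is to loop over the bins in order in the END block, as in the min/max scripts earlier in the thread.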

chrisjorg

