AWK counting interval / histogram data


 
# 15  
Old 02-17-2012
AWK's arrays aren't actually multidimensional. The index comma notation simply concatenates its arguments using the string value in the variable SUBSEP (the default value can vary from one AWK implementation to another). ar[A,B] is equivalent to ar[A SUBSEP B]. So if you only want the member whose index is the value of A, simply use ar[A].
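
A quick way to see this for yourself is the minimal sketch below. The shell function name subsep_demo and the sample indexes "x" and "y" are made up for illustration only.

```shell
# Hypothetical demo: ar[A,B] and ar[A SUBSEP B] name the same element.
subsep_demo() {
    awk 'BEGIN {
        ar["x", "y"] = 1                  # comma notation
        if (("x" SUBSEP "y") in ar)       # explicit concatenation finds it
            print "same element"
        if (!("x" in ar))                 # "x" alone is a different index
            print "no ar[x]"
    }'
}
subsep_demo
```

So ar[A] only helps if something was actually stored under the plain index A; an element stored as ar[A,B] is not reachable that way.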

Regards,
Alister
# 16  
Old 02-17-2012
You don't.

You can split the index back into A and B by calling split() on SUBSEP, like this:

Code:
for(X in ARR)
{
        split(X, Y, SUBSEP);
        A=Y[1];
        B=Y[2];
}
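
Here is the same loop as a self-contained sketch. The two sample elements and the function name split_keys are invented for the demonstration, and the output goes through sort because for-in visits the indexes in no particular order.

```shell
# Hypothetical demo: recover A and B from the combined index.
split_keys() {
    awk 'BEGIN {
        ARR[-17, 13] = 2
        ARR[17, 15] = 2
        for (X in ARR) {
            split(X, Y, SUBSEP)       # Y[1] is A, Y[2] is B
            print Y[1], Y[2], ARR[X]
        }
    }' | sort -n
}
split_keys
```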

---------- Post updated at 01:31 PM ---------- Previous update was at 01:29 PM ----------

If this is related to your other thread involving awk multidimensional arrays, please post there. I just made a reply to you there in fact.
# 17  
Old 02-17-2012
You know, it's entirely possible I still don't understand what you really want. You never posted any matching output data for your input data, and the logic you posted wasn't what you wanted either, so I'm still guessing.

Please post a sample of output data that matches your input data so I can properly see the correlation.

Quote:
Originally Posted by chrisjorg
Ok,
but why do you not do like in the first script:
Code:
for (X in BIN) print X, BIN[X]

Because you don't get your zeroes that way. If you set A[1]=5 and A[9]=6, for(X in A) gets you only 1 and 9 (in no guaranteed order), not 1 2 3 4 5 6 7 8 9. awk arrays are sparse.
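
A small sketch of the difference, assuming nothing beyond standard awk (sparse_demo is just an illustrative name): for-in sees only the two elements that exist, while an explicit counting loop also covers the gaps.

```shell
# Hypothetical demo: for-in versus an explicit index range.
sparse_demo() {
    awk 'BEGIN {
        A[1] = 5; A[9] = 6
        n = 0
        for (X in A) n++                  # visits only indexes that exist
        print "for-in visited", n, "elements"
        for (I = 1; I <= 9; I++)          # explicit range fills the gaps with 0
            printf "%d%s", A[I] + 0, (I < 9 ? " " : "\n")
    }'
}
sparse_demo
```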
Quote:
Would you not need to correlate your variables A, B to the array BIN?

They are correlated to the array:

Code:
awk 'BEGIN { AMIN=99999999; AMAX=-AMIN; BMIN=AMIN; BMAX=AMAX; OFS="\t"; BINSIZE=10; }

{        A=sprintf("%d", $2/BINSIZE);
          B=sprintf("%d", $3/BINSIZE);
          if(A<AMIN) AMIN=A; if(A>AMAX) AMAX=A;
          if(B<BMIN) BMIN=B; if(B>BMAX) BMAX=B;
          BIN[A,B]++;
}
END { for(A=AMIN; A<=AMAX; A++)
         for(B=BMIN; B<=BMAX; B++)
         print A*BINSIZE, B*BINSIZE, BIN[A,B]; }' inputfile

But because you want zero values, I have to loop through everything that might be in the array, not just everything that is. So I record the lowest and highest values of the array indexes, and print everything between them. Everything completely absent becomes a zero. This also makes the output ordered exactly how you want, where it might not be otherwise.

Hmm. Another thing, it might be printing blanks instead of zeroes for empty bins. Adding a zero to it should force it into a number no matter what.

Code:
awk 'BEGIN { AMIN=99999999; AMAX=-AMIN; BMIN=AMIN; BMAX=AMAX; BINSIZE=10; OFS="\t" }

{        A=sprintf("%d", $2/BINSIZE);
          B=sprintf("%d", $3/BINSIZE);
          if(A<AMIN) AMIN=A; if(A>AMAX) AMAX=A;
          if(B<BMIN) BMIN=B; if(B>BMAX) BMAX=B;
          BIN[A,B]++;
}
END { for(A=AMIN; A<=AMAX; A++)
         for(B=BMIN; B<=BMAX; B++)
         print A*BINSIZE, B*BINSIZE, BIN[A,B]+0 }' inputfile
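
The effect of the +0 is easy to check in isolation. In this small sketch (zero_demo is only an illustrative name) an element that was never assigned prints as an empty string, and the same element plus zero prints as 0.

```shell
# Hypothetical demo: forcing an unset array element to print as 0.
zero_demo() {
    awk 'BEGIN {
        print "[" BIN[1, 2] "]"          # never assigned: empty string
        print "[" (BIN[1, 2] + 0) "]"    # adding 0 forces a numeric 0
    }'
}
zero_demo
```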

---------- Post updated at 01:33 PM ---------- Previous update was at 01:13 PM ----------

Okay, from your question in your other thread, I'm guessing that for each rho-range, you want phi values between the minimum and maximum for that range, not the global maximums in general?

Please confirm or deny.

Or better yet, show example output!!

---------- Post updated at 01:43 PM ---------- Previous update was at 01:33 PM ----------

This should only print zeroes between two values, not extra ones at the 'edges'. I test explicitly for "" to avoid the 999999 nonsense...
Code:
awk 'BEGIN { BINSIZE=10; OFS="\t" }

function MIN(A, D)
{
        if(A == "") return(D);
        else if(A > D) return(D);
        else return(A);
}

function MAX(A, D)
{
        if(A == "") return(D);
        else if(A > D) return(A);
        else return(D);
}

{        A=sprintf("%d", $2/BINSIZE);
          B=sprintf("%d", $3/BINSIZE);
          AMIN=MIN(AMIN, A);  AMAX=MAX(AMAX, A);

          MINB[A]=MIN(MINB[A], B);
          MAXB[A]=MAX(MAXB[A], B);
          BIN[A,B]++;
}
END { for(A=AMIN;  A<=AMAX; A++)
          {
                  if(MINB[A] == "") continue; # No data
                  for(B=MINB[A]; A<=MAXB[A]; B++)
                          print A*BINSIZE, B*BINSIZE, BIN[A,B]+0;
          }
 }' inputfile


Last edited by Corona688; 02-17-2012 at 03:55 PM..
# 18  
Old 02-17-2012
Ok, sorry, this was in no way a criticism. I think you have done a marvelous job at trying to decipher my poor description of the problem.

Currently, with the first script you posted:
For the data input:
Code:
-179.995483     132.155258
-179.995483     132.155258
-179.986374     153.868210
-179.986374     153.868210
-179.925522     149.994141
-179.894913     -176.379990
-179.894913     -176.379990
-179.888428     159.134262
-179.790649     158.782471
-179.790649     158.782471
-179.768814     146.420975
-179.768814     146.420975
-179.701813     148.886353
-179.685852     177.829773
-179.685852     177.829773
-179.670364     161.292084
-179.634399     161.466721
-179.634399     161.466721
-179.631607     164.796097
-179.631607     164.796097
-179.595261     143.675720
-179.595261     143.675720
-179.549637     161.132858
-179.549637     161.132858
-179.504288     -40.797535
179.801575      -172.905792
179.801575      -172.905792
179.881226      133.914032
179.881226      133.914032
179.910248      159.141998
179.910248      159.141998
179.942413      130.512344
179.942413      130.512344
179.969635      164.739243

The data output is:
Code:
-17     15      5
-17     16      7
-17     17      2
17      13      4
0       0       1
17      15      2
17      16      1
-17     -17     2
-17     -4      1
17      -17     2
-17     13      2
-17     14      6

With the second script you posted using the same input data:

Output
Code:
-170    -170    2
-170    -160
-170    -150
-170    -140
-170    -130
-170    -120
-170    -110
-170    -100
-170    -90
-170    -80
-170    -70
-170    -60
-170    -50
-170    -40     1
-170    -30
-170    -20
-170    -10
-170    0
-170    10
-160    -170
-160    -160
-160    -150
-160    -140
-160    -130
-160    -120
-160    -110
-160    -100
-160    -90
-160    -80
-160    -70
-160    -60
-160    -50
-160    -40
-160    -30
-160    -20
-160    -10
-160    0
-160    10
-150    -170
-150    -160
-150    -150
-150    -140
-150    -130
-150    -120
-150    -110
-150    -100
-150    -90
-150    -80
-150    -70
-150    -60
-150    -50
-150    -40
-150    -30
-150    -20
-150    -10
-150    0
-150    10
-140    -170
-140    -160
-140    -150
-140    -140
-140    -130
-140    -120
-140    -110
-140    -100
-140    -90
-140    -80
-140    -70
-140    -60
-140    -50
-140    -40
-140    -30
-140    -20
-140    -10
-140    0
-140    10
-130    -170
-130    -160
-130    -150
-130    -140
-130    -130
-130    -120
-130    -110
-130    -100
-130    -90
-130    -80
-130    -70
-130    -60
-130    -50
-130    -40
-130    -30
-130    -20
-130    -10
-130    0
-130    10
-120    -170
-120    -160
-120    -150
-120    -140
-120    -130
-120    -120
-120    -110
-120    -100
-120    -90
-120    -80
-120    -70
-120    -60
-120    -50
-120    -40
-120    -30
-120    -20
-120    -10
-120    0
-120    10
-110    -170
-110    -160
-110    -150
-110    -140
-110    -130
-110    -120
-110    -110
-110    -100
-110    -90
-110    -80
-110    -70
-110    -60
-110    -50
-110    -40
-110    -30
-110    -20
-110    -10
-110    0
-110    10
-100    -170
-100    -160
-100    -150
-100    -140
-100    -130
-100    -120
-100    -110
-100    -100
-100    -90
-100    -80
-100    -70
-100    -60
-100    -50
-100    -40
-100    -30
-100    -20
-100    -10
-100    0
-100    10
-90     -170
-90     -160
-90     -150
-90     -140
-90     -130
-90     -120
-90     -110
-90     -100
-90     -90
-90     -80
-90     -70
-90     -60
-90     -50
-90     -40
-90     -30
-90     -20
-90     -10
-90     0
-90     10
-80     -170
-80     -160
-80     -150
-80     -140
-80     -130
-80     -120
-80     -110
-80     -100
-80     -90
-80     -80
-80     -70
-80     -60
-80     -50
-80     -40
-80     -30
-80     -20
-80     -10
-80     0
-80     10
-70     -170
-70     -160
-70     -150
-70     -140
-70     -130
-70     -120
-70     -110
-70     -100
-70     -90
-70     -80
-70     -70
-70     -60
-70     -50
-70     -40
-70     -30
-70     -20
-70     -10
-70     0
-70     10
-60     -170
-60     -160
-60     -150
-60     -140
-60     -130
-60     -120
-60     -110
-60     -100
-60     -90
-60     -80
-60     -70
-60     -60
-60     -50
-60     -40
-60     -30
-60     -20
-60     -10
-60     0
-60     10
-50     -170
-50     -160
-50     -150
-50     -140
-50     -130
-50     -120
-50     -110
-50     -100
-50     -90
-50     -80
-50     -70
-50     -60
-50     -50
-50     -40
-50     -30
-50     -20
-50     -10
-50     0
-50     10
-40     -170
-40     -160
-40     -150
-40     -140
-40     -130
-40     -120
-40     -110
-40     -100
-30     -150
-30     -140
-30     -130
-30     -120
-30     -110
-30     -100
-30     -90
-30     -80
-30     -70
-30     -60
-30     -50
-30     -40
-30     -30
-30     -20
-30     -10
-30     0
-30     10
-20     -170
-20     -160
-20     -150
-20     -140
-20     -130
-20     -120
-20     -110
-20     -100
-20     -90
-20     -80
-20     -70
-20     -60
-20     -50
-20     -40
-20     -30
-20     -20
-20     -10
-20     0
-20     10
-10     -170
-10     -160
-10     -150
-10     -140
-10     -130
-10     -120
-10     -110
-10     -100
-10     -90
-10     -80
-10     -70
-10     -60
-10     -50
-10     -40
-10     -30
-10     -20
-10     -10
-10     0
-10     10
0       -170
0       -160
0       -150
0       -140
0       -130
0       -120
0       -110
0       -100
0       -90
0       -80
0       -70
0       -60
0       -50
0       -40
0       -30
0       -20
0       -10
0       0       1
0       10
10      -170
10      -160
10      -150
10      -140
10      -130
10      -120
10      -110
10      -100
10      -90
10      -80
10      -70
10      -60
10      -50
10      -40
10      -30
10      -20
10      -10
10      0
10      10
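
Note that this output stops at 10 in both columns although the input reaches 179. That is consistent with string comparison: sprintf() returns a string, and awk compares a string with a number as strings, so in the END loops "2" <= "17" is false and each loop ends after 1. A possible fix, sketched under assumptions (the function name bin2d is made up, and the values are read from columns 1 and 2 as in the sample input above), is to add 0 to each sprintf() result so that every comparison is numeric:

```shell
# Hypothetical variant of script two with numeric bin indexes.
bin2d() {
    awk 'BEGIN { AMIN=99999999; AMAX=-AMIN; BMIN=AMIN; BMAX=AMAX; BINSIZE=10; OFS="\t" }
    {
        A = sprintf("%d", $1 / BINSIZE) + 0   # +0 turns the string into a number
        B = sprintf("%d", $2 / BINSIZE) + 0
        if (A < AMIN) AMIN = A; if (A > AMAX) AMAX = A
        if (B < BMIN) BMIN = B; if (B > BMAX) BMAX = B
        BIN[A, B]++
    }
    END {
        for (A = AMIN; A <= AMAX; A++)
            for (B = BMIN; B <= BMAX; B++)
                print A * BINSIZE, B * BINSIZE, BIN[A, B] + 0
    }' "$@"
}

# Example: two records, printed as a full 3 x 2 grid of bins.
printf '5 95\n25 105\n' | bin2d
```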

---------- Post updated at 03:21 PM ---------- Previous update was at 03:17 PM ----------

With the +0 addition to script two:

Code:
-170    -170    2
-170    -160    0
-170    -150    0
-170    -140    0
-170    -130    0
-170    -120    0
-170    -110    0
-170    -100    0
-170    -90     0
-170    -80     0
-170    -70     0
-170    -60     0
-170    -50     0
-170    -40     1
-170    -30     0
-170    -20     0
-170    -10     0
-170    0       0
-170    10      0
-160    -170    0
-160    -160    0
-160    -150    0
-160    -140    0
-160    -130    0
-160    -120    0
-160    -110    0
-160    -100    0
-160    -90     0
-160    -80     0
-160    -70     0
-160    -60     0
-160    -50     0
-160    -40     0
-160    -30     0
-160    -20     0
-160    -10     0
-160    0       0
-160    10      0
-150    -170    0
-150    -160    0
-150    -150    0
-150    -140    0
-150    -130    0
-150    -120    0
-150    -110    0
-150    -100    0
-150    -90     0
-150    -80     0
-150    -70     0
-150    -60     0
-150    -50     0
-150    -40     0
-150    -30     0
-150    -20     0
-150    -10     0
-150    0       0
-150    10      0
-140    -170    0
-140    -160    0
-140    -150    0
-140    -140    0
-140    -130    0
-140    -120    0
-140    -110    0
-140    -100    0
-140    -90     0
-140    -80     0
-140    -70     0
-140    -60     0
-140    -50     0
-140    -40     0
-140    -30     0
-140    -20     0
-140    -10     0
-140    0       0
-140    10      0
-130    -170    0
-130    -160    0
-130    -150    0
-130    -140    0
-130    -130    0
-130    -120    0
-130    -110    0
-130    -100    0
-130    -90     0
-130    -80     0
-130    -70     0
-130    -60     0
-130    -50     0
-130    -40     0
-130    -30     0

# 19  
Old 02-17-2012

I'm not upset about any criticism. My programs did have a lot of bugs, but that's unfortunately to be expected when I have no real way to test them. I can run them, but so what: they work according to my understanding of your question, not yours. Without a good example of what they should be outputting I've been grasping at straws. Three bad guesses and counting.

The example output you originally posted had nothing to do with the example input, so wasn't much help; all that demonstrates is that you want it printing rows of three numbers, not which three numbers.

Please save us both days of trouble by picking (or inventing) a few lines of input that clearly illustrate the logic you want, working them out in your head, and showing what they'd output. Please.

Last edited by Corona688; 02-17-2012 at 04:36 PM..
# 20  
Old 02-17-2012
I mean, what I posted before is exactly what I want.
But no worries, I am quite satisfied with the first script you wrote.
Let us just leave it here.

Thanks for all the help!

The last modified script you posted is
good except it doesn't stop looping over B:
Code:
awk 'BEGIN { BINSIZE=10; OFS="\t" }

function MIN(A, D)
{
        if(A == "") return(D);
        else if(A > D) return(D);
        else return(A);
}

function MAX(A, D)
{
        if(A == "") return(D);
        else if(A > D) return(A);
        else return(D);
}

{        A=sprintf("%d", $2/BINSIZE);
          B=sprintf("%d", $3/BINSIZE);
          AMIN=MIN(AMIN, A);  AMAX=MAX(AMAX, A);

          MINB[A]=MIN(MINB[A], B);
          MAXB[A]=MAX(MAXB[A], B);
          BIN[A,B]++;
}
END { for(A=AMIN;  A<=AMAX; A++)
          {
                  if(MINB[A] == "") continue; # No data
                  for(B=MINB[A]; A<=MAXB[A]; B++)
                          print A*BINSIZE, B*BINSIZE, BIN[A,B]+0;
          }
 }' inputfile

Input
Code:
-179.995483     132.155258
-179.995483     132.155258
-179.986374     153.868210
-179.986374     153.868210
-179.925522     149.994141
-179.894913     -176.379990
-179.894913     -176.379990
-179.888428     159.134262
-179.790649     158.782471
-179.790649     158.782471
-179.768814     146.420975
-179.768814     146.420975
-179.701813     148.886353
-179.685852     177.829773
-179.685852     177.829773
-179.670364     161.292084
-179.634399     161.466721
-179.634399     161.466721
-179.631607     164.796097
-179.631607     164.796097
-179.595261     143.675720
-179.595261     143.675720
-179.549637     161.132858
-179.549637     161.132858
-179.504288     -40.797535
179.801575      -172.905792
179.801575      -172.905792
179.881226      133.914032
179.881226      133.914032
179.910248      159.141998
179.910248      159.141998
179.942413      130.512344
179.942413      130.512344
179.969635      164.739243

Output:
Code:
-170    -170    2
-170    -160    0
-170    -150    0
-170    -140    0
-170    -130    0
-170    -120    0
-170    -110    0
-170    -100    0
-170    -90     0
-170    -80     0
-170    -70     0
-170    -60     0
-170    -50     0
-170    -40     1
-170    -30     0
-170    -20     0
-170    -10     0
-170    0       0
-170    10      0
-170    20      0
-170    30      0
-170    40      0
-170    50      0
-170    60      0
-170    70      0
-170    80      0
-170    90      0
-170    100     0
-170    110     0
-170    120     0
-170    130     2
-170    140     6
-170    150     5
-170    160     7
-170    170     2
-170    180     0
-170    190     0
-170    200     0
-170    210     0
-170    220     0
-170    230     0
-170    240     0
-170    250     0
-170    260     0
-170    270     0
-170    280     0
-170    290     0
-170    300     0
-170    310     0
-170    320     0
-170    330     0
-170    340     0
-170    350     0
-170    360     0
-170    370     0
-170    380     0
-170    390     0
-170    400     0
-170    410     0
-170    420     0
-170    430     0
-170    440     0
-170    450     0
-170    460     0
-170    470     0
-170    480     0
-170    490     0
-170    500     0
-170    510     0
-170    520     0
-170    530     0
-170    540     0
-170    550     0
-170    560     0
-170    570     0
-170    580     0
-170    590     0
-170    600     0
-170    610     0
-170    620     0
-170    630     0
-170    640     0
-170    650     0
-170    660     0
-170    670     0
-170    680     0
-170    690     0
-170    700     0
and doesn't stop until 99999999999999999
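
The runaway loop appears to come from the inner for statement: its condition tests A<=MAXB[A], and since A never changes inside that loop, the test stays true forever. The condition presumably should test B. Below is a corrected sketch with its assumptions stated: bin_rows is a made-up name, the input is read from columns 1 and 2, MIN and MAX are condensed into one-line equivalents, and +0 is added to the sprintf() results so the comparisons are numeric rather than string comparisons.

```shell
# Hypothetical corrected version: zeroes only between each row's own min and max.
bin_rows() {
    awk 'BEGIN { BINSIZE = 10; OFS = "\t" }

    # Smaller of A and D; an unset A ("") means "no value yet".
    function MIN(A, D) { return (A == "" || A > D) ? D : A }
    # Larger of A and D, with the same convention.
    function MAX(A, D) { return (A == "" || A < D) ? D : A }

    {
        A = sprintf("%d", $1 / BINSIZE) + 0    # +0 makes the bin index numeric
        B = sprintf("%d", $2 / BINSIZE) + 0
        AMIN = MIN(AMIN, A); AMAX = MAX(AMAX, A)
        MINB[A] = MIN(MINB[A], B)              # range of B seen for this A
        MAXB[A] = MAX(MAXB[A], B)
        BIN[A, B]++
    }
    END {
        for (A = AMIN; A <= AMAX; A++) {
            if (MINB[A] == "") continue            # no data for this A
            for (B = MINB[A]; B <= MAXB[A]; B++)   # test B here, not A
                print A * BINSIZE, B * BINSIZE, BIN[A, B] + 0
        }
    }' "$@"
}

# Example: the empty bin at 100 is filled in, the empty row at 10 is skipped.
printf '5 95\n5 115\n25 105\n' | bin_rows
```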

---------- Post updated at 05:16 PM ---------- Previous update was at 05:15 PM ----------

Going back to script.awk
Code:
BEGIN { MIN=99999999; MAX=-MIN; OFS="\t"; BINSIZE=10; }

{         A=sprintf("%d", $1/BINSIZE);
          B=sprintf("%d", $2/BINSIZE);
          BIN[A OFS B]++;
}
END { for(X in BIN) print X, BIN[X]; }

I would like a blank line in the output each time
the value in the first column (the first bin) changes. Is this possible?
Example output
Code:
-170 -140 4
-170 -150 14
-170 -160 46
-170 -170 122
-170 -30 1
-170 -40 7
-170 -50 3
-170 -60 3
-170 120 9
-170 130 83
-170 140 258
-170 150 366
-170 160 384
-170 170 246

-160 -130 4
-160 -140 9
-160 -150 38
-160 -160 164
-160 -170 587
-160 -30 3
-160 -40 4
-160 -50 8
-160 -60 1
-160 100 2
-160 110 13
-160 120 35
-160 130 339
-160 140 1135
-160 150 1903
-160 160 1975
-160 170 1414

-150 -110 3
-150 -120 1
-150 -130 6
-150 -140 26
-150 -150 95
-150 -160 453
-150 -170 1771
-150 -20 3
-150 -30 8
-150 -40 4
-150 -50 10
-150 -60 9
-150 -80 2
-150 100 6
-150 110 35
-150 120 184
-150 130 795
-150 140 2649
-150 150 5267
-150 160 5897
-150 170 4198

-140 -10 3
-140 -100 6
-140 -110 5
-140 -120 7
-140 -130 26
-140 -140 56
-140 -150 203
-140 -160 874
-140 -170 3168
-140 -20 4
-140 -30 7
-140 -40 15
-140 -50 11
-140 -60 9
-140 -70 1
-140 -80 4
-140 -90 10
-140 10 2
-140 100 31
-140 110 90
-140 120 408
-140 130 1434
-140 140 4305
-140 150 7987
-140 160 10015
-140 170 7310
-140 60 2
-140 80 2
-140 90 13

-130 -10 7
-130 -100 2

---------- Post updated at 05:15 PM ---------- Previous update was at 05:15 PM ----------

If this is too complicated, then don't worry about it!
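
One possible way to get the blank lines, sketched under assumptions: the counts have already been produced by script.awk, the rows are sorted first because for(X in BIN) returns the bins in no particular order, and group_bins is a made-up helper name. A second awk then prints an empty line whenever the first column changes.

```shell
# Hypothetical helper: sort the bins numerically, then separate groups
# that share the same first column with a blank line.
group_bins() {
    sort -n -k1,1 -k2,2 "$@" | awk '
        NR > 1 && $1 != PREV { print "" }   # first column changed
        { print; PREV = $1 }'
}

# Example with three already-counted bins:
printf '%s\n' '-160 -130 4' '-170 130 83' '-170 -140 4' | group_bins
```

With the script from the post it could be used as: awk -f script.awk inputfile | group_bins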