AWK counting interval / histogram data


 
# 8  
Old 02-16-2012
Quote:
Originally Posted by chrisjorg
Fantastic, thanks.
What if I wanted to output bins that were not visited?
Working on it.

---------- Post updated at 12:42 PM ---------- Previous update was at 12:36 PM ----------

Code:
awk 'BEGIN { AMIN=99999999; AMAX=-AMIN; BMIN=AMIN; BMAX=AMAX; OFS="\t"; BINSIZE=10; }

{       # Bin index for columns 2 and 3, truncated toward zero
        A=sprintf("%d", $2/BINSIZE);
        B=sprintf("%d", $3/BINSIZE);
        # Track the range of bin indices seen so far
        if(A<AMIN) AMIN=A; else if(A>AMAX) AMAX=A;
        if(B<BMIN) BMIN=B; else if(B>BMAX) BMAX=B;
        BIN[A,B]++;
}
END {   # Walk every bin in the range, visited or not
        for(A=AMIN; A<=AMAX; A++)
                for(B=BMIN; B<=BMAX; B++)
                        print A*BINSIZE, B*BINSIZE, BIN[A,B]; }' inputfile


Last edited by Corona688; 02-16-2012 at 02:51 PM..
# 9  
Old 02-16-2012
OK, so first I changed AMIN to -180, which then made more sense.

My data output now looks like:

Code:
-100    0       108
-100    10      115
-100    20      134
-100    30      121
-100    40      150
-100    50      180
-100    60      189
-100    70      149
-100    80      122
-100    90      133
-100    100     111
-100    110     171
-100    120     206
-100    130     414
-100    140     450
-100    150     506
-100    160     311
-100    170     177
-100    180
-100    190
-100    200
-100    210
-100    220
-100    230
-100    240
-100    250
-100    260
-100    270
-100    280
-100    290
-100    300
-100    310

Which is good, but I want it to stop looping after 180. B keeps going until 890 for some reason. The if conditions you introduced make sense.
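On the blank third column: an array element that was never incremented is unset, and awk prints an unset value as an empty string. Adding 0 to it forces a numeric 0 instead. The sketch below illustrates that under a few assumptions: the three input records are made up, the loop bounds are hard-coded to a 2x2 grid, and `int()` stands in for the `sprintf("%d", ...)` call used in the thread.

```shell
# Hypothetical sample data: three records that land in two of four bins.
zero_fill=$(printf '%s\n' 'x 5 12' 'x 15 3' 'x 5 14' |
awk 'BEGIN { OFS="\t"; BINSIZE=10 }
{ A=int($2/BINSIZE); B=int($3/BINSIZE); BIN[A,B]++ }
END { for(A=0; A<=1; A++)
        for(B=0; B<=1; B++)
          # +0 turns an unset (empty) count into a numeric 0
          print A*BINSIZE, B*BINSIZE, BIN[A,B]+0 }')
printf '%s\n' "$zero_fill"
```

With this input the two unvisited bins are printed with a count of 0 rather than an empty field.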
# 10  
Old 02-16-2012
There's a reason I set the MIN and MAX values to what they were. Look at the defaults -- a positive minimum and a negative maximum. The first data item will override them because it is smaller than the minimum and larger than the maximum. Further items will extend the range sensibly from there.

In fact, on thinking about it, the bug is that it needs to set both of them on the first try. It should not leave AMAX at -99999999 when it gets a value larger than that. So those "else" clauses shouldn't be there.

Code:
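awk 'BEGIN { AMIN=99999999; AMAX=-AMIN; BMIN=AMIN; BMAX=AMAX; OFS="\t"; BINSIZE=10; }

{       # Bin index for columns 2 and 3, truncated toward zero
        A=sprintf("%d", $2/BINSIZE);
        B=sprintf("%d", $3/BINSIZE);
        # No "else": the first record must set both the minimum and the maximum
        if(A<AMIN) AMIN=A; if(A>AMAX) AMAX=A;
        if(B<BMIN) BMIN=B; if(B>BMAX) BMAX=B;
        BIN[A,B]++;
}
END {   # Walk every bin in the range, visited or not
        for(A=AMIN; A<=AMAX; A++)
                for(B=BMIN; B<=BMAX; B++)
                        print A*BINSIZE, B*BINSIZE, BIN[A,B]; }' inputfile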
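The effect of the "else" is easiest to see with a single input record. The sketch below is only an illustration: the value 42 and the names MIN, MAX and V are made up, and the value is forced to a number with `+0` so that only the else logic is being shown.

```shell
# One hypothetical record: the value 42.
with_else=$(echo 42 | awk 'BEGIN { MIN=99999999; MAX=-MIN }
{ V=$1+0; if(V<MIN) MIN=V; else if(V>MAX) MAX=V }
END { print MIN, MAX }')

without_else=$(echo 42 | awk 'BEGIN { MIN=99999999; MAX=-MIN }
{ V=$1+0; if(V<MIN) MIN=V; if(V>MAX) MAX=V }
END { print MIN, MAX }')

echo "with else:    $with_else"
echo "without else: $without_else"
```

With the else, the first record lowers MIN and the MAX test is skipped, so MAX stays at its sentinel value. Without it, both are set from the first record.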

# 11  
Old 02-17-2012
Hmm, well my data is weird.

The initial script you wrote output the correct frequencies; it just was not looping over the bins that had zero frequency. I think the answer lies in introducing a conditional OR comment, such that bins that are not visited still get assigned a zero frequency (like in the Perl script). I am looking myself to see if I can find the solution here. It has been extremely helpful!

---------- Post updated at 08:24 AM ---------- Previous update was at 05:12 AM ----------

Pretty much. When running with your last script, the script keeps setting AMIN to -100 which is wrong. I have input data of -170, but this is not represented in the output.
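One possible cause, which nobody in the thread confirms: `sprintf` returns a string, so once A holds a `sprintf` result, `A<AMIN` is a string comparison, not a numeric one. In a string comparison "-17" sorts after "-10", so a bin index of -17 would never replace a minimum of -10. The sketch below shows the difference using two hand-picked values.

```shell
cmp=$(awk 'BEGIN {
    A    = sprintf("%d", -170/10)   # the string "-17"
    AMIN = sprintf("%d", -100/10)   # the string "-10"
    as_string = (A < AMIN)          # compared character by character
    as_number = (A+0 < AMIN+0)      # +0 forces a numeric comparison
    print as_string, as_number
}')
echo "$cmp"
```

If this is what is happening with the real data, computing the index with `int($2/BINSIZE)`, or appending `+0` to the `sprintf` result, should make the comparisons numeric.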

Last edited by chrisjorg; 02-17-2012 at 06:38 AM..
# 12  
Old 02-17-2012
Try switching A and B in the final loop.
# 13  
Old 02-17-2012
OK, but why not do it as in the first script:

Code:
for (X in BIN) print X, BIN[X]

Would you not need to correlate your variables A, B to the array BIN?
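For context, `for (X in BIN)` visits only the subscripts that were actually stored, in no guaranteed order, so it cannot report a bin that was never hit. The nested A/B loop walks every bin in the range and looks each one up in BIN. A small sketch with made-up data (two records, and a grid hard-coded to 3x2) shows the difference:

```shell
keys=$(printf '%s\n' '0 0' '2 1' | awk '
{ BIN[$1,$2]++ }
END {
    stored = 0
    for (X in BIN) stored++               # only subscripts that exist
    grid = 0; empty = 0
    for (A=0; A<=2; A++)
        for (B=0; B<=1; B++) {
            grid++                        # every bin in the range
            if (!((A,B) in BIN)) empty++  # membership test, creates nothing
        }
    print stored, grid, empty
}')
echo "$keys"
```

The `(A,B) in BIN` test is how the loop variables A and B tie back to the array: the pair is joined into the same combined subscript that `BIN[A,B]++` used.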

Last edited by chrisjorg; 02-17-2012 at 02:20 PM..
# 14  
Old 02-17-2012
AWK array

For a multidimensional array ar[A, B] how do you only refer to A?

Is it

Code:
for (A in ar)

?
Thanks
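For reference, awk stores `ar[A, B]` under a single subscript made of A and B joined by the SUBSEP character, so `for (A in ar)` would put the whole combined subscript in A. One common way to get at the first part is to split the subscript on SUBSEP. A minimal sketch, with made-up counts and the hypothetical names `idx` and `perA`:

```shell
firsts=$(awk 'BEGIN {
    ar[-10, 0] = 3; ar[-10, 10] = 1; ar[20, 0] = 5   # made-up counts
    for (key in ar) {
        split(key, idx, SUBSEP)    # idx[1] is A, idx[2] is B
        perA[idx[1]] += ar[key]    # total count for each A
    }
    print perA[-10], perA[20]
}')
echo "$firsts"
```

Here the counts are summed per A value, but the same `split` call works for any per-A processing.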