Create bins with totals and percentage


 
# 1  
Old 02-13-2020

I would like to bin my data to get a histogram with totals and percentages, with the bins starting from 0.

If possible, I would also like to set the minimum and maximum values for the bins (in my case min=0 and max=20).

Input file
Code:
8  5
10 1
11 4
12 4
12 4
13 5
16 7
18 9
16 9
17 7
18 5
19 5
20 1
21 7

Desired output
Code:
      0 0        0.0%
 0 -  2 0        0.0%
 2 -  4 0        0.0%
 4 -  6 0        0.0%
 6 -  8 0        0.0%
 8 - 10 5        6.8%
10 - 12 5        6.8%
12 - 14 13      17.8%
14 - 16 0        0.0%
16 - 18 23      31.5%
18 - 20 19      26.0%
   > 20 8       11.0%
---------------------
Total: 73


I use this code. It works, but the percentage is missing.

Code:
awk 'BEGIN { delta = (delta == "" ? 2 : delta) }
{
    bucketNr = int(($0+delta) / delta)
    cnt[bucketNr]++
    numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
    for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
        end = beg + delta
        printf "%0.1f %0.1f %d\n", beg, end, cnt[bucketNr]
        beg = end
    }
}' file

Thanks in advance

Last edited by jiam912; 02-14-2020 at 12:46 AM..
# 2  
Old 02-13-2020
How is the percentage supposed to be calculated?
As the bin's share of the total sum?
Then 8-10 5 8.2 is supposed to be 8-10 5 3.5 for your example...
Please elaborate.
# 3  
Old 02-13-2020
How far would this small adaptation of your code get you:
Code:
awk '
BEGIN   { delta = (delta == "") ? 2 : delta }

        { bucketNr = int(($1 + delta) / delta)
          cnt[bucketNr] += $2
          TOT           += $2
          numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
        }

END     { for (bucketNr = 1; bucketNr <= numBuckets; bucketNr++) {
              end = beg + delta
              printf "%2.0f - %2.0f\t%d\t%4.1f%%\n", beg, end, cnt[bucketNr], cnt[bucketNr] / TOT * 100
              beg = end
          }
        }
' file

Output:
 0 -  2    0     0.0%
 2 -  4    0     0.0%
 4 -  6    0     0.0%
 6 -  8    0     0.0%
 8 - 10    5     6.8%
10 - 12    5     6.8%
12 - 14    13    17.8%
14 - 16    0     0.0%
16 - 18    23    31.5%
18 - 20    19    26.0%
20 - 22    8     11.0%

# 4  
Old 02-13-2020
Dear vgersh99, the percentage is calculated based on the total sum.
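
To make that rule concrete with the sample data from post #1: the 12 - 14 bin collects the second-column values 4, 4 and 5, so it totals 13; the grand total is 73, and 13 / 73 * 100 rounds to 17.8%. A minimal sketch to check the arithmetic, assuming two whitespace-separated columns and bins that include the lower bound but not the upper one (the function name bin_share is made up for illustration):

```shell
# bin_share LO HI: sum column 2 for rows with LO <= $1 < HI, then print
# that sum, the grand total of column 2, and the bin's share in percent.
bin_share() {
    awk -v lo="$1" -v hi="$2" '
        { total += $2 }
        $1 >= lo && $1 < hi { sum += $2 }
        END { if (total) printf "%d %d %.1f%%\n", sum, total, sum / total * 100 }
    '
}

# Sample data from post #1; prints: 13 73 17.8%
printf '%s\n' '8 5' '10 1' '11 4' '12 4' '12 4' '13 5' '16 7' \
              '18 9' '16 9' '17 7' '18 5' '19 5' '20 1' '21 7' |
    bin_share 12 14
```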
# 5  
Old 02-13-2020
For your "max bin" question, try

Code:
awk -v MAX=20 '
BEGIN   { delta = (delta == "") ? 2 : delta
          MXBCK = (MAX / delta) + 1        # index of the overflow bucket
        }

        { bucketNr = int(($1 + delta) / delta)
          if (bucketNr > MXBCK) bucketNr = MXBCK
          cnt[bucketNr] += $2
          TOT           += $2
        }

END     { # Loop up to MXBCK rather than the highest bucket seen, so that every
          # bin below MAX is printed even when the data stops short of MAX.
          for (bucketNr = 1; bucketNr < MXBCK; bucketNr++) {
              end = beg + delta
              printf "%2.0f - %2.0f\t%d\t%4.1f%%\n", beg, end, cnt[bucketNr], cnt[bucketNr] / TOT * 100
              beg = end
          }
          printf "   > %2.0f\t%d\t%4.1f%%\n", MAX, cnt[MXBCK], cnt[MXBCK] / TOT * 100
          print "---------------------"
          print "Total:", TOT
        }
' file

Output:
 0 -  2    0     0.0%
 2 -  4    0     0.0%
 4 -  6    0     0.0%
 6 -  8    0     0.0%
 8 - 10    5     6.8%
10 - 12    5     6.8%
12 - 14    13    17.8%
14 - 16    0     0.0%
16 - 18    23    31.5%
18 - 20    19    26.0%
   > 20    8     11.0%
---------------------
Total: 73


Last edited by RudiC; 02-13-2020 at 05:38 PM.. Reason: Added the forgotten "Total" line...
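
Because the BEGIN block only assigns delta when it is empty, the bin width can be overridden from the command line with -v. A sketch of that, wrapping the program above in a shell function that reads standard input (hist is a made-up name, and MAX is assumed to be a multiple of delta):

```shell
# hist DELTA MAX: same binning as the program above, reading stdin.
# An empty DELTA falls back to the default width of 2.
hist() {
    awk -v delta="$1" -v MAX="$2" '
    BEGIN { delta = (delta == "") ? 2 : delta
            MXBCK = (MAX / delta) + 1 }          # overflow bucket index
    { bucketNr = int(($1 + delta) / delta)
      if (bucketNr > MXBCK) bucketNr = MXBCK
      cnt[bucketNr] += $2
      TOT += $2 }
    END { for (bucketNr = 1; bucketNr < MXBCK; bucketNr++) {
              end = beg + delta
              printf "%2.0f - %2.0f\t%d\t%4.1f%%\n", beg, end, cnt[bucketNr], cnt[bucketNr] / TOT * 100
              beg = end
          }
          printf "   > %2.0f\t%d\t%4.1f%%\n", MAX, cnt[MXBCK], cnt[MXBCK] / TOT * 100
          print "Total:", TOT }
    '
}

# Sample data from post #1 with a bin width of 5 instead of 2.
printf '%s\n' '8 5' '10 1' '11 4' '12 4' '12 4' '13 5' '16 7' \
              '18 9' '16 9' '17 7' '18 5' '19 5' '20 1' '21 7' |
    hist 5 20
```

With a width of 5 the 10 - 15 bin should total 18 (24.7%) and the overflow line should still show 8 (11.0%). Like the original, the sketch divides by TOT without checking it, so it expects at least one input row with a non-zero second column.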
# 6  
Old 02-13-2020
same as above but total included:
Code:
awk 'BEGIN { delta = (delta == "" ? 2 : delta) ; max=20 }
{
    if ($1 > max) {
        maxf++
        maxc += $2
    } else {
        bucketNr = int(($1+delta) / delta)
        cnt[bucketNr]++
        cntc[bucketNr] += $2
        numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
    }
    total += $2
}
END {
    for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
        end = beg + delta
        # %% prints a literal percent sign; a lone % before \n is not a valid format
        printf "%2d-%2d %3d %.2f%%\n", beg, end, cntc[bucketNr], (cntc[bucketNr]*100.0)/total
        beg = end
    }
    if (maxf) printf ">%2d %2d %.2f%%\n", max, maxc, (maxc*100.0)/total
    print "-------------"
    print "Total: " total
}' file

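The code posted in post #1 needs some work. It is lucky that $0+delta works at all: awk converts the whole line using its leading numeric prefix, so the result happens to equal $1+delta. The code also does not produce the desired columns shown, etc.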
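
A quick way to see why $0+delta happens to work, and when it stops working (a sketch; the second input line is made up for illustration):

```shell
# "8  5" is converted using its leading numeric prefix (8),
# so $0 + 2 and $1 + 2 both give 10. Prints: 10 10
echo '8  5' | awk '{ print $0 + 2, $1 + 2 }'

# If the line does not start with a number, $0 silently converts to 0,
# while an explicit field reference still picks the intended column.
# Prints: 2 7
echo 'x8 5' | awk '{ print $0 + 2, $2 + 2 }'
```

Referring to $1 explicitly, as the later posts do, avoids depending on this conversion.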

Last edited by RavinderSingh13; 02-28-2020 at 01:39 AM..
# 7  
Old 02-13-2020
RudiC/rdrtx1,
the binning, the percentage and the totals are somewhat different from the OP's desired output in post #1.
Hence post #2 with the ask to elaborate.