

Calculate percentage of a value across m


 
# 1  
12-07-2013

I have 100 csv files like:

Code:
file_city_1 file_city_2 file_city_3 file_city_4

The city name varies: there are 25 cities, and each city has 4 regions. Each of the 4 region files contains statistics like:

Code:
parameter1 : number1
parameter2 : number2
.....
parameter50 : number50


For each parameter, I need to calculate each region's percentage of the city total for that parameter. What I want to do is something like:

Code:
    file_city_parameter1_total = parameter1_region1 + parameter1_region2 + parameter1_region3 + parameter1_region4

Then calculate the percentage of this parameter for each region:

Code:
 file_city_region_parameter1_percentage = parameter1_region1 / file_city_parameter1_total * 100
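A minimal sketch of the two formulas above for a single parameter, using hypothetical values 20, 30, 40 and 10 for the four regions. Shell arithmetic in $(( )) is integer-only, so the total is summed in the shell but the division is done in awk, which works in floating point. The zero check is an assumption: it reports 0% when a parameter is 0 in every region instead of dividing by zero.

```shell
#!/bin/sh
# Hypothetical values of parameter1 in the 4 region files of one city
r1=20 r2=30 r3=40 r4=10

# file_city_parameter1_total = sum over the 4 regions (integer arithmetic is fine here)
total=$((r1 + r2 + r3 + r4))

# file_city_region_parameter1_percentage for region 1, computed in awk
# because $(( )) would truncate 20 / 100 to 0
pct=$(awk -v v="$r1" -v t="$total" 'BEGIN { printf "%.2f", (t ? v / t * 100 : 0) }')

echo "parameter1 region1: $pct%"
```

The same awk call, with v and t swapped for the real values, works for any region and parameter.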

Then print the percentages of all parameters over the total (the sum of the 4 regions) for a given region, and do this for every region (all 100 files).

I actually tried several scripts, but I don't want to paste them here because they are too long to read. I defined all parameters separately for each region, and tried different command substitutions and nested loops, but got stuck at some point. So I'm wondering what the easiest way is to get this done with awk, grep, sed or anything else.

Thanks.
# 2  
12-07-2013


Is it possible for you to attach a sample file?

Something like this might be what you want. I can't guarantee it will work for your data, since you didn't show the file structure; it's just a guess:
Code:
for i in file_city_* ; do
      awk -F":" '{ sum+=$2; var[$1]=$2 }END{ for( i in var ) print "file_city_region_"i"_percentage = ", var[i]/sum*100 }' $i
done


# 3  
12-07-2013
These are the parameters, copied from the original file:

Code:
 cat network_stats_miami_1.csv 
Total Subs                                : 0
VLAN Count                                         : 28129501
Subs Segmentation                                     : 28142
ARPU Segmentation                                         : 0
RTT Delay                             : 3096610
IPv4 Fragmented Count                              : 2809853525
IPv4 Non Fragmented Count                          : 2809853525
Call Drop Rate                                : 1
Connection with Good voice quality                  : 45545345
Standalone Dedicated Control Channel (SDCCH) Congestion : 0   

.
.
.

The format is like that.
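With a plain -F":" the name keeps the padding before the colon (for example "Total Subs" followed by many spaces) and the value keeps a leading space, which makes names awkward to match across files. A minimal sketch, using a hypothetical helper parse_kpis, that splits on a regular-expression field separator so the padding disappears; lines without a colon, such as the "." placeholders, are skipped:

```shell
#!/bin/sh
# Hypothetical helper: print "name<TAB>value" for every "name : value" line.
# The field separator ' *: *' also removes the padding around the colon.
parse_kpis() {
  awk -F' *: *' '
    NF >= 2 {
      sub(/[ \t]+$/, "", $2)        # drop trailing blanks after the value
      printf "%s\t%s\n", $1, $2
    }' "$@"
}
```

For example, parse_kpis network_stats_miami_1.csv prints one tab-separated name/value pair per KPI.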

---------- Post updated at 07:09 AM ---------- Previous update was at 06:49 AM ----------

Quote:
Originally Posted by Akshay Hegde
Code:
for i in file_city_* ; do
      awk -F":" '{ sum+=$2; var[$1]=$2 }END{ for( i in var ) print "file_city_region_"i"_percentage = ", var[i]/sum*100 }' $i
done

This will calculate the sum of all KPIs in a file, right? But I want to calculate each KPI's percentage separately. Each KPI is on the same line in every file.

I also thought of something like:

Code:
for city in ${CITY} ; do
for region in {$REGION} ; do
for kpi in ${KPI} ; do

#calculate sum of kpi 

eval ${kpi}_${city}_total=$( awk - F ":" '/$kpi/{sum=sum+$2} END {print sum}' ${csv_dir}/${file_base}_${city}* )

# parse a single region kpi value:

eval ${kpi}_${city}_${region}=$( awk -F ":"  '/$kpi/{print $2}' ${csv_dir}/${file}_${city}_${region}.csv )

#then calculate percentage

eval ${kpi}_${city}_${region}_percentage=$( awk -v  var1=$(eval echo \${kpi}_${city}_${region}) -v var2=$(eval echo \${kpi}_${city}_total) 'BEGIN{print var1 / var2 * 100}' )

And I'm stuck at this point. My script calculates the KPI total and the region value correctly, but something goes wrong when it calculates the percentage.

It generates output like nan. I know this is not the easiest way to solve this, so I wanted to ask for other people's opinions.
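Judging only from the code above, the nan most likely comes from the percentage step. $(eval echo \${kpi}_${city}_${region}) makes eval run echo ${kpi}_miami_1, which prints the variable's name (for example VLAN_miami_1) rather than its value. awk then receives two non-numeric strings, treats both as 0, and 0 / 0 gives nan (some awks, such as gawk, stop with a division-by-zero error instead). Two other things worth checking: inside single quotes the shell does not expand $kpi, so /$kpi/ is passed to awk literally (passing it with -v kpi="$kpi" and testing $0 ~ kpi avoids that), and KPI names containing spaces cannot be part of a shell variable name. A minimal sketch with hypothetical values, contrasting the two expansions:

```shell
#!/bin/sh
# Hypothetical values standing in for what the loop would have stored
kpi=VLAN city=miami region=1
eval "${kpi}_${city}_${region}=30"     # creates VLAN_miami_1=30
eval "${kpi}_${city}_total=120"        # creates VLAN_miami_total=120

# As in the original line: eval runs `echo ${kpi}_miami_1`,
# which prints the variable's NAME (VLAN_miami_1), not its value
name=$(eval echo \${kpi}_${city}_${region})

# Indirect lookup: eval runs `echo "$VLAN_miami_1"`, which prints the VALUE
value=$(eval echo \"\$${kpi}_${city}_${region}\")
total=$(eval echo \"\$${kpi}_${city}_total\")

# Pass plain numbers to awk and guard against a zero total
pct=$(awk -v v="$value" -v t="$total" 'BEGIN { print (t ? v / t * 100 : 0) }')
echo "$name -> $pct%"
```

Avoiding eval altogether and keeping the numbers inside one awk program, as in the answers in this thread, is usually simpler.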

# 4  
12-07-2013
What's your expected output for the above input?
# 5  
12-07-2013
Well, for example, I first want to sum the Total Subs values of all 4 regions of the city. Then I want to calculate the percentage of each region's Total Subs value over the city total (the 4 regions), and do the same for each KPI and each region. Finally, I want to see output like:


Code:
Miami

KPI Name              Region1  Region2  Region3  Region4
Total Subs                20%      30%      40%      10%
VLAN Count                15%      40%      40%       5%
Subs Segmentation         80%       5%       5%      10%
.
.
.
.
Chicago

KPI Name              Region1  Region2  Region3  Region4
Total Subs                20%      30%      40%      10%
VLAN Count                15%      40%      40%       5%
Subs Segmentation         80%       5%       5%      10%

And so on.
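A minimal sketch that produces this layout, assuming (from the sample in post #3) that each city's regions are four files named network_stats_<city>_<region>.csv, region 1 to 4, all in one directory, and that every data line has the form "KPI name : value". kpi_table is a hypothetical helper name. For each city, awk reads the four region files in order, takes the region number from the file boundary (FNR == 1), sums each KPI across the regions and prints each region's share of that sum. Percentages are printed with two decimals rather than rounded to whole numbers.

```shell
#!/bin/sh
# Hypothetical helper: kpi_table DIR
# Assumes each city has exactly four files DIR/network_stats_<city>_<region>.csv
# (region = 1..4) containing "KPI name : value" lines.
kpi_table() {
  dir=$1
  for first in "$dir"/network_stats_*_1.csv; do
    [ -e "$first" ] || continue             # the glob matched nothing
    city=${first##*/network_stats_}          # e.g. miami_1.csv
    city=${city%_1.csv}                      # e.g. miami
    printf '%s\n\n' "$city"
    awk -F' *: *' '
      FNR == 1 { r++ }                       # each input file is one region
      NF >= 2 {
        if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }   # keep input order
        val[$1, r] = $2
        sum[$1] += $2                        # city total for this KPI
      }
      END {
        printf "%-56s", "KPI Name"
        for (i = 1; i <= r; i++) printf " %8s", "Region" i
        print ""
        for (k = 1; k <= n; k++) {
          name = order[k]
          printf "%-56s", name
          for (i = 1; i <= r; i++)           # 0% when the city total is 0
            printf " %7.2f%%", (sum[name] ? val[name, i] / sum[name] * 100 : 0)
          print ""
        }
        print ""
      }' "$dir/network_stats_${city}_1.csv" "$dir/network_stats_${city}_2.csv" \
         "$dir/network_stats_${city}_3.csv" "$dir/network_stats_${city}_4.csv"
  done
}
```

Called as kpi_table followed by the CSV directory, it prints one table per city. Because the region number comes from the order of the files on the awk command line, an empty region file would shift the columns; checking each file with [ -s file ] first is one way to catch that.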
# 6  
12-07-2013
If your real file contains the regions as consecutive blocks like this, try the following. I pasted the same contents 4 times into one file, copied it to a second file, and tested:

Code:
$ cat file_test1.tmp
Total Subs : 0
VLAN Count : 28129501
Subs Segmentation : 28142
ARPU Segmentation : 0
RTT Delay : 3096610
IPv4 Fragmented Count : 2809853525
IPv4 Non Fragmented Count : 2809853525
Call Drop Rate : 1
Connection with Good voice quality : 45545345
Standalone Dedicated Control Channel (SDCCH) Congestion : 0

Total Subs : 0
VLAN Count : 28129501
Subs Segmentation : 28142
ARPU Segmentation : 0
RTT Delay : 3096610
IPv4 Fragmented Count : 2809853525
IPv4 Non Fragmented Count : 2809853525
Call Drop Rate : 1
Connection with Good voice quality : 45545345
Standalone Dedicated Control Channel (SDCCH) Congestion : 0

Total Subs : 0
VLAN Count : 28129501
Subs Segmentation : 28142
ARPU Segmentation : 0
RTT Delay : 3096610
IPv4 Fragmented Count : 2809853525
IPv4 Non Fragmented Count : 2809853525
Call Drop Rate : 1
Connection with Good voice quality : 45545345
Standalone Dedicated Control Channel (SDCCH) Congestion : 0



Total Subs : 0
VLAN Count : 28129501
Subs Segmentation : 28142
ARPU Segmentation : 0
RTT Delay : 3096610
IPv4 Fragmented Count : 2809853525
IPv4 Non Fragmented Count : 2809853525
Call Drop Rate : 1
Connection with Good voice quality : 45545345
Standalone Dedicated Control Channel (SDCCH) Congestion : 0

Code:
$ cp file_test1.tmp file_test2.tmp

Code:
$ ls -1 *.tmp
file_test1.tmp
file_test2.tmp

Code:
awk -F":" '

function stat(){
        # Print the name of the file being processed
        print p RS

        for(i=1;i<=j;i++){
                if(C[i,1]){

                        # Percentage of the parameter total; 0 if the total is 0
                        per = C[C[i,1],3] == 0 ? "0" : C[i,2]/C[C[i,1],3]*100

                        # Append this region's percentage to the parameter's row
                        X[C[i,1]] = X[C[i,1]] ? X[C[i,1]] OFS per"%" : per"%"

                        # Remember where the first block ends, to print parameters in input order
                        stop = split(X[C[i,1]],Z,OFS) == 1 ? i : stop
                }
        }

        # Count how many regions (blocks) there are in the file
        region = split(X[C[i-1,1]],Z,OFS)

        # Header: Parameter,Region1,...,RegionN
        for(i=1;i<=region;i++){
                printf i==1 ? "Parameter" OFS "Region"i OFS : \
                       i < region ? "Region"i OFS : "Region"i RS
        }

        for(i=1;i<=stop;i++)
                print C[i,1],X[C[i,1]]
        print RS
}

FNR==1{
        if(NR != 1){
                # Print the table for the previous file
                stat()

                # Reset for the next file
                j = region = stop = 0
                delete X; delete C; delete Z
        }
        p = FILENAME
}

{
        # Store the parameter name
        C[++j,1] = $1

        # Store the parameter value
        C[j,2]   = $2

        # Sum of each parameter over all regions
        C[C[j,1],3] += C[j,2]
}

END{
        stat()
}
' OFS=\, *.tmp

Code:
file_test1.tmp

Parameter,Region1,Region2,Region3,Region4
Total Subs ,0%,0%,0%,0%
VLAN Count ,25%,25%,25%,25%
Subs Segmentation ,25%,25%,25%,25%
ARPU Segmentation ,0%,0%,0%,0%
RTT Delay ,25%,25%,25%,25%
IPv4 Fragmented Count ,25%,25%,25%,25%
IPv4 Non Fragmented Count ,25%,25%,25%,25%
Call Drop Rate ,25%,25%,25%,25%
Connection with Good voice quality ,25%,25%,25%,25%
Standalone Dedicated Control Channel (SDCCH) Congestion ,0%,0%,0%,0%


file_test2.tmp

Parameter,Region1,Region2,Region3,Region4
Total Subs ,0%,0%,0%,0%
VLAN Count ,25%,25%,25%,25%
Subs Segmentation ,25%,25%,25%,25%
ARPU Segmentation ,0%,0%,0%,0%
RTT Delay ,25%,25%,25%,25%
IPv4 Fragmented Count ,25%,25%,25%,25%
IPv4 Non Fragmented Count ,25%,25%,25%,25%
Call Drop Rate ,25%,25%,25%,25%
Connection with Good voice quality ,25%,25%,25%,25%
Standalone Dedicated Control Channel (SDCCH) Congestion ,0%,0%,0%,0%

If you need a tab-separated file, change OFS=\, at the end to OFS=\\t. If you are trying this on Solaris/SunOS, use nawk.
# 7  
12-07-2013
Hi Akshay,

First, thank you for the long answer and the code.

My real files don't contain any region field; the region is only in the file name.

For example: file_miami_1 is region 1, file_miami_2 is region 2, file_miami_3 is region 3 and file_miami_4 is region 4. Every city has 4 regions; this number is constant. The KPI percentage for each region should be something like kpi1_city1_region1 / ( kpi1_city1_region1 + kpi1_city1_region2 + kpi1_city1_region3 + kpi1_city1_region4 ) * 100

Let's call this number kpi1_city1_region1_percentage.

I want each KPI displayed on its own line, like:

City1 Region1 Percentages:

KPI Name Reg1% Reg2% Reg3% Reg4%
Total Subs 0% 10% 20% 70%
Vlan Count 10% 20% 20% 60%

..... (for all kpi's)

And then the same for city1 region2, region3 and region4 (a separate table for each city_region).

Then the next city: city2_region1, city2_region2, city2_region3, city2_region4

until

city25_region1, city25_region2, city25_region3, city25_region4

So the program should display 100 percentage tables in total, for 25 cities with 4 regions each.

Is this what your code does?
