Help to optimize script running time

 
# 1  
Old 11-04-2010

Dear Forum experts,

I have the script below, which I wrote to run under the bash shell. It runs perfectly for a small number of records, say 100,000, but when I feed it all the records (3,000,000) it takes hours.

Can you please suggest a way to optimize it, or a different way to run it? :-|

Code:
{OFS="|"; FS=";"; n=split("ZainEazy;Ezlink;EzlinkDuo;ZainThawani;Day;Night;Zain_Super;Zony;BlineGovernment;Army;Bline;ZainS7abak;ZainCallAsia;FlateRate;ZainFurat;Ziyarah;SubDealers;Zain5;Zain5NOSuperNO;Zain5Disc;ZainElKul;GelnaZBonus;NonOfficialBLGvD;Visitors;StaffLine;OfficialBlineGov;Mo7afazat;Zain5Xtra;Zain5XtraNoSupNo;Zain5XtraDisc;Zain500;Zain500NOSuperNO;Zain500Disc;ZainNile;Jaishana;Aqaba;Ayla;ZainQuattro", arr,";")}

{

if (substr($2,7, 1)%2==0){
Subs_IN4[$1]+=1;
SubStr_IN4[$2]=2;
for ( i=57 ; i<= NF; i++ ) {SubStr_IN4[$2]=SubStr_IN4[$2]";"$i};

if (SubStr_IN4[$2] ~ /FnF_1/){ split($34,a,"|");
for ( i=0 ; i<= 9; i++ )
{if (a[i] ~ /00/){ FnFGroup1_IN4[$1]=FnFGroup1[$1]+1}}}else {FnFGroup1[$1]="NA"};

if (SubStr_IN4[$2] ~ /FnF_2/){ split($37,b,"|");
for ( i=0 ; i<= 9; i++ )
{if (b[i] ~ /00/){ FnFGroup2_IN4[$1]=FnFGroup2[$1]+1}}}else {FnFGroup2[$1]="NA"};

if (SubStr_IN4[$2] ~ /FnF_3/){ split($39,c,"|");
for ( i=0 ; i<= 9; i++ )
{if (c[i] ~ /00/){ FnFGroup3_IN4[$1]=FnFGroup3[$1]+1}}}else {FnFGroup3[$1]="NA"};


if($3 > 0 && $18=="TRUE"){POS_IN4[$1]+=1; SOLD_IN4[$1]+=1}
   else{if($3 == 0 && $18=="TRUE"){ZERO_IN4[$1]+=1; SOLD_IN4[$1]+=1}
            else{if($18=="FALSE"){NOTSOLD_IN4[$1]+=1}}}

        if ($18=="TRUE" && $7=="Active"){ACTIVE_IN4[$1]+=1}
        else{if ($18=="TRUE" && $7=="IncomingCallsOnly"){GRACE_IN4[$1]+=1}
        else{if ($18=="TRUE" && $7=="RechargeOnly"){RECHARGE_IN4[$1]+=1}
                else{if ($18=="TRUE" && $7=="Transient"){TRANSIET_IN4[$1]+=1}}}}

        if ($7=="Active"){A_M_IN4[$1]+=$3; A_2ND_IN4[$1]+=$9; A_3RD_IN4[$1]+=$11;A_4TH_IN4[$1]+=$13}
        if ($7=="IncomingCallsOnly"){E_M_IN4[$1]+=$3; E_2ND_IN4[$1]+=$9; E_3RD_IN4[$1]+=$11;E_4TH_IN4[$1]+=$13}

        if ($9 > 0  ){Second_IN4[$1]+=1}
                                if ($11 > 0  ){Third_IN4[$1]+=1}
                                if ($13 > 0 ){Fourth_IN4[$1]+=1}
}

print arr[i]"_IN4", Subs_IN4[arr[i]], POS_IN4[arr[i]], ZERO_IN4[arr[i]], SOLD_IN4[arr[i]], NOTSOLD_IN4[arr[i]],
      TRANSIET_IN4[arr[i]], ACTIVE_IN4[arr[i]], GRACE_IN4[arr[i]], RECHARGE_IN4[arr[i]],
      Second_IN4[arr[i]], Third_IN4[arr[i]], Fourth_IN4[arr[i]],
      A_M_IN4[arr[i]], A_2ND_IN4[arr[i]], A_3RD_IN4[arr[i]], A_4TH_IN4[arr[i]],
      E_M_IN4[arr[i]], E_2ND_IN4[arr[i]], E_3RD_IN4[arr[i]], E_4TH_IN4[arr[i]],
      FnFGroup1_IN4[arr[i]], FnFGroup2_IN4[arr[i]], FnFGroup3_IN4[arr[i]]

print arr[i]"_IN5", Subs_IN5[arr[i]], POS_IN5[arr[i]], ZERO_IN5[arr[i]], SOLD_IN5[arr[i]], NOTSOLD_IN5[arr[i]],
      TRANSIET_IN5[arr[i]], ACTIVE_IN5[arr[i]], GRACE_IN5[arr[i]], RECHARGE_IN5[arr[i]],
      Second_IN5[arr[i]], Third_IN5[arr[i]], Fourth_IN5[arr[i]],
      A_M_IN5[arr[i]], A_2ND_IN5[arr[i]], A_3RD_IN5[arr[i]], A_4TH_IN5[arr[i]],
      E_M_IN5[arr[i]], E_2ND_IN5[arr[i]], E_3RD_IN5[arr[i]], E_4TH_IN5[arr[i]],
      FnFGroup1_IN5[arr[i]], FnFGroup2_IN5[arr[i]], FnFGroup3_IN5[arr[i]]

print arr[i]"_TOTAL", Subs[arr_IN4[i]]+Subs[arr_IN5[i]], POS_IN4[arr[i]]+POS_IN5[arr[i]], ZERO_IN4[arr[i]]+ZERO_IN5[arr[i]],
      SOLD_IN4[arr[i]]+SOLD_IN5[arr[i]], NOTSOLD_IN4[arr[i]]+NOTSOLD_IN5[arr[i]], TRANSIET_IN4[arr[i]]+TRANSIET_IN5[arr[i]],
      ACTIVE_IN4[arr[i]]+ACTIVE_IN5[arr[i]], GRACE_IN4[arr[i]]+GRACE_IN5[arr[i]], RECHARGE_IN4[arr[i]]+RECHARGE_IN5[arr[i]],
      Second_IN4[arr[i]]+Second_IN5[arr[i]], Third_IN4[arr[i]]+Third_IN5[arr[i]], Fourth_IN4[arr[i]]+Fourth_IN5[arr[i]],
      A_M_IN4[arr[i]]+A_M_IN5[arr[i]], A_2ND_IN4[arr[i]]+A_2ND_IN5[arr[i]], A_3RD_IN4[arr[i]]+A_3RD_IN5[arr[i]], A_4TH_IN4[arr[i]]+A_4TH_IN5[arr[i]],
      E_M_IN4[arr[i]]+E_M_IN5[arr[i]], E_2ND_IN4[arr[i]]+E_2ND_IN5[arr[i]], E_3RD_IN4[arr[i]]+E_3RD_IN5[arr[i]], E_4TH_IN4[arr[i]]+E_4TH_IN5[arr[i]],
      FnFGroup1_IN4[arr[i]]+FnFGroup1_IN5[arr[i]], FnFGroup2_IN4[arr[i]]+FnFGroup2_IN5[arr[i]], FnFGroup3_IN4[arr[i]]+FnFGroup3_IN5[arr[i]]
}


Last edited by pludi; 11-04-2010 at 08:51 PM..
# 2  
Old 11-04-2010
This is very hard to read. I assume it is an awk script. There is a lot of what seems to be repeated logic.

If you give us a few lines of sample:
input
expected output

we can probably help more effectively.
# 3  
Old 11-04-2010
Thank you Jim

Input data sample:

Zain500Disc;46464564;560;;0;0;Active;2011-02-04 22:59:00;0;1970-01-01 00:00:00;0;1970-01-01 00:00:00;0;1970-01-01 00:00:00;1970-01-01 00:00:00;2011-03-06 22:59:00;2011-06-05 22:59:00;TRUE;FALSE;0;0;0000;FALSE;TRUE;false;FALSE;0;true;0;true;true;TRUE;false;{00962795901649|00962796371949|00962796859686|00962795293754|00962796859676};0;TRUE;{00963966107669};0;TRUE;{};0;TRUE;;false;TRUE;FALSE;FALSE;2;FALSE;FALSE;;0;0;0;0;0;2;(FnF_1,-108,01.01.2025 23:59:59:999);(FnF_2,-108,01.01.2025 23:59:59:999);

ZainEazy;4646464;1;2000;0;0;Active;2016-09-10 22:59:00;0;2006-12-11 22:59:00;0;2009-05-26 22:59:00;0;1970-01-01 00:00:00;1970-01-01 00:00:00;2016-10-10 22:59:00;2017-01-09 22:59:00;TRUE;FALSE;0;0;0000;FALSE;TRUE;false;FALSE;0;true;0;true;true;TRUE;false;{};2;TRUE;{};0;TRUE;{};0;TRUE;;false;TRUE;FALSE;FALSE;2;FALSE;FALSE;;0;0;0;0;0;2;(FnF_1,-108,01.01.2025 23:59:59:999);

Jaishana;34535353;2776;;0;0;Active;2011-04-23 23:59:59;0;2006-03-07 23:59:00;0;2010-05-31 23:59:00;0;1970-01-01 00:00:00;1970-01-01 00:00:00;2011-05-23 23:59:59;2011-08-22 23:59:59;TRUE;FALSE;0;0;0000;FALSE;TRUE;false;FALSE;0;true;0;true;true;TRUE;false;{};2;TRUE;{};0;TRUE;{};0;TRUE;102;TRUE;TRUE;FALSE;FALSE;2;FALSE;FALSE;;0;0;0;0;0;2;(FnF_1,-108,01.01.2025 23:59:59:999);(CUG,-128,01.01.2025 00:00:00:000);

ZainQuattro;43534535;6406;4000;0;0;Active;2011-01-14 22:59:00;0;1970-01-01 00:00:00;0;2010-04-26 23:59:00;0;1970-01-01 00:00:00;1970-01-01 00:00:00;2011-02-13 22:59:00;2011-05-15 22:59:00;TRUE;FALSE;0;0;0000;FALSE;TRUE;false;FALSE;0;true;0;true;true;TRUE;false;{00962795500047|00962795600207|00962799106309|00962795782960};0;TRUE;{00963941950270|00963947278825|00963966531175};0;TRUE;{};0;TRUE;;false;TRUE;FALSE;FALSE;2;FALSE;FALSE;;0;0;0;0;0;2;(FnF_1,-108,01.01.2025 23:59:59:999);(FnF_2,-108,01.01.2025 23:59:59:999);


This is a sample of the input data. The program simply groups lines and counts them based on field 1; field 1 is used as the index for all the arrays.
It's in awk, but I don't know why it's so slow, even though it's reasonably fast for a small number of records!
# 4  
Old 11-04-2010
With an awk script that long it's hard to tell what you're doing.

If all you really want to do is count uses of the first column:

Code:
BEGIN { FS=";" }

{        if(length($0) > 0)
                count[$1]++;
}

END {
        for(keys in count)
                print keys ":" count[keys];
}

For your input data, this prints:
Code:
Zain500Disc:1
ZainEazy:1
ZainQuattro:1
Jaishana:1
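
If more than a plain count is needed, the same one-pass idea extends to several arrays keyed by the first field. The following is only a sketch under assumptions: the function name plan_summary is made up for illustration, and summing field 3 is just an example of a second per-key value, not something the original script is confirmed to need in this form.

```shell
# plan_summary: hypothetical helper name, not from the thread.
# Reads ';'-separated records (files or stdin) and prints, for each
# distinct value of field 1, the record count and the sum of field 3.
plan_summary() {
    awk '
    BEGIN { FS = ";"; OFS = "|" }
    length($0) > 0 {
        count[$1]++        # number of records seen for this key
        sum3[$1] += $3     # running total of field 3 for this key
    }
    END {
        # Output order of "for (key in ...)" is unspecified in awk.
        for (key in count)
            print key, count[key], sum3[key]
    }' "$@"
}
```

It could be run as `plan_summary input | sort`. Everything is accumulated while reading and printed once in END, so each record is touched only once.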

---------- Post updated at 08:09 PM ---------- Previous update was at 07:48 PM ----------

It's hard to "optimize" huge amounts of logic, since the slowdown may not be in one particular place but in the logic itself; "optimizing" it then pretty much means replacing it. Here I would use sort:

Code:
BEGIN { FS=";" ; count=1 ; cur=""; }

{
        if(length($0) > 0)
        {
                if(cur == $1)   count++;
                else
                {
                        if(length(cur) > 0)
                                print cur " had " count "\n";

                        cur=$1;         count=1;
                }

                print $0;
        }
}

END {   if(length(cur) > 0)     print cur " had " count "\n";   }

Code:
sort < input | awk -f count.awk

That way, you get your records already grouped and just have to count when things change.
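
One caveat with a plain sort is that it compares whole lines using the current locale, and in some locales punctuation such as ";" is given little weight, so records for Zain5 and Zain500 might not end up in separate contiguous groups. A hedged variant, assuming a sort that supports -t and -k, is to sort on the first field only in the C locale. The function name sorted_plan_counts is made up for this sketch, and unlike the script above it prints only the summary lines, not the records themselves.

```shell
# sorted_plan_counts: hypothetical helper name, not from the thread.
# Sorts ';'-separated records on field 1 only (byte order, C locale),
# then counts each run of identical keys in a single awk pass.
sorted_plan_counts() {
    LC_ALL=C sort -t ';' -k 1,1 "$@" |
    awk -F ';' '
    length($0) == 0 { next }                    # skip blank lines
    !started || $1 != cur {                     # key changed: flush the previous group
        if (started) print cur " had " count
        cur = $1; count = 0; started = 1
    }
    { count++ }
    END { if (started) print cur " had " count }'
}
```

It could be run as `sorted_plan_counts input`. Because the input arrives grouped, awk only needs to remember the current key and its count rather than one array element per key.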
# 5  
Old 11-04-2010
I would go with Corona's approach. For only 300000 records the array example (the first code bit) is great.
We do that every day with 1M-record files. It takes 30 seconds on a Solaris v445.
# 6  
Old 11-05-2010
I don't know what you want to achieve, but commands like the following might help a bit:
Code:
awk -F";" '{print$1}' input | sort | uniq -c

Code:
sort -t ";" -k 1 input

# 7  
Old 11-05-2010
This script does not make a whole lot of sense. Array arr is initialized for every input record, although this could be done once in a BEGIN block. It is only ever referenced as arr[i], but i never gets set explicitly. Sometimes, by accident, i ends up with the value 10, because it is used as the counter in a for loop that may or may not run depending on the if conditions; arr[10] then gets referenced, which yields Army.
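
A minimal sketch of the structure those remarks point to: build the plan list once in BEGIN, only accumulate in the main block, and print in END with an explicit loop over i. Several things here are assumptions rather than facts from the thread: the function name plan_report is made up, the plan list is shortened to three names, only two of the many counters are kept, and an odd seventh digit in field 2 is taken to mean IN5 (the original script never fills its _IN5 arrays, so the intended rule is not known).

```shell
# plan_report: hypothetical helper name, not from the thread.
plan_report() {
    awk '
    BEGIN {
        FS = ";"; OFS = "|"
        # Built once, not per record. Shortened list for illustration only;
        # the full list is in the original script.
        n = split("ZainEazy;Ezlink;Army", arr, ";")
    }
    {
        # Assumption: even 7th digit of field 2 means IN4, odd means IN5.
        sfx = (substr($2, 7, 1) % 2 == 0) ? "_IN4" : "_IN5"
        nsubs[$1 sfx]++                       # subscribers per plan and group
        if ($18 == "TRUE") nsold[$1 sfx]++    # sold lines per plan and group
    }
    END {
        # i is set explicitly here, so every plan is reported exactly once.
        for (i = 1; i <= n; i++) {
            p = arr[i]
            print p "_IN4", nsubs[p "_IN4"] + 0, nsold[p "_IN4"] + 0
            print p "_IN5", nsubs[p "_IN5"] + 0, nsold[p "_IN5"] + 0
            print p "_TOTAL", nsubs[p "_IN4"] + nsubs[p "_IN5"], nsold[p "_IN4"] + nsold[p "_IN5"]
        }
    }' "$@"
}
```

It could be run as `plan_report input`. The remaining counters from the original script would follow the same pattern of one array per counter, keyed by the plan name plus the group suffix.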