Optimize awk code

02-05-2016


sample data.file:

Code:

0,mfrh_green_screen,1454687485,383934,/PROD/G/cicsmrch/sys/unikixmain.log,37M,mfrh_green_screen,28961345,0,382962--383934
0,mfrh_green_screen,1454687785,386190,/PROD/G/cicsmrch/sys/unikixmain.log,37M,mfrh_green_screen,29139568,0,383934--386190
0,mfrh_green_screen,1452858644,-684,/PROD/G/cicsmrch/sys/unikixmain.log,111M,mfrh_green_screen,732502,732502,,111849151,0,731818
0,mfrh_green_screen,1452858944,-888,/PROD/G/cicsmrch/sys/unikixmain.log,111M,mfrh_green_screen,732707,732707,,111918753,0,731819

Code I'm running against this file:

Code:

VALFOUND=1454687485
SEARCHPATT='Thu Feb 04'
awk "/,${VALFOUND},/,0" data.file | gawk -F, '{A=strftime("%a %b %d %T %Y,%s",$3);{Q=1};if((Q)&&(NF == 13)){split($4, B,"-");print B[2] "-" $3 "_0""-" $4"----"A} else if ((Q)&&(NF == 10)) {split($NF, B,"--");print B[2]-B[1] "-" $3 "_" $10"----"A}}' | egrep "${SEARCHPATT}" | awk -F"----" '{print $1}'

data.file is about 7MB in size and can grow considerably larger than that. When I run the above command on it, it takes about 6 seconds to complete. Is there any way to bring that number down?

Last edited by SkySmart; 02-05-2016 at 04:18 PM..

SkySmart


02-05-2016


I have to admit I can't fully follow the logic of your pipeline, but I'm almost sure that all that (time-consuming) piping can be reduced to one single awk command.
You start listing lines at the epoch value 1454687485 and continue down to end-of-file. Later you grep for Thu Feb 04. Why don't you operate on the lines with $3 between 1454626800 and 1454713199 instead? That would save the first awk, the egrep, and, as the output of A is no longer needed, the last awk as well.
The (boolean) Q variable is redundant as well: it is set to 1 and never reset, so what is it for?
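A minimal sketch of that single-awk idea, under some assumptions: FROM and TO are placeholders for the epoch bounds of the day you want (they depend on your time zone), and the function name filter_day is hypothetical. The two branches mirror the NF == 13 and NF == 10 cases of the original pipeline, and since strftime is no longer needed, plain POSIX awk should be enough:

```shell
# Placeholder bounds: replace with the epoch range of the wanted day.
FROM=1454626800
TO=1454713199

# filter_day FILE...: one awk pass instead of awk | gawk | egrep | awk.
filter_day() {
  awk -F, -v from="$FROM" -v to="$TO" '
    $3 < from || $3 > to { next }      # skip records outside the wanted day
    NF == 13 {                         # e.g. $4 is "-684": keep the digits
      split($4, B, "-")
      print B[2] "-" $3 "_0-" $4
    }
    NF == 10 {                         # e.g. $10 is "382962--383934"
      split($NF, B, "--")
      d = B[2] - B[1]                  # difference between the two counters
      print d "-" $3 "_" $10
    }
  ' "$@"
}
```

It would be called as filter_day data.file; whether it reproduces the original output exactly depends on the epoch bounds matching what the egrep pattern selected.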


RudiC


02-05-2016


Thanks RudiC. I took your suggestions into consideration and combined all those commands into one awk command. Thanks so much.

In doing so, I discovered that the code I originally pasted in this thread is not the reason the script was slow. It is the for loop below that takes at least 4 seconds to complete.

Can anyone help me optimize the code below?

Content of variable VALUESA:

Code:

VALUESA="1751-1451549113_0--1751
1445-1451549413_0--1445
1864-1451549713_0--1864
1410-1451550013_0--1410
655-1451550313_0--655
147-1451550613_0--147
209-1451550913_0--209
1472-1451551213_0--1472
1984-1451551513_0--1984
690-1451551813_0--690
652-1451552113_0--652
1161-1451552413_0--1161
1314-1451552713_0--1314
1030-1451553013_0--1030
428-1451553313_0--428
262-1451553613_0--262
95-1451553913_0--95"

The slow for loop:

Code:

ZPROCC=$(
  for ALLF in $(echo ${VALUESA} | sort -r | xargs)
  do
    ALL=$(echo "${ALLF}" | gawk -F"-" '{print $1}')
    ZSCORE=$(gawk "BEGIN {if($STDEVIATE>0) {print (${ALL} - ${AVERAGE}) / ${STDEVIATE}} else {print 0}}")
    EPTIME=$(echo "${ALLF}" | gawk -F"-" '{print $2}' | awk -F"_" '{print $1}')
    FIXED=$(gawk -v c="perl -le 'print scalar(localtime("${EPTIME}"))'" 'BEGIN{c|getline; close(c); print $0;}')
    ACSCORE=$(echo ${FIXED} ${EPTIME} | gawk '{print "["$2"-"$3"-""("$4")""-"$5"]"}')
    echo "frq=${ALL},std=${ZSCORE},time=${ACSCORE},epoch=${EPTIME},avg=${AVERAGE}"
  done)

SkySmart


02-06-2016


Hi, you did not specify the shell. Since you are using GNU utilities, I presumed it to be bash. The following should be functionally equivalent, but a bit more efficient:

Code:

ZPROCC=$(
  while read ALLF 
  do
    IFS=_- read ALL EPTIME x <<< "$ALLF"
    ZSCORE=$(( STDEVIATE>0 ? ( ALL - AVERAGE ) / STDEVIATE : 0 ))
    read x mon day time year x <<< $(perl -le "print scalar(localtime($EPTIME))")
    ACSCORE="[$mon-$day-($time)-$year]"
    echo "frq=${ALL},std=${ZSCORE},time=${ACSCORE},epoch=${EPTIME},avg=${AVERAGE}"
  done <<< "$VALUESA"
)

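It is unsorted, since
$(echo ${VALUESA} | sort -r | xargs) produces the same output as ${VALUESA}: the unquoted expansion makes echo print all values on a single line, so sort -r has nothing to reorder.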
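A small illustration of that point, using a hypothetical three-line variable VALUES instead of the real data. It assumes a shell that word-splits unquoted expansions (sh, dash, bash, ksh):

```shell
VALUES="3-a
1-b
2-c"

# Unquoted: the newlines are lost to word splitting, echo prints ONE line,
# and sort -r cannot reorder a single line.
unquoted=$(echo ${VALUES} | sort -r | xargs)

# Quoted: the newlines survive, sort -r sees three lines and reverses them.
quoted=$(printf '%s\n' "$VALUES" | sort -r | xargs)

echo "$unquoted"   # 3-a 1-b 2-c
echo "$quoted"     # 3-a 2-c 1-b
```

So if reverse ordering was actually intended, the expansion would have to be quoted; as posted, the sort is a no-op.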

So, as is, it could be further reduced to:

Code:

ZPROCC=$(
  while IFS=_- read ALL EPTIME x
  do
    ZSCORE=$(( STDEVIATE>0 ? ( ALL - AVERAGE ) / STDEVIATE : 0 ))
    read x mon day time year x <<< $(perl -le "print scalar(localtime($EPTIME))")
    ACSCORE="[$mon-$day-($time)-$year]"
    echo "frq=${ALL},std=${ZSCORE},time=${ACSCORE},epoch=${EPTIME},avg=${AVERAGE}"
  done <<< "$VALUESA"
)

Which leaves one external call to perl per iteration. To eliminate that one as well, the whole loop would need to be replaced by, for example, a single awk or perl program...

I don't know where AVERAGE and STDEVIATE are determined. Is that done in a similar loop? If so, I suspect similar gains could be made there.
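The thread does not show how AVERAGE and STDEVIATE are computed, so the following is only a sketch of how both could come out of one awk pass over the same records. The function name stats is hypothetical, and the population standard deviation is an assumption (a sample standard deviation would divide by n - 1 instead):

```shell
# stats: reads "frq-epoch_..." records on stdin, prints "average stddev".
stats() {
  awk -F- '
    NF { n++; sum += $1; sumsq += $1 * $1 }   # skip empty lines
    END {
      if (n == 0) { print "0 0"; exit }
      mean = sum / n
      var = sumsq / n - mean * mean           # population variance (assumed)
      if (var < 0) var = 0                    # guard against rounding error
      print mean, sqrt(var)
    }
  '
}
```

Usage could look like set -- $(printf '%s\n' "$VALUESA" | stats), followed by AVERAGE=$1 and STDEVIATE=$2.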

---edit---

This would be a gawk equivalent:

Code:

ZPROCC=$(
  gawk -F'[_-]' -v av="$AVERAGE" -v sd="$STDEVIATE" '
    {
      zscore=(sd>0) ? ($1-av)/sd : 0
      acscore=strftime("%b-%e-(%H:%M:%S)-%Y",$2)
      printf "frq=%s,std=%s,time=%s,epoch=%s,avg=%s\n", $1, zscore, acscore, $2, av
    }
  ' <<< "$VALUESA"
)

Last edited by Scrutinizer; 02-06-2016 at 04:50 AM..


Scrutinizer


02-06-2016


Thanks so much. Sorry for not specifying the shell. I intend to run this on a number of Unix systems, some of which have old OSes, i.e. HP-UX, AIX, Ubuntu, CentOS.

I'm afraid some of the bash commands won't work on the older systems.

The shell I'm using is "/bin/sh" on older systems and "/bin/dash" on newer ones, so I suppose your modifications would most likely work on the newer systems.

SkySmart


02-06-2016


You're welcome...

Alright, try:

Code:

ZPROCC=$(
  while IFS=_- read ALL EPTIME x
  do
    ZSCORE=$(( STDEVIATE>0 ? ( ALL - AVERAGE ) / STDEVIATE : 0 ))
    read x mon day time year x << EOF
      $(perl -le "print scalar(localtime($EPTIME))")
EOF
    ACSCORE="[$mon-$day-($time)-$year]"
    echo "frq=${ALL},std=${ZSCORE},time=${ACSCORE},epoch=${EPTIME},avg=${AVERAGE}"
  done << EOF
$VALUESA
EOF
)

An even faster solution in this case would be to do it all in perl.


Scrutinizer


02-06-2016


It seems this solution doesn't cope with numbers that contain decimals.

Error I received:

Code:

STDEVIATE>0 ? ( ALL - AVERAGE ) / STDEVIATE : 0 : 0403-057 Syntax error
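The likely cause is that POSIX shell arithmetic, $(( ... )), is only specified for integers, so a decimal AVERAGE or STDEVIATE is rejected by shells without floating-point support. One portable workaround, sketched here with a hypothetical helper named zscore, is to hand just that calculation to awk:

```shell
# zscore VALUE AVERAGE STDEVIATE: floating-point z-score via POSIX awk.
zscore() {
  awk -v x="$1" -v av="$2" -v sd="$3" 'BEGIN {
    z = (sd > 0) ? (x - av) / sd : 0   # same guard as the shell expression
    print z
  }'
}
```

In the loop this would replace the arithmetic expansion with ZSCORE=$(zscore "$ALL" "$AVERAGE" "$STDEVIATE"). That costs one awk process per record, so a single awk or perl program over all records remains the faster route.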

SkySmart

