Optimize awk code

# 1  
02-05-2016

Sample data.file:

Code:
0,mfrh_green_screen,1454687485,383934,/PROD/G/cicsmrch/sys/unikixmain.log,37M,mfrh_green_screen,28961345,0,382962--383934
0,mfrh_green_screen,1454687785,386190,/PROD/G/cicsmrch/sys/unikixmain.log,37M,mfrh_green_screen,29139568,0,383934--386190
0,mfrh_green_screen,1452858644,-684,/PROD/G/cicsmrch/sys/unikixmain.log,111M,mfrh_green_screen,732502,732502,,111849151,0,731818
0,mfrh_green_screen,1452858944,-888,/PROD/G/cicsmrch/sys/unikixmain.log,111M,mfrh_green_screen,732707,732707,,111918753,0,731819


Code I'm running against this file:

Code:
VALFOUND=1454687485
SEARCHPATT='Thu Feb 04'
awk "/,${VALFOUND},/,0" data.file | gawk -F, '{A=strftime("%a %b %d %T %Y,%s",$3);{Q=1};if((Q)&&(NF == 13)){split($4, B,"-");print B[2] "-" $3 "_0""-" $4"----"A} else if ((Q)&&(NF == 10)) {split($NF, B,"--");print B[2]-B[1] "-" $3 "_" $10"----"A}}' | egrep "${SEARCHPATT}" | awk -F"----" '{print $1}'

data.file is about 7 MB in size and can grow considerably larger. When I run the above command on it, it takes about 6 seconds to complete. Is there any way to bring that number down?

Last edited by SkySmart; 02-05-2016 at 04:18 PM.
# 2  
02-05-2016
I have to admit I can't follow the logic of your pipe. But I'm almost sure that all that (time-consuming) piping can be reduced to a single awk command.
You start listing the lines at the epoch value 1454687485 and continue down to the end of the file. Later you grep for Thu Feb 04. Why don't you operate only on the lines whose $3 lies between 1454626800 and 1454713199? That would save the first awk and the egrep and, as the output of A is no longer needed, the last awk as well.
The (Boolean) Q variable is redundant as well: it is set to 1 and never reset, so what is its purpose?
This User Gave Thanks to RudiC For This Post:
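A minimal sketch of the single-awk idea from the reply above, assuming the 1454626800..1454713199 epoch window is the day of interest and that the NF == 13 / NF == 10 branches of the original gawk code should be kept as they are. The helper name filter_day is made up for this example. It only needs POSIX awk (no strftime), because once the filtering is done on $3 the formatted date A is no longer needed.

```shell
# filter_day LO HI  (hypothetical helper name)
# Reads comma-separated records like data.file on stdin and, for every
# record whose epoch field ($3) lies in [LO, HI], prints the same summary
# the original pipeline printed before the "----" part was cut off.
filter_day() {
  awk -F, -v lo="$1" -v hi="$2" '
    $3 < lo || $3 > hi { next }        # outside the time window: skip
    NF == 13 {                         # records like ...,-684,...
      split($4, B, "-")
      print B[2] "-" $3 "_0-" $4
      next
    }
    NF == 10 {                         # records like ...,382962--383934
      split($NF, B, "--")
      d = B[2] - B[1]
      print d "-" $3 "_" $10
    }
  '
}
```

Usage: filter_day 1454626800 1454713199 < data.file. One awk process reads the file once, instead of four processes passing every line through a pipe.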
# 3  
02-05-2016
Thanks, RudiC. I took your suggestions into consideration and combined all those commands into a single awk command. Thanks so much.

In doing so, I discovered that the code I originally pasted in this thread is not what makes the script slow. It is the for loop below that takes at least 4 seconds to complete.

Can anyone help me optimize the code below?

Contents of the variable VALUESA:

Code:
VALUESA="1751-1451549113_0--1751
1445-1451549413_0--1445
1864-1451549713_0--1864
1410-1451550013_0--1410
655-1451550313_0--655
147-1451550613_0--147
209-1451550913_0--209
1472-1451551213_0--1472
1984-1451551513_0--1984
690-1451551813_0--690
652-1451552113_0--652
1161-1451552413_0--1161
1314-1451552713_0--1314
1030-1451553013_0--1030
428-1451553313_0--428
262-1451553613_0--262
95-1451553913_0--95"

The slow for loop:

Code:
ZPROCC=$(
  for ALLF in $(echo ${VALUESA} | sort -r | xargs)
  do
    ALL=$(echo "${ALLF}" | gawk -F"-" '{print $1}')
    ZSCORE=$(gawk "BEGIN {if($STDEVIATE>0) {print (${ALL} - ${AVERAGE}) / ${STDEVIATE}} else {print 0}}")
    EPTIME=$(echo "${ALLF}" | gawk -F"-" '{print $2}' | awk -F"_" '{print $1}')
    FIXED=$(gawk -v c="perl -le 'print scalar(localtime("${EPTIME}"))'" 'BEGIN{c|getline; close(c); print $0;}')
    ACSCORE=$(echo ${FIXED} ${EPTIME} | gawk '{print "["$2"-"$3"-""("$4")""-"$5"]"}')
    echo "frq=${ALL},std=${ZSCORE},time=${ACSCORE},epoch=${EPTIME},avg=${AVERAGE}"
  done)

# 4  
02-06-2016
Hi, you did not specify the shell. Since you are using GNU utilities, I presumed it to be bash. This should be functionally equivalent, but a bit more efficient:

Code:
ZPROCC=$(
  while read ALLF 
  do
    IFS=_- read ALL EPTIME x <<< "$ALLF"
    ZSCORE=$(( STDEVIATE>0 ? ( ALL - AVERAGE ) / STDEVIATE : 0 ))
    read x mon day time year x <<< $(perl -le "print scalar(localtime($EPTIME))")
    ACSCORE="[$mon-$day-($time)-$year]"
    echo "frq=${ALL},std=${ZSCORE},time=${ACSCORE},epoch=${EPTIME},avg=${AVERAGE}"
  done <<< "$VALUESA"
)

It is unsorted, since $(echo ${VALUESA} | sort -r | xargs) produces the same output as ${VALUESA}: the unquoted echo puts all the values on a single line, so sort -r has nothing to reorder.
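A quick way to check this, using two values from VALUESA, deliberately listed in ascending order so that a working reverse sort would have to swap them:

```shell
# Two sample values from VALUESA, in ascending order on purpose.
VALUESA="1445-1451549413_0--1445
1751-1451549113_0--1751"

# Unquoted: word splitting turns the newline into a space, so sort -r
# receives one single line and has nothing to reorder.
echo ${VALUESA} | sort -r
# -> 1445-1451549413_0--1445 1751-1451549113_0--1751

# Quoted: the newline survives, so sort -r sees two lines and swaps them.
echo "${VALUESA}" | sort -r
# -> 1751-1451549113_0--1751
#    1445-1451549413_0--1445
```

Note that without -n, sort -r compares the lines as text, not as numbers, so even the quoted form would not order the full list numerically.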

So, as is, it could be further reduced to:
Code:
ZPROCC=$(
  while IFS=_- read ALL EPTIME x
  do
    ZSCORE=$(( STDEVIATE>0 ? ( ALL - AVERAGE ) / STDEVIATE : 0 ))
    read x mon day time year x <<< $(perl -le "print scalar(localtime($EPTIME))")
    ACSCORE="[$mon-$day-($time)-$year]"
    echo "frq=${ALL},std=${ZSCORE},time=${ACSCORE},epoch=${EPTIME},avg=${AVERAGE}"
  done <<< "$VALUESA"
)

That leaves one external call to perl per iteration. To eliminate that one as well, the whole loop would need to be replaced by, for example, a single awk or perl program.

I don't know where AVERAGE and STDEVIATE are determined. Is that in a similar loop? If so, I suspect similar gains could be made there.
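If AVERAGE and STDEVIATE are computed from the same VALUESA lines, a single awk pass could produce both. This is only a sketch under assumptions: it takes them to be the mean and the population standard deviation (dividing by n, not n - 1) of the first "-"-separated field, and the helper name mean_sd is made up.

```shell
# mean_sd (hypothetical helper): reads VALUESA-style lines on stdin and
# prints "MEAN SD" for the first "-"-separated field, in one awk process.
mean_sd() {
  awk -F- '
    NF { n++; s += $1; ss += $1 * $1 }     # skip empty lines
    END {
      if (n == 0) { print 0, 0; exit }
      m = s / n
      v = ss / n - m * m                   # population variance
      if (v < 0) v = 0                     # guard against rounding error
      print m, sqrt(v)
    }'
}

stats=$(printf '%s\n' "$VALUESA" | mean_sd)
AVERAGE=${stats% *}      # text before the space
STDEVIATE=${stats#* }    # text after the space
```

The two ${stats...} expansions are plain POSIX parameter expansion, so splitting the result costs no extra process.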

---edit---

This would be a gawk equivalent:

Code:
ZPROCC=$(
  gawk -F'[_-]' -v av="$AVERAGE" -v sd="$STDEVIATE" '
    {
      zscore=(sd>0) ? ($1-av)/sd : 0
      acscore=strftime("%b-%e-(%H:%M:%S)-%Y",$2)
      gsub(/ /, "", acscore)    # %e pads days 1-9 with a space; drop it to match the perl output
      printf "frq=%s,std=%s,time=%s,epoch=%s,avg=%s\n", $1, zscore, acscore, $2, av
    }
  ' <<< "$VALUESA"
)


Last edited by Scrutinizer; 02-06-2016 at 04:50 AM.
This User Gave Thanks to Scrutinizer For This Post:
# 5  
02-06-2016
Thanks so much, and sorry for not specifying the shell. I intend to run this on a number of Unix systems (HP-UX, AIX, Ubuntu, CentOS), some of which run old OS versions.

I'm afraid some of the bash features won't work on the older systems.

The shell I'm using is /bin/sh on the older systems and /bin/dash on the newer ones, so I suppose your modifications would most likely work on the newer systems.
# 6  
02-06-2016
You're welcome...

Alright, try:
Code:
ZPROCC=$(
  while IFS=_- read ALL EPTIME x
  do
    ZSCORE=$(( STDEVIATE>0 ? ( ALL - AVERAGE ) / STDEVIATE : 0 ))
    read x mon day time year x << EOF
      $(perl -le "print scalar(localtime($EPTIME))")
EOF
    ACSCORE="[$mon-$day-($time)-$year]"
    echo "frq=${ALL},std=${ZSCORE},time=${ACSCORE},epoch=${EPTIME},avg=${AVERAGE}"
  done << EOF
$VALUESA
EOF
)

An even faster solution in this case would be to do it all in perl.
This User Gave Thanks to Scrutinizer For This Post:
# 7  
02-06-2016
It seems this solution fails when the numbers contain decimals.

Error I received:

Code:
STDEVIATE>0 ? ( ALL - AVERAGE ) / STDEVIATE : 0 : 0403-057 Syntax error
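The error comes from $(( )): shell arithmetic works on integers only, so an AVERAGE or STDEVIATE with decimals is rejected there. One way to keep the loop from post #6 but compute the z-score in floating point is to hand just that step to awk. This is a hedged sketch: zscore is a made-up helper name, and it brings back one awk process per line, so the single-process gawk or perl versions remain the faster option.

```shell
# zscore VALUE AVERAGE STDEVIATE  (hypothetical helper name)
# awk does floating-point arithmetic, so decimal inputs work here,
# unlike $(( )) in the shell, which only handles integers.
zscore() {
  awk -v x="$1" -v av="$2" -v sd="$3" 'BEGIN {
    z = 0
    if (sd > 0) z = (x - av) / sd
    print z
  }'
}

# Inside the while loop, instead of ZSCORE=$(( ... )):
ZSCORE=$(zscore "$ALL" "$AVERAGE" "$STDEVIATE")
```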
