Is it better to grep and pipe to awk, or to search with awk itself?
This may just be a lack of experience talking, but I always assumed that, when possible, it was better to use a command's built-in abilities rather than to pipe through a bunch of commands. I wrote a (very simple) script a while back that was meant to pull out a certain error code and report back which piece of equipment had thrown it:
Last night while bored I decided to try simplifying the script and came up with:
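The scripts themselves didn't survive the copy above, so purely as a hypothetical sketch (the log path, the error code 0x12, and the equipment being in field $2 are all assumptions, not from the post), the two styles being compared look something like:

```shell
#!/bin/sh
# Hypothetical log path, error code, and field position -- adjust all
# of these to match the real log format.
LOG=/var/log/equipment.log

# Style 1: grep narrows the lines down first; awk only formats matches.
grep '0x12' "$LOG" | awk '{print $2}'

# Style 2: awk does both the searching and the formatting itself.
awk '/0x12/ {print $2}' "$LOG"
```

On large files the grep-first version usually wins, because grep's search loop is faster than awk's and awk then only has to parse the few matching lines.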
What seems odd to me is that, while both return the same results, the one that searches with awk takes considerably longer than the one that uses grep (granted, only ~3 seconds right now, but that's because the log rolled at midnight).
I have several scripts that use basically the same logic and just sort the information differently later; since they tend to use a lot of processor power (the logs they crawl are pretty big), I'd like to make them as efficient as possible.
In case it matters this is what the overall log file tends to look like:
And this is the error that I'm looking to pull information from:
Quote:
This may just be a lack of experience talking... [snip]
Why are you using command substitution instead of a straight assignment? In all shells except ksh93, it forks a new process and is almost as slow as an external command. Use:
Quote:
[snip]
What seems odd to me is that, while both return the same results, the one that searches with awk takes considerably longer than the one that uses grep (granted, only ~3 seconds right now, but that's because the log rolled at midnight).
The search code in grep is much faster than that in all versions of awk except mawk, but it will not make much difference except on very large files.
The authors of AWK, in their book, The AWK Programming Language, recommend using grep to search and piping the results through awk for processing, rather than doing it all in awk.
Quote:
[snip]
The UID thing...because I'm for lack of a better word...a newb.
Thanks for the info regarding awk vs. grep. The log I'm searching gets rather large, as it only rolls every 12 hours, and even 45 minutes into its cycle there was already a 3-second difference running with awk as opposed to grep.
My second question is an extension of the first. Is there any benefit to giving grep more or less to search for?
Is it better to grep for 0x12, as that's the error code, or to grep for "AddResCnf response (0x12) is not 'OK'", which is the entire error? I can think of logical reasons why a longer search term would be better than a short one... but as I said, I'm kind of a newb.
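Generally the longer string doesn't hurt and often helps: the more specific pattern avoids false matches on other hex values (0x12 is also a substring of 0x123, 0x120, etc.), and with -F (fixed-string search, no regex interpretation) grep can use fast skip-ahead matching. A small demonstration using the error line quoted above plus one invented false-positive line:

```shell
#!/bin/sh
# Short pattern: also matches unrelated values such as 0x123.
printf '%s\n' "unitA AddResCnf response (0x12) is not 'OK'" \
              'unitB counter=0x123' |
grep -c '0x12'          # counts both lines -- one is a false positive

# Longer fixed string with -F: unambiguous, and fixed-string search
# lets grep skip ahead on mismatches instead of checking every byte.
printf '%s\n' "unitA AddResCnf response (0x12) is not 'OK'" \
              'unitB counter=0x123' |
grep -cF "AddResCnf response (0x12) is not 'OK'"   # counts only the real error
```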
In the awk below, should a smallest-to-largest sort be added after the END block? Thank you :).
BEGIN {
FS="*"
}
# Read search terms from file1 into 's'
FNR==NR {
s[$0]   # store each search term as an array key
next
}
{
# Check if $5 matches one of the search terms
for(i in s) {
if($5 ~ i) {
... (4 Replies)
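Since the rest of the snippet was cut off above, here is a hedged sketch of one common answer: rather than sorting inside END (which needs gawk-specific features like asort()), it is usually simplest to let awk print unsorted and pipe its output through sort. The file names and data format below are stand-ins, not from the original post:

```shell
#!/bin/sh
# Hypothetical files: file1 holds one search term per line, file2 holds
# '*'-separated records.  Print the matching 5th field, then sort
# numerically, smallest to largest.
awk -F '*' '
    FNR == NR { s[$0]; next }              # pass 1: load terms from file1
    { for (i in s) if ($5 ~ i) print $5 }  # pass 2: print matching fields
' file1 file2 | sort -n
```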
Hi,
I need help with this.
The output of a command looks like this:
1 /tmp/x
2.2K /tmp/y
3.2k /tmp/z
1G /tmp/a/b
2.2G /tmp/c
3.4k /tmp/d
Now I need to grep for the paths whose sizes are in GB, like below:
1G /tmp/a/b
2.2G /tmp/c
Please suggest how I can... (12 Replies)
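Assuming the listing comes from something like du -h, one way is to test the first field for a trailing G, either in awk or with an anchored grep. A sketch using the sample data from the post:

```shell
#!/bin/sh
# Keep only entries whose size field ends in G (gigabytes).
printf '%s\n' '1 /tmp/x' '2.2K /tmp/y' '3.2k /tmp/z' \
              '1G /tmp/a/b' '2.2G /tmp/c' '3.4k /tmp/d' |
awk '$1 ~ /G$/'

# grep equivalent: a number (optionally with a decimal point) followed
# by G, then whitespace (du separates size and path with tabs or spaces).
printf '%s\n' '1G /tmp/a/b' '2.2G /tmp/c' '3.4k /tmp/d' |
grep -E '^[0-9.]+G[[:space:]]'
```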
Hey fellas,
I wrote a script whose output looks like this:
a 1 T
a 1 T
a 2 A
b 5 G
b 5 G
b 5 G
I want to print $1 and $2, then the total count of that $2 value as a third column, and after that $3. Something like this:
a 1 2 T
a 2 1 A
b 5 3 G
I know how to do it with a given input... (4 Replies)
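One way to sketch this is to count each ($1, $2) pair in an awk array and print the totals in END. Iteration order of `for (k in array)` is unspecified, so the output is piped through sort here to keep it stable; the sample data is taken from the post:

```shell
#!/bin/sh
# Count how many times each ($1,$2) pair occurs, remember its $3, then
# print: $1, $2, count, $3 -- one line per distinct pair.
printf '%s\n' 'a 1 T' 'a 1 T' 'a 2 A' 'b 5 G' 'b 5 G' 'b 5 G' |
awk '{ n[$1 FS $2]++; v[$1 FS $2] = $3 }
     END { for (k in n) print k, n[k], v[k] }' |
sort
```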
Hi,
I'm testing nginx-cache-purge and noticed that grep searches the whole file content; since the cache key is on the second line, I would like to limit grep's search.
This is the script I'm using:
github.com/perusio/nginx-cache-purge
The line I would like to change and limit grep search... (5 Replies)
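Since the script wasn't shown in full, here is a hedged sketch of the usual trick: if the key is always on line 2, read only the first two lines instead of letting grep scan the whole cache file. The file layout below is a stand-in for an nginx cache file:

```shell
#!/bin/sh
# Sample stand-in for a cache file: metadata on line 1, the cache key
# on line 2, a large body after (layout assumed from the post).
file=$(mktemp)
printf '%s\n' 'metadata' 'KEY: /some/uri' 'body...' > "$file"
key='KEY: /some/uri'

# Only hand grep the first two lines:
head -n 2 "$file" | grep -qF "$key" && echo match

# awk variant: check line 2 and stop reading the file immediately.
awk -v k="$key" 'NR == 2 { found = index($0, k); exit }
                 END { exit !found }' "$file" && echo match

rm -f "$file"
```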
<tr><th align=right valign=top>Faulty_Part</th><td align=left valign=top>readhat version 6.0</td></tr> <tr><th align=right valign=top>Submit_Date</th><td align=left valign=top>2011-04-28 02:08:02</td></tr> .......(a long string)
I want to get all the fields between "left valign=top>" and "... (2 Replies)
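Assuming the closing delimiter is the `</td>` tag (the quote above is cut off), grep -o can pull each occurrence out of the long string and sed can strip the leading marker. A sketch using a shortened version of the sample line:

```shell
#!/bin/sh
# Extract every run of text between 'left valign=top>' and the next '<'.
line='<td align=left valign=top>readhat version 6.0</td><td align=left valign=top>2011-04-28 02:08:02</td>'

printf '%s\n' "$line" |
grep -o 'left valign=top>[^<]*' |   # one match per output line
sed 's/^left valign=top>//'         # drop the marker, keep the field
```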
Hi!
If I'm trying something like:
echo "hello world" | myvar=`awk -F "world" '{print $1}'`
echo $myvar
myvar is always empty :confused:
I've googled for hours now and don't understand why it isn't working...
Trying it in normal bash.
Can someone explain it to me so I can say "Of course!... (8 Replies)
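The reason myvar stays empty is that each part of a pipeline runs in its own subshell, so an assignment on the right-hand side of a `|` is lost when that subshell exits. Putting the whole pipeline inside the command substitution keeps the assignment in the current shell:

```shell
#!/bin/sh
# Each pipeline segment is a subshell, so `... | myvar=$(...)` assigns
# inside a subshell and the value vanishes.  Do this instead:
myvar=$(echo "hello world" | awk -F "world" '{print $1}')
echo "$myvar"    # prints "hello " (trailing space before the separator is kept)
```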
Hello,
I'm trying to extract text that is surrounded by XML tags. I tried this:
cat tst.xml | egrep "<SERVER>.*</SERVER>" |sed -e "s/<SERVER>\(.*\)<\/SERVER>/\1/"|tr "|" " "
which works perfectly if the start tag and the end tag are on the same line, e.g.:
<tag1>Hello Linux-Users</tag1>
... (5 Replies)
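The sed one-liner only sees one line at a time, which is why it misses tags that span lines. One portable workaround is to flatten the newlines first with tr, then extract as before; a sketch with invented sample data (gawk's multi-character RS would be another option):

```shell
#!/bin/sh
# Flatten newlines so <SERVER>...</SERVER> always sits on one "line",
# then cut out each tagged span and strip the tags themselves.
printf '%s\n' 'before <SERVER>host1' 'host2</SERVER> after' |
tr '\n' ' ' |
grep -o '<SERVER>[^<]*</SERVER>' |
sed 's/<SERVER>//; s/<\/SERVER>//'
```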
This script is supposed to find out if tomcat is running or not.
#!/bin/sh
if netstat -a | grep `grep ${1}: /tomcat/bases | awk -F: '{print $3}'` > /dev/null
then
echo Tomcat for $1 running
else
echo Tomcat for $1 NOT running
fi
the /tomcat/bases is a file that... (2 Replies)
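Since the description of /tomcat/bases is cut off, here is a hypothetical sketch assuming its lines look like name:dir:port. Splitting the nested substitution into steps makes an unknown name fail cleanly instead of handing grep an empty pattern (a sample bases file stands in for the real one):

```shell
#!/bin/sh
# Hypothetical: assumes /tomcat/bases lines are name:dir:port.
bases=$(mktemp)                       # stand-in for /tomcat/bases
printf '%s\n' 'app1:/opt/app1:8080' 'app2:/opt/app2:8081' > "$bases"

base=app1                             # would be "$1" in the real script
port=$(awk -F: -v b="$base" '$1 == b { print $3 }' "$bases")

# Skip the netstat check entirely when the name wasn't found.
if [ -n "$port" ] && netstat -an 2>/dev/null | grep -q "$port"; then
    echo "Tomcat for $base running"
else
    echo "Tomcat for $base NOT running"
fi
rm -f "$bases"
```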
Can anyone help me extract all lines in a file where the word "worker" or "co-worker" appears in the 2nd column? There are also words in the 2nd column like "workers" or "worker2", but I don't want to display those lines.
Appreciate any help in advance! Thank you! (5 Replies)
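Exact field comparison in awk is the cleanest way to exclude "workers" and "worker2"; a grep alternative needs anchors around the word. A sketch with invented sample lines (this assumes space-separated columns with at least three fields):

```shell
#!/bin/sh
# awk: keep a line only when field 2 is exactly "worker" or "co-worker".
printf '%s\n' 'a worker x' 'b workers y' 'c co-worker z' 'd worker2 w' |
awk '$2 == "worker" || $2 == "co-worker"'

# grep -E alternative: first column, a space, the exact word, a space.
printf '%s\n' 'a worker x' 'b workers y' 'c co-worker z' 'd worker2 w' |
grep -E '^[^ ]+ (worker|co-worker) '
```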