Is it better to grep and pipe to awk, or to search with awk itself?


 
# 1  
10-07-2008

This may just be my lack of experience talking, but I always assumed that, when possible, it was better to use a command's built-in abilities rather than piping to a bunch of commands. A while back I wrote a (very simple) script meant to pull out a certain error code and report back which piece of equipment had thrown it:

Code:
#!/usr/bin/bash
######################################################
# Program:      problemboxes.sh 
# Date Created: 14 May 2008
# Developer:    Darrell S. **** (Digital Sys. Admin)
# Description:  Generates lists of boxes that have thrown excess error 18's 
# Last Updated: 7 Aug 2008
######################################################
clear
user=`echo $UID`
if 
[ $user != 0 ]; then
echo "you must be root to use this script";exit
else

clear
echo "generating list of boxes that have thrown more than 5 error 18's today. This may take a moment, please wait"

grep "AddResCnf response (0x12) is not 'OK'" /usr/local/n2bb/log/n2bb.log|awk '{print $10}'|tr -d "'"|cut -c 1-12|sort|uniq -c|awk '{ if ($1 > 5) print $1,$2}'
fi

Last night while bored I decided to try simplifying the script and came up with:

Code:
#!/usr/bin/bash
######################################################
# Program:      problemboxes.sh 
# Date Created: 14 May 2008
# Developer:    Darrell S. **** (Digital Sys. Admin)
# Description:  Generates lists of boxes that have thrown excess error 18's
# Last Updated: 5 October 2008
######################################################
clear
user=`echo $UID`
if 
[ $user != 0 ]; then
echo "you must be root to use this script";exit
else
clear
echo "generating list of boxes that have thrown more than 15 error 18's today. This may take a moment, please wait"
awk -F\' '/0x12/ {print $2}' /usr/local/n2bb/log/n2bb.log|cut -c 1-12|sort|uniq -c|awk '{ if ($1 > 15) print $1,$2}'
fi

What seems odd to me is that, while both return the same results, the one that searches with awk takes considerably longer than the one that uses grep (granted, only ~3 seconds right now, but that's because the log rolled at midnight).
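To put a number on that difference, one approach is to time both pipelines against the same log and confirm that they really produce the same output. This is a minimal sketch, assuming bash (for the `time` keyword and `local`); `grep_first`, `awk_only` and `compare_pipelines` are hypothetical names, and the pipelines are copied from the two scripts above without the final threshold filter.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: time the two pipelines from the scripts above on one
# log file and report whether their outputs match. The final "> 5"/"> 15"
# threshold filter is left out so the full counts are compared.

grep_first() {   # search with grep, then extract with awk/tr/cut
    grep "AddResCnf response (0x12) is not 'OK'" "$1" |
        awk '{print $10}' | tr -d "'" | cut -c 1-12 | sort | uniq -c
}

awk_only() {     # search and extract with awk alone
    awk -F\' '/0x12/ {print $2}' "$1" | cut -c 1-12 | sort | uniq -c
}

compare_pipelines() {
    local log=$1 out1 out2
    out1=$(mktemp) && out2=$(mktemp) || return 1
    echo "grep | awk:" >&2
    time grep_first "$log" > "$out1"     # bash's time keyword reports on stderr
    echo "awk only:" >&2
    time awk_only "$log" > "$out2"
    if cmp -s "$out1" "$out2"; then echo "identical"; else echo "different"; fi
    rm -f "$out1" "$out2"
}
```

Source it and run `compare_pipelines /usr/local/n2bb/log/n2bb.log`. It is worth running it twice, since the first run may include the cost of reading the log from disk. A result of "different" would mean the awk-only `/0x12/` filter is catching lines that the longer grep pattern skips.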

I have several scripts that use basically the same logic and just sort the information later. Since they tend to use up a lot of processor power (the logs they crawl are pretty big), I'd like to make them as efficient as possible.

In case it matters this is what the overall log file tends to look like:

Code:
root@bms02-twc-NM-newyork-ny:/usr/local/n2bb/log# tail n2bb.log
2008/10/07 04:27:04.348 GMT(10/07 00:27:04 -0400) INFO       SESSIONGW  N2BBSessionGateway_impl.DsmccMsgListener(): Received 'AddResCnf' message for session '001bd744aca600000f12'
2008/10/07 04:27:04.348 GMT(10/07 00:27:04 -0400) INFO       SESSIONGW  N2BBSessionGateway_impl.MsgHndlrThread(): Processing 'SvrAddResCnf' message for session '001bd744aca600000f12'
2008/10/07 04:27:04.455 GMT(10/07 00:27:04 -0400) INFO       SESSIONGW  N2BBSessionGateway_impl.msg_handler_SvrAddResCnf(): Sending SvrSetupRsp for session '001bd744aca600000f12'
2008/10/07 04:27:04.455 GMT(10/07 00:27:04 -0400) INFO       SESSIONGW  N2BBSessionGateway_impl.msg_handler_SvrAddResCnf(): Successfully set up session '001bd744aca600000f12'
2008/10/07 04:27:05.516 GMT(10/07 00:27:05 -0400) INFO       SESSIONGW  N2BBSessionGateway_impl.DsmccMsgListener(): Received 'SvrRelInd' message for session '0001a6fc0db015962b30'
2008/10/07 04:27:05.516 GMT(10/07 00:27:05 -0400) INFO       SESSIONGW  N2BBSessionGateway_impl.MsgHndlrThread(): Processing 'SvrRelInd' message for session '0001a6fc0db015962b30'
2008/10/07 04:27:05.516 GMT(10/07 00:27:05 -0400) INFO       SESSIONGW  N2BBSessionGateway_impl.releaseSession(): Sending SvrRlsRsp for session '0001a6fc0db015962b30'
2008/10/07 04:27:05.766 GMT(10/07 00:27:05 -0400) INFO       SESSIONGW  N2BBSessionGateway_impl.DsmccMsgListener(): Received 'SvrRelInd' message for session '00e0366d16b605c58bea'
2008/10/07 04:27:05.766 GMT(10/07 00:27:05 -0400) INFO       SESSIONGW  N2BBSessionGateway_impl.MsgHndlrThread(): Processing 'SvrRelInd' message for session '00e0366d16b605c58bea'
2008/10/07 04:27:05.767 GMT(10/07 00:27:05 -0400) INFO       SESSIONGW  N2BBSessionGateway_impl.releaseSession(): Sending SvrRlsRsp for session '00e0366d16b605c58bea'
root@bms02-twc-NM-newyork-ny:/usr/local/n2bb/log#

And this is the error that I'm looking to pull information from:

Code:
root@bms02-twc-NM-newyork-ny:/usr/local/n2bb/log# grep 0x12 n2bb.log
2008/10/07 04:01:45.592 GMT(10/07 00:01:45 -0400) ERROR      SESSIONGW  N2BBSessionGateway_impl.msg_handler_SvrAddResCnf(): Session '00e03665f4412ab15563' failed:  Unable to get resources in service group 11627.  AddResCnf response (0x12) is not 'OK'
2008/10/07 04:01:48.994 GMT(10/07 00:01:48 -0400) ERROR      SESSIONGW  N2BBSessionGateway_impl.msg_handler_SvrAddResCnf(): Session '00e03665f4412ab15564' failed:  Unable to get resources in service group 11627.  AddResCnf response (0x12) is not 'OK'
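For anyone following along, here is the awk-only pipeline from the second script run on the two sample error lines above. It is a sketch: `count_boxes` is a hypothetical wrapper, and the threshold is passed in as an argument (1 here) because the sample is far too small to exceed 15. Splitting on `'` makes `$2` the session ID, and `cut -c 1-12` keeps the part of it that the scripts use to identify the box.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk-only pipeline. Reads log lines on stdin
# and prints "count box" for every box seen more than $1 times.
count_boxes() {
    # -F\' splits each line on single quotes, so $2 is the session ID.
    # cut keeps its first 12 characters, which identify the box.
    # The last awk keeps only boxes seen more than $1 times.
    awk -F\' '/0x12/ {print $2}' | cut -c 1-12 | sort | uniq -c |
        awk -v min="$1" '{ if ($1 > min) print $1, $2 }'
}

count_boxes 1 <<'EOF'
2008/10/07 04:01:45.592 GMT(10/07 00:01:45 -0400) ERROR      SESSIONGW  N2BBSessionGateway_impl.msg_handler_SvrAddResCnf(): Session '00e03665f4412ab15563' failed:  Unable to get resources in service group 11627.  AddResCnf response (0x12) is not 'OK'
2008/10/07 04:01:48.994 GMT(10/07 00:01:48 -0400) ERROR      SESSIONGW  N2BBSessionGateway_impl.msg_handler_SvrAddResCnf(): Session '00e03665f4412ab15564' failed:  Unable to get resources in service group 11627.  AddResCnf response (0x12) is not 'OK'
EOF
# prints: 2 00e03665f441
```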

# 2  
10-07-2008
Quote:
Originally Posted by DeCoTwc
user=`echo $UID`


Why are you using command substitution instead of a straight assignment? In all shells except ksh93, it forks a new process and is almost as slow as an external command. Use:

Code:
user=$UID
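Applied to the whole root check, the same advice might look like the sketch below. `is_root` is a hypothetical helper; its optional argument exists only so the check can be exercised without actually being root. It uses `-eq` because a UID is a number, whereas the original `!=` compares strings (which happens to work here, but is not what is meant).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given UID (default: the caller's) is root.
# $UID is set by bash itself, so no command substitution or subshell is needed.
# (In a plain POSIX sh, "$(id -u)" would be the portable equivalent.)
is_root() {
    # -eq compares numerically, unlike the original string test with !=.
    [ "${1:-$UID}" -eq 0 ]
}

# In the script, the original if/else could then become:
#   is_root || { echo "you must be root to use this script" >&2; exit 1; }
```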

Quote:

[snip]

What seems odd to me is that, while both return the same results, the one that searches with awk takes considerably longer than the one that uses grep (granted, only ~3 seconds right now, but that's because the log rolled at midnight).

The search code in grep is much faster than that in all versions of awk except mawk, but it will not make much difference except on very large files.

The authors of AWK, in their book, The AWK Programming Language, recommend using grep to search and piping the results through awk for processing rather than doing it all in AWK.
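Following that recommendation, one possible shape for the script is to let grep do the searching and hand everything else to a single awk, instead of the awk | tr | cut | sort | uniq | awk chain. This is a sketch, not a drop-in replacement: `problem_boxes` is a hypothetical function name, `grep -F` (fixed-string matching) is a swapped-in choice that works because the message contains no regex metacharacters, and whether it is actually faster on these logs is something to measure.

```shell
#!/usr/bin/env bash
# Hypothetical: list boxes with more than MIN error-18 lines in LOG.
# grep -F searches for the literal message; a single awk does the rest.
# The final sort restores the original's box-ID order, because awk's
# "for (key in array)" loop visits keys in no defined order.
problem_boxes() {
    local log=$1 min=${2:-5}
    grep -F "AddResCnf response (0x12) is not 'OK'" "$log" |
        awk -F\' -v min="$min" '
            # $2 is the session ID; its first 12 characters identify the box.
            { count[substr($2, 1, 12)]++ }
            END {
                for (box in count)
                    if (count[box] > min) print count[box], box
            }' |
        sort -k2,2
}
```

Called as `problem_boxes /usr/local/n2bb/log/n2bb.log 15`, it should print the same "count box" lines as the second script.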
# 3  
10-07-2008
Quote:
Originally Posted by cfajohnson

Why are you using command substitution instead of a straight assignment? In all shells except ksh93, it forks a new process and is almost as slow as an external command. Use:

Code:
user=$UID


The search code in grep is much faster than that in all versions of awk except mawk, but it will not make much difference except on very large files.

The authors of AWK, in their book, The AWK Programming Language, recommend using grep to search and piping the results through awk for processing rather than doing it all in AWK.

The UID thing... because I'm, for lack of a better word, a newb.

Thanks for the info regarding awk vs. grep. The log I'm searching gets rather large, as it only rolls every 12 hours, and even 45 minutes into its cycle there was already a 3-second difference between running with awk and running with grep.

My second question is an extension of the first. Is there any benefit to giving grep more or less to search for?

Is it better to grep for 0x12, since that's the error code, or for "AddResCnf response (0x12) is not 'OK'", which is the entire error message? I can think of logical reasons why a longer search term might be better than a short one... but as I said, I'm kind of a newb.
# 4  
10-07-2008

I don't think there'd be much difference, but try it and see.
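One quick way to try it, as a sketch: time `grep -c` with each pattern on the same log. `compare_patterns` is a hypothetical name; `-F` makes grep treat both patterns as literal strings, and `-c` prints only the number of matching lines so that terminal output doesn't skew the timing. If the two counts differ, the short pattern is also matching lines that aren't the error 18 message, which is an argument for the longer one regardless of speed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count and time matches for a short and a long pattern.
compare_patterns() {
    local log=$1
    local short='0x12'
    local long="AddResCnf response (0x12) is not 'OK'"
    echo "short: $short" >&2
    time grep -cF -- "$short" "$log"   # match count on stdout; timing on stderr
    echo "long:  $long" >&2
    time grep -cF -- "$long" "$log"
}
```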
# 5  
10-07-2008
misread prior post

Last edited by DeCoTwc; 10-07-2008 at 03:53 PM.. Reason: misread