sed / grep / for statement performance - please help


 
# 1  
Old 11-04-2009

I'm searching for the most efficient way of doing the following task, so if someone can provide either a working solution with sed or a totally different but more efficient approach than what I've got so far, then please go ahead!

The debugme directory has 3 subdirectories, and each of them has one .txt file with about 48 entries.

Code:
time (
for FILE1 in `find debugme -name "*.txt"` ;do
    for FILE2 in `cat "$FILE1" | awk '{print $1}' | grep -i '^\([0-9]\+\)$'` ;do
    #for FILE2 in `sed 's/\([0-9]\+\) \([a-zA-Z0-9]\+\)/\1/i' "$FILE"` ;do
        CHECK=`grep "$FILE2" "debug.files"`
        if [ "$CHECK" = "" ]; then
            ## ADD MISSING ENTRY BLABLA
            echo "$FILE2 was missing, added!"
        fi
    done
done
)

Avg result:
real    0m0.174s
user    0m0.052s
sys     0m0.128s

time (
for FILE1 in `find debugme -name "*.txt"` ;do
    FILE2() {
        S=`grep $1 debug.files`
        if [ "$S" = "" ] ; then
            ## ADD MISSING ENTRY BLABLA
            echo "$1 was missing, added!"
        fi
    }
    while read cola colb ; do
        FILE2 "$cola"
    done < "$FILE1"
done
)

Avg result:
real    0m0.269s
user    0m0.064s
sys     0m0.228s

The .txt files are in this format:

Code:
[03:53:22] root:~# cat example.txt
52352578 ABF2778ABD^
73534536 LASDM337lA^
83523422 JFAASMM31^

And debug.files in this:

Code:
[03:53:25] root:~# cat debug.files
52352578
73534536
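For reference, with exactly this layout the whole loop can collapse into one pipeline: print the first column of every .txt file, then filter out the numbers that already appear as whole lines in debug.files. A sketch, not a drop-in replacement (the setup lines just recreate the sample data above; you'd point it at your real debugme and debug.files):

```shell
# Recreate the sample layout from this post (illustration only)
cd "$(mktemp -d)"
mkdir -p debugme/sub1
printf '52352578 ABF2778ABD^\n73534536 LASDM337lA^\n83523422 JFAASMM31^\n' \
    > debugme/sub1/example.txt
printf '52352578\n73534536\n' > debug.files

# First column of every .txt file, minus the entries already present:
# -F fixed strings, -x whole-line match, -f read patterns from a file, -v invert
find debugme -name '*.txt' -exec awk '{print $1}' {} + |
    grep -vxFf debug.files    # prints: 83523422
```

Each missing number comes out on its own line, so the "add missing entry" step can read them from a single `while read` instead of spawning one grep per number.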

# 2  
Old 11-04-2009
I suspect that:
Code:
if [ -z "`grep $1 debug.files`" ] ; then

will be quicker than:
Code:
S=`grep $1 debug.files`
if [ "$S" = "" ] ; then
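Going one step further, `grep -q` exits at the first match and produces no output at all, so there is no command substitution to pay for either. A sketch (the `check` function here stands in for the FILE2 function in post #1; the printf just sets up the sample debug.files):

```shell
cd "$(mktemp -d)"
printf '52352578\n73534536\n' > debug.files

check() {
    # grep -q stops reading at the first match and prints nothing;
    # we only use its exit status
    if ! grep -q "$1" debug.files ; then
        echo "$1 was missing, added!"
    fi
}

check 83523422    # prints: 83523422 was missing, added!
check 52352578    # prints nothing
```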

Also:
Code:
awk '{print $1}' "$FILE1" |

will be more efficient than:
Code:
cat "$FILE1" | awk '{print $1}' |

But if you are really serious about performance, then using perl(1) would be quicker; that is, using one binary to do all the processing rather than calling grep(1), test(1), sed(1), etc.

That said, I know some very able folk who could do the whole thing in nawk(1)!

Hope this helps...
# 3  
Old 11-04-2009
And do you need "find" at all?

As it's only three directories and 1 file in each

Code:
for file in  */*.txt; do

is surely faster than starting up find.

If that's what you are after, of course.
# 4  
Old 11-05-2009
Quote:
Originally Posted by TonyLawrence
And do you need "find" at all?

As it's only three directories and 1 file in each

Code:
for file in  */*.txt; do

is surely faster than starting up find.

If that's what you are after, of course.
That was just an example; the .txt file(s) are either in the main directory or in subdirectories, and debugme is a variable. I guess that find is necessary?

Oh, and I'm looking for the most efficient way of doing this in bash, no perl :)


ps. I've tested your suggestions, here is the result:

Code:
With your modifications:
real    0m0.171s
user    0m0.060s
sys     0m0.116s
Without:
real    0m0.170s
user    0m0.044s
sys     0m0.124s



---------- Post updated at 05:14 PM ---------- Previous update was at 02:36 PM ----------

So nobody has any suggestions on how to make it run quicker?

---------- Post updated 11-05-09 at 04:29 AM ---------- Previous update was 11-04-09 at 05:14 PM ----------

Seriously, I thought that some of the experts here would know how to optimize such a task :/

Last edited by TehOne; 11-04-2009 at 06:43 PM..
# 5  
Old 11-05-2009
get rid of all those useless cats, greps and sed.
Code:
find debugme -name "*.txt" | awk 'BEGIN{
   # get all the lines of debug.files into array for later comparison. Analogous to grep "$FILE2" "debug.files"
   while( (getline line < "debug.files" ) > 0 ) {
       a[++d]=line
   }
   close("debug.files")
}
{
   filename=$0
   while( (getline line < filename ) > 0 ){
       m=split(line,t," ")  # split the line on space, this is same as your awk "{print $1}"
       if ( t[1]+0 == t[1] ){  # this should be equivalent to grep -i "^[0-9]\+$"
           f=0  # reset the found flag for each line
           for(i=1;i<=d;i++){
              ## print out variable values for debugging as needed
              if( a[i] == t[1] ){  # go through the lines of debug.files and compare with t[1]
                print "found"
                f=1
              }
           }
           if(f==0){
               print "not found"
           }
       }
   }
   close(filename)
}'

NB:not tested.
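A variant of the same idea: use the debug.files entries as keys of an associative array, so each number is checked with a single `in` lookup instead of looping over the whole array. A sketch under the same assumptions about the file layout (the setup lines only build sample data for illustration):

```shell
# Sample data matching the format in post #1 (illustration only)
cd "$(mktemp -d)"
mkdir -p debugme/sub1
printf '52352578 ABF2778ABD^\n73534536 LASDM337lA^\n' > debugme/sub1/example.txt
printf '52352578\n' > debug.files

find debugme -name '*.txt' | awk 'BEGIN{
    # index debug.files by line for O(1) membership tests
    while( (getline line < "debug.files") > 0 ) seen[line]=1
    close("debug.files")
}
{
    # $0 is a filename coming from find
    while( (getline line < $0) > 0 ){
        split(line, t, " ")
        if( t[1]+0 == t[1] && !(t[1] in seen) )   # numeric and not yet known
            print t[1] " was missing, added!"
    }
    close($0)
}'
# prints: 73534536 was missing, added!
```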

Last edited by ghostdog74; 11-05-2009 at 10:58 AM..
# 6  
Old 11-05-2009
Quote:
Originally Posted by ghostdog74
get rid of all those useless cats, greps and sed.
[...]
Your attempt seems to be the best when it comes to performance, but it doesn't work: it shows "not found" for each line! I'd appreciate it a lot if you could get it working.

Last edited by TehOne; 11-05-2009 at 10:51 AM..
# 7  
Old 11-05-2009
Quote:
Originally Posted by TehOne
if you could get it working.
No, you should be the one getting it working: firstly, you have the exact environment and I don't; secondly, it's not my work, it's yours. I have edited the script with comments. Look at it, play with it, read the docs, print out the values of the variables to see what they contain at runtime, do whatever. Then post again if you hit problems.
