sed / grep / for statement performance - please help


 
# 1  
Old 11-04-2009

I'm searching for the most efficient way of doing the following task, so if someone can provide either a working solution with sed or a totally different but more efficient approach than what I've got so far, then please go ahead!

The debugme directory has 3 subdirectories, and each of them has one .txt file with about 48 entries.

Code:
time (
for FILE1 in `find debugme -name "*.txt"` ;do
    for FILE2 in `cat "$FILE1" | awk '{print $1}' | grep -i '^\([0-9]\+\)$'` ;do
    #for FILE2 in `sed 's/\([0-9]\+\) \([a-zA-Z0-9]\+\)/\1/i' "$FILE"` ;do
        CHECK=`grep "$FILE2" "debug.files"`
        if [ "$CHECK" = "" ]; then
            ## ADD MISSING ENTRY BLABLA
            echo "$FILE2 was missing, added!"
        fi
    done
done
)

Avg result:
real    0m0.174s
user    0m0.052s
sys     0m0.128s

time (
for FILE1 in `find debugme -name "*.txt"` ;do
    FILE2() {
        S=`grep $1 debug.files`
        if [ "$S" = "" ] ; then
            ## ADD MISSING ENTRY BLABLA
            echo "$1 was missing, added!"
        fi
    }
    while read cola colb ; do
        FILE2 "$cola"
    done < "$FILE1"
done
)

Avg result:
real    0m0.269s
user    0m0.064s
sys     0m0.228s

The .txt files are in this format:

Code:
[03:53:22] root:~# cat example.txt
52352578 ABF2778ABD^
73534536 LASDM337lA^
83523422 JFAASMM31^

And debug.files in this:

Code:
[03:53:25] root:~# cat debug.files
52352578
73534536
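For reference, with exactly this layout the whole loop can collapse into one pipeline: print the first column of every .txt file, then filter out the numbers that already appear as whole lines in debug.files. A sketch, not a drop-in replacement (the setup lines just recreate the sample data above; you'd point it at your real debugme and debug.files):

```shell
# Recreate the sample layout from this post (illustration only)
cd "$(mktemp -d)"
mkdir -p debugme/sub1
printf '52352578 ABF2778ABD^\n73534536 LASDM337lA^\n83523422 JFAASMM31^\n' \
    > debugme/sub1/example.txt
printf '52352578\n73534536\n' > debug.files

# First column of every .txt file, minus the entries already present:
# -F fixed strings, -x whole-line match, -f read patterns from a file, -v invert
find debugme -name '*.txt' -exec awk '{print $1}' {} + |
    grep -vxFf debug.files    # prints: 83523422
```

Each missing number comes out on its own line, so the "add missing entry" step can read them from a single `while read` instead of spawning one grep per number.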

# 2  
Old 11-04-2009
I suspect that:
Code:
if [ -z "`grep $1 debug.files`" ] ; then

will be quicker than:
Code:
S=`grep $1 debug.files`
if [ "$S" = "" ] ; then
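Going one step further, `grep -q` exits at the first match and produces no output at all, so there is no command substitution to pay for either. A sketch (the `check` function here stands in for the FILE2 function in post #1; the printf just sets up the sample debug.files):

```shell
cd "$(mktemp -d)"
printf '52352578\n73534536\n' > debug.files

check() {
    # grep -q stops reading at the first match and prints nothing;
    # we only use its exit status
    if ! grep -q "$1" debug.files ; then
        echo "$1 was missing, added!"
    fi
}

check 83523422    # prints: 83523422 was missing, added!
check 52352578    # prints nothing
```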

Also:
Code:
awk '{print $1}' "$FILE1" |

will be more efficient than:
Code:
cat "$FILE1" | awk '{print $1}' |

But if you are really serious about performance, then using perl(1) would be quicker; that is, using one binary to do all the processing rather than calling grep(1), test(1), sed(1), etc.

That said, I know some very able folk who could do the whole thing in nawk(1)!

Hope this helps...
# 3  
Old 11-04-2009
And do you need "find" at all?

As it's only three directories and 1 file in each

Code:
for file in  */*.txt; do

is surely faster than starting up find.

If that's what you are after, of course.
# 4  
Old 11-05-2009
Quote:
Originally Posted by TonyLawrence
And do you need "find" at all?

As it's only three directories and 1 file in each

Code:
for file in  */*.txt; do

is surely faster than starting up find.

If that's what you are after, of course.
That was just an example; the .txt file(s) are either in the main directory or in subdirectories, and debugme is a variable. I guess that find is necessary?

Oh, and I'm looking for the most efficient way of doing this in bash, no perl :)


ps. I've tested your suggestions, here is the result:

Code:
With your modifications:
real    0m0.171s
user    0m0.060s
sys     0m0.116s
Without:
real    0m0.170s
user    0m0.044s
sys     0m0.124s



---------- Post updated at 05:14 PM ---------- Previous update was at 02:36 PM ----------

So nobody has any suggestions on how to make it run quicker?

---------- Post updated 11-05-09 at 04:29 AM ---------- Previous update was 11-04-09 at 05:14 PM ----------

Seriously, I thought that some of the experts here would know how to optimize such a task :/

Last edited by TehOne; 11-04-2009 at 06:43 PM..
# 5  
Old 11-05-2009
get rid of all those useless cats, greps and sed.
Code:
find debugme -name "*.txt" | awk 'BEGIN{
   # get all the lines of debug.files into array for later comparison. Analogous to grep "$FILE2" "debug.files"
   while( (getline line < "debug.files" ) > 0 ) {
       a[++d]=line
   }
   close("debug.files")
}
{
   filename=$0
   while( (getline line < filename ) > 0 ){
       m=split(line,t," ")  # split the line on space, this is same as your awk "{print $1}"
       if ( t[1]+0 == t[1] ){  # this should be equivalent to grep -i "^[0-9]\+$"
           f=0  # reset the found flag for each line
           for(i=1;i<=d;i++){
              ## print out variable values for debugging as needed
              if( a[i] == t[1] ){  # go through the lines of debug.files and compare with t[1]
                print "found"
                f=1
              }
           }
           if(f==0){
               print "not found"
           }
       }
   }
   close(filename)
}'

NB:not tested.
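A variant of the same idea: use the debug.files entries as keys of an associative array, so each number is checked with a single `in` lookup instead of looping over the whole array. A sketch under the same assumptions about the file layout (the setup lines only build sample data for illustration):

```shell
# Sample data matching the format in post #1 (illustration only)
cd "$(mktemp -d)"
mkdir -p debugme/sub1
printf '52352578 ABF2778ABD^\n73534536 LASDM337lA^\n' > debugme/sub1/example.txt
printf '52352578\n' > debug.files

find debugme -name '*.txt' | awk 'BEGIN{
    # index debug.files by line for O(1) membership tests
    while( (getline line < "debug.files") > 0 ) seen[line]=1
    close("debug.files")
}
{
    # $0 is a filename coming from find
    while( (getline line < $0) > 0 ){
        split(line, t, " ")
        if( t[1]+0 == t[1] && !(t[1] in seen) )   # numeric and not yet known
            print t[1] " was missing, added!"
    }
    close($0)
}'
# prints: 73534536 was missing, added!
```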

Last edited by ghostdog74; 11-05-2009 at 10:58 AM..
# 6  
Old 11-05-2009
Quote:
Originally Posted by ghostdog74
get rid of all those useless cats, greps and sed.
[...]
Your attempt seems to be the best when it comes to performance, but it doesn't work: it shows "not found" for each line! I'd appreciate it a lot if you could get it working.

Last edited by TehOne; 11-05-2009 at 10:51 AM..
# 7  
Old 11-05-2009
Quote:
Originally Posted by TehOne
if you could get it working.
No, you should be the one getting it working: firstly, you have the exact environment and I don't; secondly, it's not my work, it's yours. I have edited the script with comments. Look at it, play with it, read the docs, print out the values of the variables to see what they contain at runtime, do whatever. Then post again if you hit problems.
