Bash script too slow | Unix Linux Forums | Shell Programming and Scripting



Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.

Bash script too slow

Tags: bash, for loop, perl help, scripting, slow
#1  01-29-2010, tigta09 (Registered User)

I have a bash script that will take approx. 130 days to complete. I am trying to grep a list of 1,144 user IDs out of 41 files (1 GB each). The 41 files were originally a single 41 GB file, but that was horrendously slow.
This is my current script:

Code:
#!/bin/bash
for i in `cat WashFD.txt`    # 1,144 user IDs
do
    for b in `cat xfiles`    # 41 "x??" files
    do
        echo "looking for " $i " in " $b
        cat $b | grep -i $i >> SEID.searches
    done
done

Currently, processing one of the 41 files takes about 4 minutes: 4 x 41 = 164 min per user_id.
164 / 60 (min/hour) = 2.73 hours per user_id. 1,144 user_ids x 2.73 hours = 3123.12 hours, and 3123.12 / 24 (hours in a day) = 130.13 days.

As you can see, that is far too long for this task. I don't know Perl, but I've heard it's faster. If anyone has any suggestions, please let me know.

Last edited by vbe; 01-29-2010 at 09:54 AM.. Reason: code tags please
#2  01-29-2010, trey85stang (Registered User)
Can you give an example of "xfiles"?
#3  01-29-2010, pludi (Forum Advisor)
What platform/UNIX are you on, what's the format of the lines, and what do the user IDs look like?
#4  01-29-2010, tigta09 (Registered User)
The x?? files contain HTTP log entries. The highlighted field (DD\\ABCDE) is the user_id that I am looking for.

2009-09-29 13:59:04 DD\\ABCDE 152.225.186.39 Search Engines and Portals GET http://ui.sina.com/assets/js/jump_home.js application/x-javascript 262 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322) TCP_HIT/304 SINA.com US ??? - ?? DIRECT/12.130.152.120 192.168.40.148

---------- Post updated at 10:06 AM ---------- Previous update was at 10:04 AM ----------

The user_ids look like DD\\ABCDE. The platform is Ubuntu Linux 8.10.
#5  01-29-2010, dpc.ucore.info (Registered User)
You don't seem to use regexes at all. Use "fgrep" or "grep -F" to work in fixed-string mode. That way the processing time will be negligible compared to the time spent reading the data from disk.

---------- Post updated at 05:19 PM ---------- Previous update was at 05:11 PM ----------

OK, I can see that you're reading each file multiple times. That is the cause of the problem, not processing time.

Using grep's alternation, first compose a single pattern of the usernames, like this:


Code:
user1|user2|user3|user4|...

Then grep each source file looking for all matches at once. Don't use


Code:
cat FILE | grep STRING

This is slower than simply:


Code:
grep STRING FILE

Save all matches to a temporary file, then check that file for each username. Since this file should be much smaller than the originals (I assume), you will save a lot of time when reading it repeatedly, once per user.
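A minimal sketch of this two-pass idea, using the thread's file names but invented toy data (the real user IDs, like DD\ABCDE, contain backslashes, which is exactly why fixed-string -F matching is safer than a regex here):

```shell
#!/bin/bash
# Two-pass search, demonstrated on invented sample data.
printf '%s\n' ALICE01 BOB02 > WashFD.txt                # user IDs, one per line
printf '%s\n' '2009-09-29 ALICE01 GET /a' \
              '2009-09-29 CAROL03 GET /c' > x00         # toy log files
printf '%s\n' '2009-09-29 bob02 GET /b'   > x01
printf '%s\n' x00 x01 > xfiles                          # list of the log files

# Pass 1: scan every log file once, matching all IDs at the same time
# (-F fixed strings, -f patterns from file, -h no filename prefix).
grep -h -i -F -f WashFD.txt $(cat xfiles) > matches.tmp

# Pass 2: split the much smaller match file per user ID.
while IFS= read -r id; do
    grep -i -F "$id" matches.tmp > "SEID.$id"
done < WashFD.txt
```

Pass 1 reads each big file exactly once; pass 2 then re-reads only the small matches.tmp once per user instead of re-reading 41 GB per user.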
    #6  
Old 01-29-2010
pludi's Avatar
pludi pludi is offline Forum Advisor  
Cat herder
 
Join Date: Dec 2008
Last Activity: 28 March 2014, 8:35 AM EDT
Location: Vienna, Austria, Earth
Posts: 5,522
Thanks: 38
Thanked 335 Times in 308 Posts
OK, here's what you can do:
  • use grep with -F, as dpc.ucore.info suggested
  • use the -f file switch, as described in the man page; that allows you to load and search for multiple search strings at once
  • don't loop over the 41 files, but specify them all on the command line at once; use -H to display which file each match was found in, if needed
With these, you could cut your search down to
Code:
grep -F -f WashFD.txt $( cat xfiles ) > SEID.searches
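On toy data (all file contents below are invented), the same command with -H labels each hit with its source file:

```shell
#!/bin/sh
# One-shot run of the command above on invented sample data;
# -H prefixes each match with the file it came from.
printf '%s\n' USER_A USER_B > WashFD.txt
printf '%s\n' 'log line for USER_A' > x00
printf '%s\n' 'log line for USER_B' > x01
printf '%s\n' x00 x01 > xfiles

grep -F -H -f WashFD.txt $(cat xfiles) > SEID.searches
cat SEID.searches    # x00:log line for USER_A
                     # x01:log line for USER_B
```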

#7  01-29-2010, tigta09 (Registered User)
After running:


Code:
grep -F -H -f  WashFD.txt $( cat xfiles ) > SEID.searches

the SEID.searches file is growing very fast; however, the entries do not match what is in WashFD.txt.

Am I missing something?

Last edited by pludi; 01-29-2010 at 12:08 PM..
Closed Thread
