Bash script too slow


 
# 1  
Old 01-29-2010
Bash script too slow

I have a bash script that will take approx. 130 days to complete. I am trying to grep a list of 1,144 user IDs out of 41 files (1 GB each). The 41 files were originally one 41 GB file, but that was far too slow.
This is my current file:
Code:
#!/bin/bash
for i in `cat WashFD.txt`      # 1,144 user IDs
do
    for b in `cat xfiles`      # 41 "x??" files
    do
        echo "looking for " $i "in " $b
        cat $b | grep -i $i >> SEID.searches
    done
done

Currently, I am processing one of the 41 files every 4 minutes. 4 x 41 = 164 min.
164 / 60 (min/hour) = 2.73 hours per user_id. I have 1,144 user_ids, multiplied by 2.73 = 3123.12 hours. 3123.12 / 24 (hours in a day) = 130.13 days.
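
The same estimate can be reproduced with shell integer arithmetic. This is only a sketch: the figures come from the post, the variable names are made up for illustration, and integer division truncates, so the result is approximate.

```shell
#!/bin/bash
# Figures taken from the post; variable names are illustrative only.
minutes_per_file=4
files=41
user_ids=1144

# Every user ID triggers a full pass over every file.
total_minutes=$(( minutes_per_file * files * user_ids ))
total_hours=$(( total_minutes / 60 ))   # integer division, rounds down
total_days=$(( total_hours / 24 ))      # integer division, rounds down

echo "$total_minutes minutes = about $total_hours hours = about $total_days days"
```

The run time scales with files x user IDs, which is why reading each file only once makes such a difference.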

As you can see, that is way too long to process this task. I don't know Perl, but I've heard it's faster. If anyone has any suggestions, please let me know.

Last edited by vbe; 01-29-2010 at 10:54 AM.. Reason: code tags please
# 2  
Old 01-29-2010
Can you give an example of the contents of "xfiles"?
# 3  
Old 01-29-2010
What platform/UNIX are you on, what's the format of the lines, and what do the user IDs look like?
# 4  
Old 01-29-2010
The x files contain HTTP log entries. The user_id that I am looking for is the third field (DD\\ABCDE here; it was highlighted in fuchsia in the original post).

2009-09-29 13:59:04 DD\\ABCDE 152.225.186.39 Search Engines and Portals GET http://ui.sina.com/assets/js/jump_home.js application/x-javascript 262 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322) TCP_HIT/304 SINA.com US ??? - ?? DIRECT/12.130.152.120 192.168.40.148

---------- Post updated at 10:06 AM ---------- Previous update was at 10:04 AM ----------

The user IDs look like DD\\ABCDE. The platform is Ubuntu Linux 8.10.
# 5  
Old 01-29-2010
You don't seem to use regexes at all. Use "fgrep" or "grep -F" to work in fixed-strings mode. That way the processing time will be negligible compared to the time spent reading data from the disk.

---------- Post updated at 05:19 PM ---------- Previous update was at 05:11 PM ----------

OK. I can see that you're reading each file multiple times. This is the cause of the problem, not processing time.

Use grep regexes (with grep -E, so that | means alternation) and first compose the string of usernames like this:

Code:
user1|user2|user3|user4|...

Then grep each source file looking for all matches at once. Don't use

Code:
cat FILE | grep STRING

This is slower than a simple:

Code:
grep STRING FILE

Save all matches to a temporary file, and then check that file for each username. As this file should be much smaller than the original (I assume), you will save a lot of time when reading it once for each user.
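
The approach described above could be sketched roughly like this. It is an illustration under assumptions, not a tested solution for the real data: the sample files are tiny stand-ins created in a temporary directory, matches.tmp and counts.txt are made-up names, and the sed step assumes the IDs may contain regex metacharacters (such as the backslash) that need escaping before they go into an alternation.

```shell
#!/bin/bash
# Sketch only: tiny sample files stand in for the real WashFD.txt and x?? logs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'DD\ABCDE' 'DD\FGHIJ' > WashFD.txt
printf '%s\n' 'log entry for DD\ABCDE' 'log entry for DD\ZZZZZ' > xaa
printf '%s\n' 'log entry for DD\FGHIJ' > xab

# Skip blank lines, escape regex metacharacters (the IDs contain backslashes),
# then join the IDs into one alternation: id1|id2|...
pattern=$(sed -e '/^$/d' -e 's/[][\.*^$+?(){}|]/\\&/g' WashFD.txt | paste -sd'|' -)

# One pass over the big files; -h drops the file-name prefix.
grep -i -h -E "$pattern" xaa xab > matches.tmp

# Per-user checks now read only the small temporary file.
while IFS= read -r id; do
    [ -n "$id" ] || continue
    printf '%s: %s\n' "$id" "$(grep -c -i -F -- "$id" matches.tmp)"
done < WashFD.txt > counts.txt
cat counts.txt
```

The large files are read once in total; only matches.tmp is read once per user ID.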
# 6  
Old 01-29-2010
OK here's what you can do:
  • use grep with -F as dpc.ucore.info suggested
  • use the -f file switch as described in the man page. That will allow you to load and search for multiple search strings at once
  • don't loop over 41 files, but specify them all at the command line at once. Use -H to display which file it was found in if needed.
With these, you could cut your search down to
Code:
grep -F -f WashFD.txt $( cat xfiles ) > SEID.searches
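
For anyone who wants to try the one-pass command on a small scale first, here is a hedged sketch with made-up sample data. Two things are assumptions rather than part of the suggestion above: -i is added back because the original loop searched case-insensitively, and xargs is used instead of $( cat xfiles ) to pass the file names.

```shell
#!/bin/bash
# Sketch only: small sample files stand in for the real inputs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'DD\ABCDE' > WashFD.txt
printf '%s\n' 'log entry for DD\ABCDE' 'log entry for DD\ZZZZZ' > xaa
printf '%s\n' 'log entry for dd\abcde' > xab
printf '%s\n' xaa xab > xfiles

# -F: fixed strings, -f: read all IDs from a file, -H: prefix the file name,
# -i: keep the case-insensitive matching of the original loop.
xargs grep -F -i -H -f WashFD.txt < xfiles > SEID.searches
cat SEID.searches
```

Each output line starts with the name of the file it came from, so the per-file information from the original loop's echo is not lost.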

# 7  
Old 01-29-2010
After running:

Code:
grep -F -H -f  WashFD.txt $( cat xfiles ) > SEID.searches

The SEID.searches file is growing very fast; however, the entries do not match what is in WashFD.txt.

Am I missing something?

Last edited by pludi; 01-29-2010 at 01:08 PM..
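
The thread does not say what the cause turned out to be. One possibility worth ruling out (an assumption, not a diagnosis) is a blank line in WashFD.txt: with -f, an empty line is an empty pattern, and an empty pattern matches every input line. The sketch below reproduces that with sample data; WashFD.clean is a made-up file name, and stripping carriage returns is an extra precaution in case the list was exported from Windows.

```shell
#!/bin/bash
# Sketch only: sample files stand in for the real WashFD.txt and logs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A pattern file with a stray blank line, as an editor or export might leave.
printf '%s\n' 'DD\ABCDE' '' > WashFD.txt
printf '%s\n' 'line for DD\ABCDE' 'line for DD\OTHER' > xaa

# The empty pattern matches every line, so unrelated entries are reported too.
unfiltered=$(grep -c -F -f WashFD.txt xaa)

# Drop carriage returns and blank lines before searching.
tr -d '\r' < WashFD.txt | grep -v '^$' > WashFD.clean
filtered=$(grep -c -F -f WashFD.clean xaa)

echo "with blank line: $unfiltered matches; cleaned: $filtered matches"
```

If the cleaned list gives the same output as the original on the real data, the cause lies elsewhere, for example in how the backslashes in the IDs are stored in WashFD.txt.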