Speed up this script!


 
# 1  
Old 09-02-2009

I have a script that processes a fair amount of data -- say, 25-50 megs per run. I'd like ideas on speeding it up. The code is actually just a preprocessor -- I'm using another language to do the heavy lifting. But as it happens, the preprocessing takes much more time than the final processing so I'm optimizing this rather than that.

Here's the code. The basic idea is that, for each line of input (redirected to stdin), the program checks to see if the sequence number is in $mult and, if so, prints a line asking the other program to validate that sequence:
Code:
#!/usr/bin/perl -w

open(MULT, "mult.txt") or die("Can't find list of multiplicative sequences in mult.txt");
my $terminator = $/;
undef $/;
$mult = <MULT>;
$/ = $terminator;

# Print application-specific code -- snipped for brevity

$total = 0;
while(<>) {
	if (m/(A\d\d\d\d\d\d) ,((-?\d+,)*-?\d+),/) {
		$nm = $1;
		$seq = $2;
		if ($mult =~ /$nm/) { # Replace this line?
			print "go(\"$nm\", [$seq]);\n";
			$total++;
		}
	} else {
		print "print(\"Error reading line: $_\");\n";
	}
}

# Print application-specific code -- snipped for brevity

The file mult.txt is a short file of about a thousand lines, each of which is guaranteed to contain at most (exactly?) one string of the form A\d\d\d\d\d\d; the rest of the line is irrelevant here.

My thought for optimizing this: make an array of the \d\d\d\d\d\d values, sort, and do a binary search rather than a regular expression at the spot marked "Replace this line?". But I'm not sure how to go about that, or even if that's the 'right' optimization. Thoughts?
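
For reference, the sort-and-binary-search idea could look roughly like the sketch below. The sample values and the helper name in_sorted are made up for illustration; the real values would be pulled out of mult.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample of the six-digit parts of the A-numbers,
# sorted numerically so that binary search is valid.
my @ids = sort { $a <=> $b } qw(123456 024689 654321 987654);

# Return 1 if $want is in the numerically sorted array @$sorted, else 0.
sub in_sorted {
    my ($sorted, $want) = @_;
    my ($lo, $hi) = (0, $#$sorted);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        if    ($sorted->[$mid] < $want) { $lo = $mid + 1 }
        elsif ($sorted->[$mid] > $want) { $hi = $mid - 1 }
        else                            { return 1 }
    }
    return 0;
}

print in_sorted(\@ids, 654321) ? "found\n" : "missing\n";   # prints "found"
print in_sorted(\@ids, 654320) ? "found\n" : "missing\n";   # prints "missing"
```

This takes about log2(1000), i.e. roughly 10, comparisons per lookup instead of a scan of the whole list. A Perl hash keyed on the same values would avoid writing the search at all.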

Also, any suggestions on making better idiomatic use of Perl would be appreciated. I'm not at all accustomed to the language.
# 2  
Old 09-03-2009
Create a hash of arrays - each array being one line of your mult.txt file.

You are searching 1000 entries with a regex. A regex match against the slurped file is a linear scan, so a hit costs about 500 comparisons on average and a miss scans all 1000 entries, for every line of stdin.
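
A rough way to check that claim on your own machine is to time both approaches. The sketch below is only an assumption-laden stand-in: the 1000 A-numbers and the probe list are generated, not taken from the real mult.txt, and the timings will vary by system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for mult.txt: 1000 made-up A-numbers.
my @names = map { sprintf "A%06d", $_ * 7 } 1 .. 1000;
my $mult  = join "\n", @names;          # slurped text, as in the original script
my %mult  = map { $_ => 1 } @names;     # the same names as hash keys

# Made-up probes: 7000 names, of which 1000 are in the list.
my @probes = map { sprintf "A%06d", $_ } 1 .. 7000;

my ($regex_hits, $hash_hits) = (0, 0);

my $t0 = time;
for my $nm (@probes) { $regex_hits++ if $mult =~ /$nm/ }
my $regex_secs = time - $t0;

$t0 = time;
for my $nm (@probes) { $hash_hits++ if exists $mult{$nm} }
my $hash_secs = time - $t0;

printf "regex scan:  %d hits in %.4fs\n", $regex_hits, $regex_secs;
printf "hash lookup: %d hits in %.4fs\n", $hash_hits,  $hash_secs;
```

Both loops should report the same number of hits; the hash lookup would be expected to finish well ahead, but measure it on your own data before relying on that.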

Here is Programming Perl's take on what you want to do:
Hashes of Arrays (Programming Perl)
# 3  
Old 09-03-2009
OK, I'll try that.
# 4  
Old 09-04-2009
Not much of a speed thing, however.
Code:
my $terminator = $/;
undef $/;
$mult = <MULT>;
$/ = $terminator;

According to 'man perlvar' this is a no-no...
The proper method is to localize $/ with local($/) inside the smallest possible block, i.e.:
Code:
{ # Begin localization block
  local($/);
  $mult = <MULT>;
} # End localization block
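
The same localization is often written as a do block with a lexical filehandle. This is just a sketch: the sample text stands in for mult.txt, and an in-memory filehandle is used so it runs without the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for the contents of mult.txt.
my $sample = "A123456\nA654321\nA024689\n";
open(my $fh, '<', \$sample) or die "Can't open in-memory file: $!";

# local $/ is undone when the do block ends, so later reads
# elsewhere in the script still go line by line.
my $mult = do { local $/; <$fh> };
close($fh);

print length($mult), " characters slurped\n";
```

With a real file, only the open line changes, e.g. open(my $fh, '<', 'mult.txt').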

Hash it!

For a simple hash example, check out a recent thread of mine. It's simple, so hopefully easy to understand, and it is similar to your needs: Delete block of text in one file based on list in another file

Also, for better assistance, a snippet of 'mult.txt' and a snippet of the data would be very helpful.

-Enjoy
fh : )_~

---------- Post updated at 06:23 PM ---------- Previous update was at 12:14 AM ----------

Thought I would tweak this a bit for ya!

I am new to Perl; my first line of Perl was just over a week ago (08/26/2009).
Any comments are very welcome!

3 examples depending on what you really want/need!

[edit]
NOTE:
After some thought I felt it better to modify Example 2 for cases of dirty data...
[/edit]

I am ASSUMING your data looks something like:
Code:
A123456 ,789,543,MoreData
A654320 ,789,543,MoreData
A024689 ,789,543,MoreData

I am ASSUMING your mult.txt is something like this:
Code:
A123456
A654321
A024689
A987654
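
If the lines of mult.txt carry more than the bare A-number (the first post says the rest of the line is irrelevant), using the whole chomped line as the hash key would never match. A hedged sketch of pulling out just the A-number follows; the sample lines are invented, since the real file was not posted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mult.txt lines that carry extra text around the A-number.
my @lines = (
    "A123456 some description\n",
    "\n",
    "notes: see A654321, nonn,mult\n",
    "A024689\n",
);

my %multhash;
for my $line (@lines) {
    # Keep only the A-number as the key; lines without one are skipped.
    $multhash{$1} = 1 if $line =~ /(A\d{6})/;
}

print join(",", sort keys %multhash), "\n";   # prints "A024689,A123456,A654321"
```

The lookup side stays the same: exists $multhash{ $1 } against the name captured from each line of stdin.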

Example 1, As close to your original as possible without waste.
Code:
#!/usr/bin/perl

use strict;
use warnings;

my $total;
my $multfile;
my %multhash;

my @Atmp;                  # for debugging & education purposes

open($multfile, "<", "mult.txt") or die("Can't find list of multiplicative sequences in mult.txt");
while (<$multfile>) {
  chomp;
  next if /^$/;            # skip blank lines
  $multhash{ $_ } = $_;    # add to hash, using element as the key & data
}
close($multfile);

@Atmp = (keys %multhash);  # for debugging & education purposes
print "@Atmp\n";           # for debugging & education purposes

$total = 0;
while(<>) {
  if (m/(A\d\d\d\d\d\d) ,((-?\d+,)*-?\d+),/) {
    if (exists $multhash{ $1 }) {
      print "go(\"$1\", [$2]);\n";
      $total++;
    }
  } else {
    print "print(\"Error reading line: $_\");\n";
  }
}
print "Total=$total\n";

Example 2, A bit cleaner
Code:
#!/usr/bin/perl

use strict;
use warnings;

my $total;
my $multfile;
my %multhash;

open($multfile, "<", "mult.txt") or die("Can't find list of multiplicative sequences in mult.txt");
while (<$multfile>) {
  chomp;
  next if /^$/;            # skip blank lines
  $multhash{ $_ } = $_;    # add to hash, using element as the key & data
}
close($multfile);

$total = 0;
while(<>) {
  if (m/(A\d\d\d\d\d\d) ,((-?\d+,)*-?\d+),/ && exists $multhash{ $1 }) {
    print "go(\"$1\", [$2]);\n";
    $total++;
  }
}
print "Total=$total\n";

Example 3, Lean and mean with the need for speed!
NOTE: The regex changes!
Code:
#!/usr/bin/perl

use strict;
use warnings;

my $multfile;
my %multhash;

open($multfile, "<", "mult.txt") or die("Can't find list of multiplicative sequences in mult.txt");
while (<$multfile>) {
  chomp;
  next if /^$/;            # skip blank lines
  $multhash{ $_ } = $_;    # add to hash, using element as the key & data
}
close($multfile);

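while(<>) {
  # NOTE: this simplified regex matches only two leading unsigned terms, unlike the original pattern
  next unless m/(A\d{6}) ,(\d+,\d+)/;   # skip non-matching lines so $1 and $2 are always set
  print "go(\"$1\", [$2]);\n" if exists $multhash{ $1 };
}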
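
To see what the regex change in Example 3 means in practice, the sketch below runs both patterns over a few invented input lines (the real data format is an assumption, as noted above).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $original   = qr/(A\d{6}) ,((-?\d+,)*-?\d+),/;   # pattern from Examples 1 and 2
my $simplified = qr/(A\d{6}) ,(\d+,\d+)/;           # pattern from Example 3

# Made-up lines: a two-term, a longer, and a negative-leading sequence.
my @samples = (
    "A123456 ,789,543,MoreData",
    "A123456 ,1,2,3,4,MoreData",
    "A123456 ,-1,2,3,MoreData",
);

my @results;
for my $line (@samples) {
    my $orig = $line =~ $original   ? $2 : "no match";
    my $simp = $line =~ $simplified ? $2 : "no match";
    push @results, [$orig, $simp];
    print "$line\n  original:   $orig\n  simplified: $simp\n";
}
```

On these samples the two patterns agree only for the two-term line. The simplified pattern truncates the longer sequence to "1,2" and does not match the line that starts with a negative term, so it is only safe if every sequence really has that shape.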

Hope this gets things going a bit faster for ya!

-Enjoy
fh : )_~

Last edited by Festus Hagen; 09-05-2009 at 12:12 AM.. Reason: regex change Example 3 / Modify #2 for dirty data