Slow Perl script: how to speed up?


 
# 1  
Old 03-19-2011

I wrote a Perl script to compare two files, a new file and a master file, and output the words from the new file that are not in the master file.
STRUCTURE OF THE TWO FILES
The first file is a series of names
ramesh
sushil
jonga
sudesh
lugdi
whereas the second file (which could be in Upper ASCII or Unicode) has the following structure (examples are from Unicode):
jonga=जोंगा
tuti=टूटी
namashi=नामषी
biruli=बिरुली
lugdi=लुगदी
sundi=सुंडी
hembram=हेंब्रम
hessa=हेस्सा
EXPECTED OUTPUT
What I need is to identify ONLY the new words in the new file:
ramesh
sushil
sudesh
Since jonga and lugdi are present in the master file, they are not listed.

Both files, especially the master, are big. I wrote a Perl script, given below, which does the job but is too slow. Is there any way to speed it up? I use Perl under Windows.
PERL SCRIPT FOLLOWS:
Code:
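#!/usr/bin/perl
use strict;
use warnings;

# Usage: script.pl new_file master_file
open my $file1, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
open my $file2, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
while (my $l1 = <$file1>) {
    chomp $l1;
    my $found = 0;
    # Rescans the master file from the top for every word in the new file.
    while (my $l2 = <$file2>) {
        if ($l2 =~ /^\Q$l1\E=/) {    # \Q...\E matches the word literally
            $found = 1;
            last;    # Perl exits a loop with "last", not "break"
        }
    }
    print "$l1\n" unless $found;
    seek $file2, 0, 0;    # rewind the master file for the next word
}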

Where did things go wrong? I sorted the two files before using an Awk script. But the Perl script is very slow: comparing two files of 30,000 words and 200,000 words takes an awfully long time.
Many thanks in advance for help with speeding up the script.

Moderator's Comments:
Mod Comment Please use code tags when posting code.

Last edited by Perderabo; 03-20-2011 at 01:07 AM..
# 2  
Old 03-20-2011
How about awk?
Code:
awk -F'=' 'NR==FNR{a[$1]++;next} !a[$1]' file2 file1

# 3  
Old 03-20-2011
Dear Pravin27,
I had written an awk script to do the job, but it gave false positives when there was too much data.
Here is the script. It is practically the same as yours. When I posted it, it was suggested that I use an "O" for a large array:
Code:
BEGIN {FS="="}
NR==FNR{O[$1]++;next} !($1 in O)
Unluckily, the script is fast but gives false positives, hence the Perl script, which gives correct results but is slow.
I tried yours; it also gives false positives when comparing huge files.
Many thanks.

Gimley
# 4  
Old 03-20-2011
Hi, I hope this will run faster than your script.

Code:
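#!/usr/bin/perl
use strict;
use warnings;

# Usage: script.pl new_file master_file
open(F1, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
open(F2, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
my %hash;
# Read the master file once and store the text before "=" as a hash key.
while (<F2>) {
    chomp;
    next unless length;    # skip empty lines
    $hash{(split(/=/))[0]}++;
}
# Print every word of the new file that is not a key in the hash.
while (<F1>) {
    chomp;
    next if $hash{$_};
    print $_, "\n";
}
close(F1);
close(F2);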

# 5  
Old 03-20-2011
Dear pravin27,
It runs very fast. As far as I can see there are no false positives. I'll check and get back only if there are no false positives.
I also understood the logic which speeds it up.
Many thanks for your kindness.
Gimley
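
For readers following the thread, the logic behind the speed-up can be sketched as below. The nested-loop version rescans the master file for every word, so with 30,000 words and 200,000 master lines it can perform up to 6,000,000,000 line comparisons. The hash version reads each file once, about 230,000 lines in total, and answers each lookup with a single hash probe. This is only a minimal sketch, not the poster's exact script: the function name new_words is made up for illustration, the trailing-whitespace trimming assumes stray carriage returns from Windows line endings may be present, and the files are assumed to be in an ASCII-compatible encoding such as UTF-8 (not UTF-16).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $new_path whose text is not
# a key in $master_path. Master lines are expected to look like "key=value".
sub new_words {
    my ($new_path, $master_path) = @_;

    # Pass 1: read the master file once and remember every key.
    my %seen;
    open my $master, '<', $master_path or die "Cannot open $master_path: $!";
    while (my $line = <$master>) {
        $line =~ s/\s+\z//;                # drop newline and any stray CR (assumption)
        my ($key) = split /=/, $line, 2;   # text before the first "="
        $seen{$key} = 1 if defined $key && length $key;
    }
    close $master;

    # Pass 2: one hash lookup per word instead of a rescan of the master.
    my @missing;
    open my $new, '<', $new_path or die "Cannot open $new_path: $!";
    while (my $word = <$new>) {
        $word =~ s/\s+\z//;
        next unless length $word;          # ignore blank lines
        push @missing, $word unless exists $seen{$word};
    }
    close $new;
    return @missing;
}

# Run as a script only when both file names are given:
# the new file first, the master file second.
if (@ARGV == 2) {
    print "$_\n" for new_words(@ARGV);
}
```

Using exists on the hash rather than a regex match also means the word is compared as a whole key, so a word containing regex metacharacters cannot match the wrong master line.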