Perl script to read string from file#1 and find/replace in file#2

08-09-2015

Hello Forum.

I have a file called abc.sed with the following commands:

s/1/one/g
s/2/two/g
...

I also have a second file called abc.dat. I would like to replace every occurrence of "1" with "one", "2" with "two", etc., and write the result to a new file called abc_new.dat.

Code:

sed -f abc.sed abc.dat > abc_new.dat

For small files this command works fine, but for large files it is very slow.

I read that Perl might be faster in doing this kind of operation.

Can you please help me write the Perl code if you think it can work faster?

I am a newbie in Perl scripting.

Thanks; I appreciate all the help you can provide.

pchang

08-09-2015

How long is that list of substitutes in abc.sed?

---------- Post updated at 11:36 PM ---------- Previous update was at 10:17 PM ----------

Give it a try and tell me if it does make a difference.

Code:

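#!/usr/bin/perl

use strict;
use warnings;

# Load rules of the form s/OLD/NEW/g into a hash; other lines are skipped.
# OLD is treated as literal text, not as a regular expression.
my %replace;
open my $fh, '<', 'abc.sed' or die "abc.sed: $!\n";
while (my $line = <$fh>) {
    $replace{$1} = $2 if $line =~ m{^s/([^/]+)/([^/]*)/g\s*$};
}
close $fh;
die "no substitutions found in abc.sed\n" unless %replace;

# Build one alternation, longest key first, so "12" wins over "1".
my $search = join '|', map { quotemeta($_) }
             sort { length($b) <=> length($a) } keys %replace;

# Unlike sed, this makes a single pass: replaced text is not rescanned.
# Output goes to standard output; redirect it to abc_new.dat.
open $fh, '<', 'abc.dat' or die "abc.dat: $!\n";
while (<$fh>) {
    s/($search)/$replace{$1}/g;
    print;
}
close $fh;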

Aia

08-09-2015

Thanks Aia for your suggestion.

abc.sed could vary in size from 1 to maybe 10,000 records.

I'm trying to find a solution that will work faster than sed. It doesn't have to be Perl specifically. If you have some other ideas to process the file faster, that would be great.

I tried your code but I am getting the following error:

Code:

more test.pl
#!/usr/bin/perl

use strict;
use warnings;

my %replace;

open my $fh, '<', "abc.sed" or die "$!\n";
my @gsubs = <$fh>;
close $fh;
@gsubs = map{s/^s\/|\/g|\n//g; split "/"} @gsubs;
%replace = @gsubs;

my $search = join '|', keys %replace;

open $fh, '<', "abc.dat" or die "$!\n";
while(<$fh>) {
    s/($search)/$replace{$1}/ge;
    print;
}
close $fh;

Code:

sh test.pl

test.pl: line 3: use: command not found
test.pl: line 4: use: command not found
test.pl: line 6: my: command not found
Couldn't get a file descriptor referring to the console
test.pl: line 9: syntax error near unexpected token `;'
test.pl: line 9: `my @gsubs = <$fh>;'

I made test.pl executable, and /usr/bin/perl exists on the Linux box.

Thanks.

---------- Post updated at 10:08 AM ---------- Previous update was at 07:26 AM ----------

I'm thinking of another option instead of sed or Perl: how about tr?

Do you think that this will work faster?

Thanks.

pchang

08-09-2015

If you could modify your abc.sed file from the format:

Code:

s/1/one/g
s/2/two/g
...

to instead be in the format:

Code:

g/1/s//one/g
g/2/s//two/g
...
w abc_new.dat
q

and save the modified contents in a file named abc.ed
and then try the command:

Code:

ed -s abc.dat < abc.ed

or:

Code:

ex -s abc.dat < abc.ed

instead of:

Code:

sed -f abc.sed abc.dat > abc_new.dat

it would be interesting to know whether either of these makes any difference in how long it takes.
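
The conversion from the abc.sed format to the abc.ed format described above could be automated instead of done by hand. The following is only a sketch: the function name sed_to_ed is hypothetical, and it assumes every rule has exactly the form s/OLD/NEW/g with "/" as the delimiter and no "/" inside OLD or NEW.

```shell
# Hypothetical helper: rewrite sed-style "s/OLD/NEW/g" lines as ed-style
# "g/OLD/s//NEW/g" lines, then append ed's write and quit commands.
# Assumption: "/" is the only delimiter and never appears in OLD or NEW.
#   $1 = file of sed commands, $2 = name of the output file ed should write
sed_to_ed() {
    # \1 captures OLD, \2 captures the rest of the line ("NEW/g").
    sed 's|^s/\([^/]*\)/\(.*\)$|g/\1/s//\2|' "$1"
    printf 'w %s\nq\n' "$2"
}

# Example use (file names taken from the thread):
#   sed_to_ed abc.sed abc_new.dat > abc.ed
#   ed -s abc.dat < abc.ed
```

Lines that do not start with "s/" pass through unchanged, so the generated abc.ed should be inspected before it is fed to ed or ex.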

I would imagine the difference between ed or ex and sed could be significant, depending on the number of substitutions being made and the number of lines in the file being modified. But there is no way to check that guess without a benchmark. Since I can't guess at the real substitutions you're performing or the data being processed, it is hard for me to create data on my system that would reasonably simulate your data with these commands running on your OS and your hardware.

And depending on your OS, there might or might not be a significant difference between the ed and ex utilities for your data.
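
One way to get a rough benchmark without the real data is to generate synthetic rules and input. This is a sketch under stated assumptions: the function names make_rules and make_data, the key/val naming scheme, and the sizes in the usage comment are all invented for illustration and say nothing about the actual abc.sed or abc.dat.

```shell
# Hypothetical generator: N rules of the form s/keyI_/valI_/g.
# The trailing "_" keeps key1_ from matching inside key10_.
make_rules() {
    awk -v n="$1" 'BEGIN {
        for (i = 1; i <= n; i++) printf "s/key%d_/val%d_/g\n", i, i
    }'
}

# Hypothetical generator: N data lines, each containing one of K keys.
make_data() {
    awk -v n="$1" -v k="$2" 'BEGIN {
        for (i = 1; i <= n; i++) printf "line %d key%d_ end\n", i, (i % k) + 1
    }'
}

# Example use (sizes are arbitrary):
#   make_rules 1000 > rules.sed
#   make_data 100000 1000 > data.txt
#   time sed -f rules.sed data.txt > /dev/null
```

Timings from synthetic data like this only indicate how a tool scales with the number of rules and lines; the real files may behave differently.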

Hope this helps...

PS: The reason I imagine that ed and ex would be faster than sed is that each substitution command is run once for the entire file with these utilities while sed runs each substitution command once for each input line.

Last edited by Don Cragun; 08-09-2015 at 11:52 AM.. Reason: Add PS.

Don Cragun

08-09-2015

I doubt there will be a significantly faster solution; sed itself is lightweight and fast. It may be outperformed by a few percent by something else, but not by orders of magnitude. The task itself is heavy duty: every line of the data file must be checked, character by character, against up to 10,000 patterns, and that will take time no matter which tool you use.
Be aware that tr can only substitute one character for another, not a character with a word, and thus will not solve your problem.
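
The limitation of tr can be seen with a small comparison. This is an illustrative sketch; the function names tr_demo and sed_demo and the sample input are made up, and the two sed rules are the ones quoted at the top of the thread.

```shell
# tr maps the Nth character of the first set to the Nth character of the
# second set, so "1" can become "o" but never the word "one".
tr_demo() {
    printf '%s\n' "$1" | tr '12' 'ot'
}

# sed replaces each matched string with a replacement of any length.
sed_demo() {
    printf '%s\n' "$1" | sed -e 's/1/one/g' -e 's/2/two/g'
}

# Example use:
#   tr_demo  '12 21'    # prints: ot to
#   sed_demo '12 21'    # prints: onetwo twoone
```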

RudiC

08-09-2015

Quote:

Originally Posted by pchang

I tried your code but getting the following error:
[...]
I made test.pl executable and /usr/bin/perl executable exists on the Linux box.

It looks like the code is not being interpreted by Perl, but rather by sh. Running sh test.pl hands the file to the shell, which ignores the #!/usr/bin/perl line.
Executing it as /usr/bin/perl test.pl might fix that.
However, I doubt that it would produce a satisfactory result, not with that many substitutions.

The substitution process is as good as it gets in sed or Perl. All I was trying to do was transfer the burden of the task to the regex engine, instead of looping over every substitution for every line.
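
On the sh test.pl error discussed above: the "#!" line only takes effect when the script is executed directly (for example ./test.pl), not when it is passed as an argument to sh. A small sketch for checking which interpreter a script names; the function name interpreter_of is hypothetical.

```shell
# Print the interpreter named on a script's "#!" line, or "none".
# That interpreter is used when the script is run as ./script;
# "sh script" ignores the line and parses the file as shell code.
interpreter_of() {
    first=$(head -n 1 "$1")
    case $first in
        '#!'*) printf '%s\n' "${first#??}" ;;   # strip the leading "#!"
        *)     printf 'none\n' ;;
    esac
}

# Example use:
#   interpreter_of test.pl
#   perl test.pl        # run explicitly through Perl
#   ./test.pl           # let the "#!" line select the interpreter
```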

Aia

08-09-2015

Thanks, guys, for all your suggestions.

I'm starting to think beyond the Unix utilities: maybe a Java program would speed things up. What do you think?

Unfortunately, I don't have any experience with Java coding.

pchang
