Perl script to read string from file#1 and find/replace in file#2


 
# 1  
Old 08-09-2015

Hello Forum.

I have a file called abc.sed with the following commands:

s/1/one/g
s/2/two/g
...

I also have a second file called abc.dat and would like to substitute all occurrences of "1" with "one", "2" with "two", etc., and create a new file called abc_new.dat.

Code:
sed -f abc.sed abc.dat > abc_new.dat

For small files, this command works fine, but for large files it's very slow.

I read that Perl might be faster in doing this kind of operation.

Can you please help me write the Perl code if you think it can work faster?

I am a newbie in Perl scripting.

Thanks and appreciate all the help you can provide.
# 2  
Old 08-09-2015
How long is that list of substitutes in abc.sed?

---------- Post updated at 11:36 PM ---------- Previous update was at 10:17 PM ----------

Give it a try and tell me if it does make a difference.
Code:
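#!/usr/bin/perl

use strict;
use warnings;

# Map each search string to its replacement, parsed from lines of the
# form s/SEARCH/REPLACE/g; any other line (such as "...") is skipped.
my %replace;

open my $fh, '<', "abc.sed" or die "abc.sed: $!\n";
while (my $line = <$fh>) {
    $replace{$1} = $2 if $line =~ m{^s/([^/]+)/([^/]*)/g\s*$};
}
close $fh;
die "abc.sed: no s/SEARCH/REPLACE/g lines found\n" unless %replace;

# Build one alternation. quotemeta treats the search strings as literal
# text, and longest-first ordering lets "10" win over "1".
my $search = join '|', map { quotemeta($_) }
             sort { length($b) <=> length($a) } keys %replace;

# Output goes to STDOUT; redirect it to abc_new.dat.
open $fh, '<', "abc.dat" or die "abc.dat: $!\n";
while (<$fh>) {
    s/($search)/$replace{$1}/g;
    print;
}
close $fh;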

# 3  
Old 08-09-2015
Thanks Aia for your suggestion.

abc.sed could vary in size from 1 to maybe 10,000 records

I'm trying to find a solution that will work faster than sed. It doesn't have to be Perl specifically. If you have some other ideas to process the file faster, that would be great.


I tried your code but getting the following error:
Code:
more test.pl
#!/usr/bin/perl

use strict;
use warnings;

my %replace;

open my $fh, '<', "abc.sed" or die "$!\n";
my @gsubs = <$fh>;
close $fh;
@gsubs = map{s/^s\/|\/g|\n//g; split "/"} @gsubs;
%replace = @gsubs;

my $search = join '|', keys %replace;

open $fh, '<', "abc.dat" or die "$!\n";
while(<$fh>) {
    s/($search)/$replace{$1}/ge;
    print;
}
close $fh;

Code:
sh test.pl

test.pl: line 3: use: command not found
test.pl: line 4: use: command not found
test.pl: line 6: my: command not found
Couldn't get a file descriptor referring to the console
test.pl: line 9: syntax error near unexpected token `;'
test.pl: line 9: `my @gsubs = <$fh>;'

I made test.pl executable and /usr/bin/perl executable exists on the Linux box.

Thanks.

---------- Post updated at 10:08 AM ---------- Previous update was at 07:26 AM ----------

I'm thinking of another option to use instead of sed or Perl. How about tr?

Do you think that this will work faster?

Thanks.
# 4  
Old 08-09-2015
If you could modify your abc.sed file from the format:
Code:
s/1/one/g
s/2/two/g
...

to instead be in the format:
Code:
g/1/s//one/g
g/2/s//two/g
...
w abc_new.dat
q

and save the modified contents in a file named abc.ed
and then try the command:
Code:
ed -s abc.dat < abc.ed

or:
Code:
ex -s abc.dat < abc.ed

instead of:
Code:
sed -f abc.sed abc.dat > abc_new.dat

it would be interesting to know if either of these make any difference in how long it takes.

I would imagine the difference between ed or ex and sed could be significant depending on the number of substitutions being made and on the number of lines in the file being modified. But there is no way to verify my imagination without a benchmark to test it against. Since I don't have any way to guess at the real substitutions you're performing nor of the data being processed, it is hard for me to create data on my system that would be a reasonable benchmark that might simulate your data running these commands on your OS on your hardware.

And depending on your OS, there might or might not be a significant difference between the ed and ex utilities for your data.

Hope this helps...

PS: The reason I imagine that ed and ex would be faster than sed is that each substitution command is run once for the entire file with these utilities while sed runs each substitution command once for each input line.
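
The conversion from the abc.sed format to the abc.ed format could be scripted rather than done by hand. The following is only a minimal Perl sketch, assuming every useful line has exactly the form s/SEARCH/REPLACE/g with no embedded slashes. The sub name sed_to_ed and the script name sed2ed.pl are hypothetical, and lines in any other form are silently skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn sed lines of the form s/SEARCH/REPLACE/g into
# an ed script in the g/SEARCH/s//REPLACE/g format, ending with w and q.
# Assumes SEARCH and REPLACE contain no "/" characters.
sub sed_to_ed {
    my ($outfile, @sed_lines) = @_;
    my @ed;
    for my $line (@sed_lines) {
        # Skip anything that is not a plain s/SEARCH/REPLACE/g command.
        next unless $line =~ m{^s/([^/]+)/([^/]*)/g\s*$};
        push @ed, "g/$1/s//$2/g";
    }
    # Finish with the write and quit commands.
    return join "\n", @ed, "w $outfile", "q", "";
}

# Usage (hypothetical script name): perl sed2ed.pl abc.sed > abc.ed
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "$ARGV[0]: $!\n";
    my @lines = <$in>;
    close $in;
    my $script = sed_to_ed('abc_new.dat', @lines);
    print $script;
}
```

The generated abc.ed can then be fed to ed or ex with the -s option as described above.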

Last edited by Don Cragun; 08-09-2015 at 11:52 AM.. Reason: Add PS.
# 5  
Old 08-09-2015
I doubt there will be a significantly faster solution; sed itself is lightweight and fast. It may be outperformed by a few percent by something else, but not by orders of magnitude. The task itself is heavy duty: the data file must be compared line by line, char by char, against up to 10,000 to-be-substituted patterns, and that will take time no matter what tool you use.
Be aware that tr can only substitute char by char, not char by word and thus will not solve your problem.
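
To illustrate the point about tr, here is a small sketch using Perl's own tr/// operator, which follows the same character-for-character rule as the tr utility. The sample string and the sub names with_tr and with_subst are made up for illustration.

```perl
use strict;
use warnings;

# tr/// maps single characters to single characters: "1" -> "o", "2" -> "t".
# It cannot expand "1" into the three-character word "one".
sub with_tr {
    my ($text) = @_;
    (my $copy = $text) =~ tr/12/ot/;
    return $copy;
}

# A substitution with a lookup table can replace a character by a word.
sub with_subst {
    my ($text) = @_;
    my %word = (1 => 'one', 2 => 'two');
    (my $copy = $text) =~ s/([12])/$word{$1}/g;
    return $copy;
}

print "tr:    ", with_tr("a1b2"), "\n";     # tr:    aobt
print "subst: ", with_subst("a1b2"), "\n";  # subst: aonebtwo
```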
# 6  
Old 08-09-2015
Quote:
Originally Posted by pchang
I tried your code but getting the following error:
[...]
I made test.pl executable and /usr/bin/perl executable exists on the Linux box.
It looks like the code is not being interpreted by Perl, but rather sh.
Executing it as /usr/bin/perl test.pl might fix that.
However, I doubt that it would produce a satisfactory result, not with that number of substitutions.

The substitution process is as good as it gets in sed or Perl. All that I was trying to do was to transfer the burden of the task to the REGEX engine, instead of the loop from line to line.
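
The difference between looping over the substitutions and handing one combined pattern to the regex engine can be sketched on in-memory data. This is an illustration under assumptions, not a benchmark: the rule list and sample text are invented, the sub names are hypothetical, and the search strings are treated as literal text. Note one difference in behaviour: the sequential form, like sed -f, lets a later rule see the output of an earlier one, while the single pass looks at each position of the original text once.

```perl
use strict;
use warnings;

# Rules as [search, replacement] pairs, in the order they might appear
# in abc.sed (invented sample data).
my @rules = ([1 => 'one'], [2 => 'two'], [10 => 'ten']);

# One s///g per rule, applied in order, as sed -f does for each line.
sub apply_sequential {
    my ($text, @rules) = @_;
    for my $rule (@rules) {
        my ($from, $to) = @$rule;
        $text =~ s/\Q$from\E/$to/g;
    }
    return $text;
}

# One alternation of all search strings, longest first, with the
# replacement looked up in a hash.
sub apply_single_pass {
    my ($text, @rules) = @_;
    return $text unless @rules;
    my %replace = map { @$_ } @rules;
    my $alt = join '|', map { quotemeta($_) }
              sort { length($b) <=> length($a) } keys %replace;
    my $re = qr/($alt)/;
    $text =~ s/$re/$replace{$1}/g;
    return $text;
}

my $sample = "10 1 2";
print "sequential:  ", apply_sequential($sample, @rules), "\n";   # one0 one two
print "single pass: ", apply_single_pass($sample, @rules), "\n";  # ten one two
```

With rules such as 1 and 2 that never overlap, both forms give the same output; with overlapping rules such as 1 and 10, the results differ, so the single-pass form is only a drop-in replacement for the sed script when the rules do not interact.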
# 7  
Old 08-09-2015
Thanks, guys, for all your suggestions.

I'm starting to think outside the Unix utilities and maybe a Java program will speed things up? What do you think?

Unfortunately, I don't have any experience with Java coding.