Perl - grep issue in filenames with wildcards


 
# 1  
02-09-2011

Hi
I have 2 directories, t1 and t2, with some files in them. I have to check whether each file present in t1 is also present in t2. Currently, both directories contain the same files, shown below:

$ABC.TXT
def.txt

Now, when I run the script below, it reports that def.txt is found but $ABC.TXT is not. Since the filename itself contains a wildcard character ($), grep seems to be having some issue:


Code:
#!/usr/bin/perl

my @src=`ls /home/t1/`;
my @dest=`ls /home/t2/`;

foreach my $file (@src){
   # $file keeps the trailing newline from ls and is used as a regex pattern
   my @match=grep(/$file/, @dest);
   if (@match==0){
      print "Not found $file";
   }else{
      print "Found $file";
   }
}

Output:
Code:
Found def.txt
Not found $ABC.TXT

Please advise. I have listed 2 files as samples, but I also have files like $$ABC.TXT and $$$ABC.TXT.

Guru
# 2  
02-09-2011
How about...

Code:
#!/usr/bin/perl

my @src=`ls /home/t1/`;
my @dest=`ls /home/t2/`;

SRC: foreach my $src (@src){
   DEST: foreach my $dest (@dest) {
      if ( $src eq $dest ){
         print "Found $src";
         next SRC;    # stop scanning @dest as soon as a match is found
      }
   }
   print "Not found $src";
}

# 3  
02-09-2011
Hi Jerry
Thanks for your reply. There are around 50k files in each of the source and destination directories, so using a nested for loop will cause performance issues.

Can this be achieved using grep itself?

Guru
# 4  
02-09-2011
Hi,

Here is another solution, also in Perl. I hope it is useful for you:
Code:
$ cat script.pl
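use strict;
use warnings;

# Lookup hashes keyed on each name printed by ls (the trailing newline stays part of the key)
my %src  = map { $_ => 1 } qx{ ls -1 /home/t1 };
my %dest = map { $_ => 1 } qx{ ls -1 /home/t2 };

# Names present in both directories: report them and drop them from %src
for my $name (grep { $dest{$_} } keys %src) {
        print "Found $name";
        delete $src{$name};
}

# Whatever is left in %src was not found in /home/t2
for (keys %src) {
        print "Not found $_";
}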
$ perl script.pl
(...output suppressed...)

Regards,
Birei

Last edited by birei; 02-09-2011 at 11:50 AM..
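For comparison, here is a minimal sketch of the same hash-lookup idea that reads the directories with opendir/readdir instead of running ls, so the names arrive without trailing newlines and no shell is involved. The temporary directories, the file only_in_t1.txt, and the helper list_dir are hypothetical stand-ins for /home/t1, /home/t2 and the real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: return the entries of a directory, minus "." and "..".
# readdir gives bare names (no trailing newline), e.g. '$ABC.TXT' verbatim.
sub list_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return @names;
}

# Hypothetical sample data: two temporary directories standing in for
# /home/t1 and /home/t2 (removed automatically when the script exits).
my $t1 = tempdir(CLEANUP => 1);
my $t2 = tempdir(CLEANUP => 1);
my %layout = (
    $t1 => [ '$ABC.TXT', '$$ABC.TXT', 'def.txt', 'only_in_t1.txt' ],
    $t2 => [ '$ABC.TXT', '$$ABC.TXT', 'def.txt' ],
);
for my $dir (keys %layout) {
    for my $name (@{ $layout{$dir} }) {
        my $path = File::Spec->catfile($dir, $name);
        open(my $fh, '>', $path) or die "Cannot create $path: $!";
        close($fh);
    }
}

# Build the lookup hash once, then do one lookup per source name.
# No regex is involved, so '$' is just an ordinary character here.
my %in_dest = map { $_ => 1 } list_dir($t2);
my (@found, @not_found);
for my $file (sort { $a cmp $b } list_dir($t1)) {
    if ($in_dest{$file}) { push @found, $file }
    else                 { push @not_found, $file }
}
print "Found $_\n"     for @found;
print "Not found $_\n" for @not_found;
```

This reports $$ABC.TXT, $ABC.TXT and def.txt as found and only_in_t1.txt as not found. Because the names are only ever compared as hash keys, the "$" characters need no escaping at all.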
# 5  
02-10-2011
Quote:
Originally Posted by guruprasadpr
...
The number of files I have in the source and destination directories are around 50k, and hence using a nested for will cause some performance issue.
...
Have you tested or benchmarked this claim?
The "grep" method essentially does a nested loop as well. It's just more concise and a more "Perlish" way of doing things.
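To illustrate the point: grep evaluates its block for every element of the list, even after a match has been found, whereas the nested loop in post #2 leaves as soon as it finds a match. The first function from List::Util (a core module) behaves like the early-exit loop. A minimal sketch on a hypothetical list of 1,000 names:

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical listing of 1,000 file names.
my @dest   = map { "file$_.txt" } 1 .. 1000;
my $target = 'file10.txt';

# grep runs its block once for every element, even after a match is found.
my $grep_calls = 0;
my $grep_hits  = grep { $grep_calls++; $_ eq $target } @dest;

# List::Util::first stops at the first element for which the block is true.
my $first_calls = 0;
my $first_hit   = first { $first_calls++; $_ eq $target } @dest;

print "grep:  $grep_calls comparisons, $grep_hits match(es)\n";
print "first: $first_calls comparisons, found $first_hit\n";
```

grep makes 1000 comparisons here, while first makes 10, because file10.txt is the tenth element.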

With a setup consisting of two directories "t1" and "t2" having 10,005 identical files, my benchmark shows this -

Code:
$
$
$ cat cmpdir.pl
#!perl
use Benchmark qw (cmpthese);

my @src  = `ls -1 ./t1/`;
my @dest = `ls -1 ./t2/`;

sub using_grep {
  my $found = 0;
  my $notfound = 0;
  foreach my $file (@src){
     @match = grep {/$file/} @dest;
     if ($#match == -1){
       $notfound++;
     } else {
       $found++;
     }
  }
  return "$found, $notfound";
}

sub using_loop {
  my $found = 0;
  my $notfound = 0;
  SRC: foreach my $src (@src) {
    DEST: foreach my $dest (@dest) {
      if ( $src eq $dest ) {
        $found++;
        next SRC;
      }
    }
    $notfound++;
  }
  return "$found, $notfound";
}

print "From using_grep, (found, notfound) = ", using_grep(), "\n";
print "From using_loop, (found, notfound) = ", using_loop(), "\n\n";

cmpthese (2, {
    using_grep => sub {using_grep()},
    using_loop => sub {using_loop()},
  }
);
$
$
$ perl cmpdir.pl
From using_grep, (found, notfound) = 10005, 0
From using_loop, (found, notfound) = 10005, 0
 
        s/iter using_grep using_loop
using_grep    148         --       -95%
using_loop   7.27      1943%         --
$
$

This shows that -

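(A) Perl's "grep" operator takes 148 seconds on average to compare 10,005 files in directories "t1" and "t2".
(B) Perl's nested loop method takes 7.27 seconds on average to compare 10,005 files in directories "t1" and "t2".

The grep version always scans the whole of @dest and runs a regex match on every element, whereas the loop stops at the first exact match with "eq".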
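The comparison counts behind those timings can be illustrated with a minimal sketch on hypothetical in-memory names (file00001.txt and so on, 500 per directory) rather than real ls output. For identical directories, the grep approach always makes N x N comparisons, while the early-exit loop makes about half as many; the benchmark's grep version additionally pays for a regex match on each one. The hash lookup from post #4 is included as a third variant. The sub names count_grep, count_loop and count_hash are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for two directories with identical contents,
# as in the benchmark above (500 names instead of 10,005).
my @src  = map { sprintf('file%05d.txt', $_) } 1 .. 500;
my @dest = @src;

# grep-style: scan all of @dest for every source name.
# (Uses eq instead of a regex so only the number of comparisons differs.)
sub count_grep {
    my ($found, $checks) = (0, 0);
    for my $file (@src) {
        my $hits = grep { $checks++; $_ eq $file } @dest;
        $found++ if $hits;
    }
    return ($found, $checks);
}

# Nested loop that jumps to the next source name on the first match.
sub count_loop {
    my ($found, $checks) = (0, 0);
    SRC: for my $file (@src) {
        for my $name (@dest) {
            $checks++;
            if ($name eq $file) { $found++; next SRC; }
        }
    }
    return ($found, $checks);
}

# Hash lookup (as in post #4): build the hash once, then one lookup per name.
sub count_hash {
    my ($found, $checks) = (0, 0);
    my %in_dest = map { $_ => 1 } @dest;
    for my $file (@src) {
        $checks++;
        $found++ if $in_dest{$file};
    }
    return ($found, $checks);
}

my @grep_result = count_grep();
my @loop_result = count_loop();
my @hash_result = count_hash();
printf "grep: found=%d comparisons=%d\n", @grep_result;
printf "loop: found=%d comparisons=%d\n", @loop_result;
printf "hash: found=%d lookups=%d\n",     @hash_result;
```

All three find 500 names; grep makes 250000 comparisons, the loop 125250 (1 + 2 + ... + 500), and the hash version 500 lookups after building the hash.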

tyler_durden

---------- Post updated 02-10-11 at 01:44 PM ---------- Previous update was 02-09-11 at 02:15 PM ----------

Quote:
Originally Posted by guruprasadpr
...it tells def.txt is found, $ABC.TXT not found. Since the filename itself contains wildcard, grep seems to be having some issue:
...I also have some files like $$ABC.TXT, $$$ABC.TXT.
...
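For academic interest, if you were looking for a solution to the problem mentioned in your original post, you will have to quote the regex metacharacters in the string you are searching for, so that they are matched literally instead of being treated as regex syntax (in your file names, "$" acts as the end-of-line anchor) -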

Code:
$
$ # Elements $C, $$D, $$$E of array @x exist in array @y, but
$ # that is not reported because each "$" in them is treated
$ # as a regex metacharacter (the end-of-line anchor).
$
$ perl -le '@x = qw(    B   $C   $$D  $$$E    F );
            @y = qw( A  B   $C   $$D  $$$E      );
            foreach $i (@x) {
              $exists = grep {/$i/} @y;
              printf("%-10s %-20s in \@y\n", $i, $exists==1 ? "exists" : "does not exist");
            }'
B          exists               in @y
$C         does not exist       in @y
$$D        does not exist       in @y
$$$E       does not exist       in @y
F          does not exist       in @y
$
$
$ # Works fine with \Q...\E, i.e. the quotemeta function
$
$ perl -le '@x = qw(    B   $C   $$D  $$$E    F );
            @y = qw( A  B   $C   $$D  $$$E      );
            foreach $i (@x) {
              $exists = grep {/\Q$i\E/} @y;
              printf("%-10s %-20s in \@y\n", $i, $exists==1 ? "exists" : "does not exist");
            }'
B          exists               in @y
$C         exists               in @y
$$D        exists               in @y
$$$E       exists               in @y
F          does not exist       in @y
$
$

Have a look at the quotemeta function -

quotemeta - perldoc.perl.org

and also the gory details of parsing quoted constructs -

perlop - perldoc.perl.org

in the Perl documentation.
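The \Q...\E escape in the transcript is the quotemeta function applied inline. A minimal sketch (the names below are hypothetical) showing what quotemeta produces, plus one caveat: /\Q$name\E/ is still an unanchored substring match, so when comparing whole file names it is safer to anchor the pattern with ^ and \z:

```perl
use strict;
use warnings;

my $name = '$ABC.TXT';

# quotemeta (the function behind \Q...\E) backslashes every
# non-word character, so the pattern matches the name literally.
my $quoted = quotemeta($name);
print "$quoted\n";    # prints \$ABC\.TXT

# Hypothetical directory listing.
my @dest = ('$ABC.TXT', 'X$ABC.TXT', 'def.txt');

# Unanchored: also matches names that merely contain '$ABC.TXT'.
my @substring = grep { /\Q$name\E/ } @dest;

# Anchored with ^ and \z: matches only the exact name.
my @exact = grep { /^\Q$name\E\z/ } @dest;

print "substring: @substring\n";
print "exact:     @exact\n";
```

The unanchored pattern matches both $ABC.TXT and X$ABC.TXT, while the anchored one matches only $ABC.TXT. Note that \z does not allow a trailing newline, so names read with backticks from ls would need chomp first.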

HTH,
tyler_durden

Last edited by durden_tyler; 02-10-2011 at 03:33 PM..
# 6  
02-10-2011
Thanks a lot, tyler_durden. This was what I was looking for. I learned quite a few things from this thread.

Guru.