The UNIX and Linux Forums  
#1, posted 07-16-2008 by figaro
Perl regex question

I have the following code:
Code:
#!/usr/bin/perl -w
use strict;
my @files = <*.csv>;
foreach my $file (@files) {
  open(my $fh, '<', $file) || die("Error: Cannot open file $file for reading: $!");
  my @dt = ($file =~ /^(\w+).(\d{6})\.csv$/);
  while (<$fh>) {
    print "$dt[0] $_";   # $_ still carries its own newline
  }
  close($fh);
}
This code is redundant: it first collects all files ending in ".csv" with a glob (line 3) and then parses each file name with a regex (line 6) to pick out the word characters and digits. How can I replace the glob on line 3 with a regular expression, so that line 6 can be removed and the array @dt is filled there?
#2, posted 07-16-2008 by KevinADC
You can't. And there is no real redundancy: the glob <*.csv> first finds all files with the .csv extension so you can open them, and the regexp then parses those strings (the file names) to extract more specific information.
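If the goal is a single pass that both selects the files and captures the prefix, one option is to skip the glob and read the directory with opendir/readdir, testing each entry against the regex. This is a sketch of a different technique, not a change to how glob works. The helper name csv_prefixes and its directory argument are made up for illustration, and the dot after the name is escaped on the assumption that it is a literal dot in the file name.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash of file name => captured prefix for
# every entry in $dir that looks like NAME.DDDDDD.csv.
sub csv_prefixes {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Error: cannot open directory $dir: $!";
    my %prefix;
    while (defined(my $entry = readdir $dh)) {
        # One regex both filters the entry and captures the prefix.
        if ($entry =~ /^(\w+)\.\d{6}\.csv$/) {
            $prefix{$entry} = $1;
        }
    }
    closedir $dh;
    return %prefix;
}

# List the matching files in the current directory with their prefixes.
my %files = csv_prefixes('.');
print "$_ => $files{$_}\n" for sort keys %files;
```

Whether this is any faster than glob plus a regex would have to be measured; the main difference is that non-matching names never make it into the list.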
#3, posted 07-16-2008 by KevinADC
Well, I did come up with this, but it may be no more efficient than what you had, and might even be less efficient. You would have to benchmark both versions to know which is really better.

Code:
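use Data::Dumper;   # needed for Dumper below

# Map each file name to the prefix captured from it.
my %files = map {/^(\w+).\d{6}\.csv$/; $_ => $1} <*.csv>;
print Dumper \%files;
foreach my $file (keys %files) {
  open(my $fh, '<', $file) || die("Error: Cannot open file $file for reading: $!");
  while (<$fh>) {
    print "$files{$file} $_";   # $_ still carries its own newline
  }
  close($fh);
}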
This regexp probably needs refining:

/^(\w+).\d{6}\.csv$/

What is the dot after (\w+) for?
#4, posted 07-17-2008 by figaro
Thank you for your reply. I have been experimenting with this a little, and the performance gain (or loss) is minor. I am still working on a built-in timer, but the difference is mere seconds (if any) on a total of about 200 files with a combined size of about 40MB.

The dot (.) is part of the file name: \w+ is the standard file name and \d{6} is the 24-hour time of download. So a file would have a name such as scores.234506.csv

Last edited by figaro; 07-17-2008 at 01:09 PM..
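For the built-in timer mentioned in this post, Time::HiRes from the core distribution is one option. A minimal sketch: time_it is a hypothetical helper, and the loop being timed is only a stand-in for the file-processing code.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code reference once and return the elapsed
# wall-clock time in (fractional) seconds.
sub time_it {
    my ($code) = @_;
    my $start = [gettimeofday];
    $code->();
    return tv_interval($start);   # interval from $start until now
}

# Stand-in workload; replace the body with the loop over the CSV files.
my $elapsed = time_it(sub {
    my $sum = 0;
    $sum += $_ for 1 .. 100_000;
});
printf "Elapsed: %.3f seconds\n", $elapsed;
```

For comparing the two variants over repeated runs, the core Benchmark module (timethese, cmpthese) may be a better fit than a single wall-clock measurement.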
#5, posted 07-17-2008 by KevinADC
The dot should be escaped then, like the other dot in the regexp:

my %files = map {/^(\w+)\.\d{6}\.csv$/; $_ => $1} <*.csv>;
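The difference the backslash makes can be checked on a few sample names. A small sketch: the names other than scores.234506.csv are invented for the test, and the ternary inside the map is an added safeguard (not part of the line above) so that names that fail to match are skipped rather than stored with a $1 that did not come from that name.

```perl
use strict;
use warnings;

# Sample names; only the first follows the NAME.DDDDDD.csv convention.
my @names = ('scores.234506.csv', 'scoresX234506.csv', 'notes.csv');

# Unescaped dot: any single character is accepted between name and digits.
my @loose = grep { /^(\w+).\d{6}\.csv$/ } @names;

# Escaped dot: only a literal "." is accepted there.
my @exact = grep { /^(\w+)\.\d{6}\.csv$/ } @names;

# Build the name => prefix hash, skipping names that do not match.
my %files = map { /^(\w+)\.\d{6}\.csv$/ ? ($_ => $1) : () } @names;

print "loose: @loose\n";
print "exact: @exact\n";
print "$_ => $files{$_}\n" for sort keys %files;
```

With the unescaped dot, scoresX234506.csv slips through because "X" satisfies the "." in the pattern.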
#6, posted 07-17-2008 by ghostdog74
If the files are always named like scores.234506.csv, you can just split on the dots and take array element 1. That should be your number. Easier than a regexp.
#7, posted 07-17-2008 by KevinADC
split() uses a regexp too: its first argument is a pattern.
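Both points can be seen in a short sketch: split does pull the pieces apart without a hand-written capture, but its first argument is itself a pattern, so the dot still has to be escaped. The helper name name_parts is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a name such as scores.234506.csv on literal dots.
sub name_parts {
    my ($file) = @_;
    return split /\./, $file;
}

# Element 0 is the name, element 1 the six-digit time, element 2 the extension.
my ($name, $time, $ext) = name_parts('scores.234506.csv');
print "name=$name time=$time ext=$ext\n";

# Pitfall: a plain string is still treated as a pattern, so '.' means
# "any character". Every character becomes a separator, only empty fields
# remain, and split drops them all.
my @broken = split '.', 'scores.234506.csv';
print scalar(@broken), " fields from the unescaped version\n";
```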