Perl regex question


 
# 1  
Old 07-16-2008

I have the following code:
Code:
#!/usr/bin/perl -w

@files = <*.csv>;
foreach $file (@files) {
  open(FH, $file) || die("Error: Cannot open file $file for reading.");
  my @dt = ($file =~ /^(\w+).(\d{6})\.csv$/);
  while (<FH>) { 
    print "$dt[0] $_\n";
  }
  close(FH);
}

There is redundancy in this code as it first checks for all files ending in ".csv" (line 3) and subsequently parses the filename (line 6) looking for characters and digits. How do I change line 3 into a regular expression, such that line 6 can be removed and the array @dt be determined there?
# 2  
Old 07-16-2008
You can't. And there really is no redundancy: the glob <> first finds all files with the .csv extension so you can open them, and the regexp then parses those strings (the filenames) to extract more specific information.
# 3  
Old 07-16-2008
Well, I did come up with this, but it may not be any more efficient than what you had, and it might even be less efficient. You would have to benchmark both versions to know which is really better.

Code:
use Data::Dumper;  # needed for the Dumper call below

my %files = map {/^(\w+).\d{6}\.csv$/; $_ => $1} <*.csv>;
print Dumper \%files;
foreach my $file (keys %files) {
  open(FH, $file) || die("Error: Cannot open file $file for reading.");
  while (<FH>) { 
    print "$files{$file} $_\n";
  }
  close(FH);
}

This regexp probably needs refining:

/^(\w+).\d{6}\.csv$/

What is the dot after (\w+) there for?
# 4  
Old 07-17-2008
Thank you for your reply. I have been experimenting with this a little, and the performance gain (or loss) is minor. I am still working on a built-in timer, but the difference is mere seconds (if any) on a total of about 200 files that together take up 40MB.

And the dot (.) is part of the file name: \w+ is the standard file name and \d{6} is the 24-hour time of download. So a file would have a name such as scores.234506.csv
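
For the built-in timer mentioned above, one possible approach is the core Time::HiRes module. This is only a sketch: the helper name elapsed_seconds and the stand-in workload are made up for illustration, and the real script would time its loop over the CSV files instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);   # core module; time() now returns fractional seconds

# Hypothetical helper: run a code reference once and return the elapsed seconds.
sub elapsed_seconds {
    my ($code) = @_;
    my $start = time();
    $code->();
    return time() - $start;
}

# Stand-in workload (an assumption); replace with the loop over the CSV files.
my $secs = elapsed_seconds(sub { my $n = 0; $n += $_ for 1 .. 100_000; });
printf "elapsed: %.3f s\n", $secs;
```

Wrapping each version of the script body in its own code reference lets the two timings be compared directly. For repeated runs, the core Benchmark module may be a better fit.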

Last edited by figaro; 07-17-2008 at 02:09 PM..
# 5  
Old 07-17-2008
The dot should be escaped then, like the other dot in the regexp:

my %files = map {/^(\w+)\.\d{6}\.csv$/; $_ => $1} <*.csv>;
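
One caveat with the map version, offered as a sketch rather than a definitive fix: a failed match does not reset $1, so a .csv file that does not fit the pattern would be stored with an undefined or left-over prefix. Matching in list context avoids that. The helper name prefix_map and the sample file names below are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map each matching file name to its prefix and
# skip names that do not fit the name.NNNNNN.csv pattern.
sub prefix_map {
    my @names = @_;
    my %files;
    for my $name (@names) {
        # In list context a successful match returns its captures;
        # a failed match returns an empty list, so $prefix stays undef.
        my ($prefix) = $name =~ /^(\w+)\.\d{6}\.csv$/;
        $files{$name} = $prefix if defined $prefix;
    }
    return %files;
}

# Made-up sample names; in the script this list would come from <*.csv>.
# The third name would slip through if the first dot were left unescaped.
my %files = prefix_map('scores.234506.csv', 'notes.csv', 'scoresX234506.csv');
print "$_ => $files{$_}\n" for sort keys %files;
# prints: scores.234506.csv => scores
```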
# 6  
Old 07-17-2008
If the file names always look like scores.234506.csv, you can just split on the dots and take array element 1. That should be your number. It is easier than a regexp.
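
A minimal sketch of the split approach, assuming every name has exactly the three dot-separated parts shown in the thread. The helper name name_parts is made up for illustration. Element 0 is the name the original script prints, and element 1 is the number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: break a name such as scores.234506.csv into its parts.
sub name_parts {
    my ($file) = @_;
    # The first argument to split is a pattern, so the dot must be escaped.
    my @parts = split /\./, $file;
    return @parts;
}

my ($base, $time, $ext) = name_parts('scores.234506.csv');
print "base=$base time=$time ext=$ext\n";
# prints: base=scores time=234506 ext=csv
```

Unlike the anchored regexp, split does not check that the middle part is six digits, so oddly named files would need a separate test.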
# 7  
Old 07-18-2008
split() uses a regexp too: its first argument is a pattern.