Perl: Regular expression tweaking?


 
# 1  
Old 10-25-2012

Hello!
I'm trying to tweak my regular expression to take care of this tedious little "blank space" problem. I don't know what's causing the " : 2 times, lines 1, 5," to be printed.

Here is what the input looks like:
[image: sample input]

Here's what the output is supposed to look like:
[image: expected output]

Here's my code:
Code:
    $line[$i] = lc($line[$i]);

    $line[$i] =~ s/\W/ /g;
    $line[$i] =~ s/\b[^ \t]\b//g;
    # Removes any word less than two letters.
    $line[$i] =~ s/[^a-z\n ]//g;
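To make the effect of these three substitutions visible on a single line, here is a minimal sketch. The helper name `clean_line` is hypothetical and not part of the original program; the brackets in the output are only there to show leading and trailing spaces.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the three substitutions from the post.
sub clean_line {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/\W/ /g;          # every non-word character (including "\n") becomes a space
    $text =~ s/\b[^ \t]\b//g;   # drop single-character words
    $text =~ s/[^a-z\n ]//g;    # remove anything that is not a-z, newline or space
    return $text;
}

# Brackets make leading and trailing spaces visible.
print "[", clean_line("#!/usr/bin/perl -w\n"), "]\n";
```

Note that the `#!/` at the start of the line turns into three leading spaces, and a blank input line turns into a single space.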

Here's my input and output:
Code:
Test:desktop D2K$ perl working.pl 
#!/usr/bin/perl -w

use strict;

# This line will print a hello world line.
print "Hello world!\n";

exit 0;
      :    2 times, lines: 1,  5, 
   bin:    1 times, lines: 1, 
  exit:    1 times, lines: 8
 hello:    2 times, lines: 5, 
  line:    2 times, lines: 5, 
  perl:    1 times, lines: 1, 
 print:    2 times, lines: 5,  6, 
strict:    1 times, lines: 3, 
  this:    1 times, lines: 5, 
   use:    1 times, lines: 3, 
   usr:    1 times, lines: 1, 
  will:    1 times, lines: 5, 
 world:    2 times, lines: 5, 
Test:desktop D2K$

Thanks for looking!
# 2  
Old 10-26-2012
Try it using just these two regular expressions:
Code:
$line[$i] =~ s/[\W\s]/ /g;
$line[$i] =~ s/ \w | \w\w |\s/ /g;

This User Gave Thanks to spacebar For This Post:
# 3  
Old 10-26-2012
Thanks for the reply. I tried the expressions you provided and it still has problems. I believe it's just the blank line in general when doing output. Your response is good enough for a thanks.
# 4  
Old 10-26-2012
Quote:
Originally Posted by D2K
...
I don't know what's causing the " : 2 times, lines 1, 5," to be printed.

...
Here's my code:
Code:
    $line[$i] = lc($line[$i]);
 
    $line[$i] =~ s/\W/ /g;
    $line[$i] =~ s/\b[^ \t]\b//g;
    # Removes any word less than two letters.
    $line[$i] =~ s/[^a-z\n ]//g;

...
You haven't provided enough information. In order for us to figure out why " : 2 times, lines 1, 5," is being printed, you'll have to show the code that actually loops through your data structure and prints that line.

The code that you have pasted simply "prepares" each line to be populated into some kind of data structure. And it looks okay to me. See below:

Code:
$
$ # Your input file
$ cat -n input.pl
     1  #!/usr/bin/perl -w
     2
     3  use strict;
     4
     5  # This line will print a hello world line.
     6  print "Hello world!\n";
     7
     8  exit 0;
$
$ # Your Perl program that processes your input file
$ # (This is "likely", since you haven't posted your Perl program).
$ cat -n process.pl
     1  #!perl -w
     2  my $file = "input.pl";
     3  open (FH, "<", $file) or die "Can't open $file: $!";
     4  while (<FH>) {
     5    push @line, $_;
     6  }
     7  close (FH) or die "Can't close $file: $!";
     8
     9  for ($i=0; $i<=$#line; $i++) {
    10    print "i = $i\n";
    11    $line[$i] = lc($line[$i]);
    12    $line[$i] =~ s/\W/ /g;
    13    $line[$i] =~ s/\b[^ \t]\b//g;
    14    $line[$i] =~ s/[^a-z\n ]//g;
    15    print $line[$i],"\n";
    16  }
    17
$
$ # Perl program execution
$ perl process.pl
i = 0
   usr bin perl
i = 1
i = 2
use strict
i = 3
i = 4
  this line will print  hello world line
i = 5
print  hello world
i = 6
i = 7
exit
$
$

So, yes, it is fine. You are able to "cleanse" each array element to retain only the words that you are interested in.

But what do you do next? Split into array and populate a hash? Something else? Whatever you are doing next is where the problem lies.
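
One plausible candidate, sketched under the assumption that each cleaned line is split on spaces before the hash is filled (the original splitting code was not posted): lines 1 and 5 of the input both start with `#`, which becomes a leading space, and `split` with a regex pattern returns an empty leading field for such a string. The variable names here are illustrative only.

```perl
use strict;
use warnings;

# A cleaned line like the first one in the output above: the leading
# spaces come from "#!/" being replaced.
my $cleaned = "   usr bin perl   ";

# Splitting on a regex keeps an empty leading field.
my @with_pattern = split /[ ]+/, $cleaned;

# The special pattern ' ' (a single-space string) skips leading whitespace.
my @awk_style = split ' ', $cleaned;

print scalar(@with_pattern), " fields: [", join("|", @with_pattern), "]\n";
print scalar(@awk_style),    " fields: [", join("|", @awk_style),    "]\n";
```

This prints `4 fields: [|usr|bin|perl]` and `3 fields: [usr|bin|perl]`. If that empty first field is used as a hash key, it would show up as a blank "word" on lines 1 and 5.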

tyler_durden
# 5  
Old 10-26-2012
Quote:
Originally Posted by durden_tyler
You haven't provided enough information. In order for us to figure out why " : 2 times, lines 1, 5," is being printed, you'll have to show the code that actually loops through your data structure and prints that line.

...

But what do you do next? Split into array and populate a hash? Something else? Whatever you are doing next is where the problem lies.

tyler_durden
Sorry about that. The only little tweak I have now is that the hash is picking up the blank lines from input. I don't want the blank line. That's the reason I posted the " : 2 times, lines 1, 5," in the example. I know there is a regular expression I can use during the loop in the output, but I'm not sure what it would look like.

---------- Post updated at 02:40 PM ---------- Previous update was at 01:58 PM ----------

Thanks everyone for looking. I just created an 'if' statement in my output loop like the one below:
Code:
if ( $_ !~ m/^\s*$/ )
         {
            printf ("%${currLongest}s: %4d times, lines: %0s\n", $_, $occurCount{$_}, $lineConcat{$_});
         }
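
A note on negating a match, as a small sketch with made-up keys: in Perl, `!` binds more tightly than `=~`, so `! $x =~ /re/` is parsed as `(!$x) =~ /re/` and tests the negated value rather than `$x` itself. The `!~` operator negates the match directly and avoids the question.

```perl
use strict;
use warnings;

# Illustrative keys: the empty string stands for the unwanted hash key.
my @keys = ("", "hello", "world");

# !~ negates the match itself, so there is no precedence question.
my @kept = grep { $_ !~ /^\s*$/ } @keys;

print join(",", @kept), "\n";
```

This prints `hello,world`; the empty key is filtered out.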

# 6  
Old 10-26-2012
Not sure if your problem is solved, but here's a solution:

Code:
$
$ # The data file
$ cat -n input.pl
     1  #!/usr/bin/perl -w
     2
     3  use strict;
     4
     5  # This line will print a hello world line.
     6  print "Hello world!\n";
     7
     8  exit 0;
$
$ # The Perl program
$ cat -n process1.pl
     1  #!perl -w
     2  use strict;
     3
     4  my @line;
     5  my %words;
     6
     7  # Store the file contents in an array
     8  my $file = $ARGV[0];
     9  open (FH, "<", $file) or die "Can't open $file: $!";
    10  while (<FH>) {
    11    push @line, $_;
    12  }
    13  close (FH) or die "Can't close $file: $!";
    14
    15  for (my $i=0; $i<=$#line; $i++) {
    16    $line[$i] = lc($line[$i]);
    17    $line[$i] =~ s/\W/ /g;
    18    $line[$i] =~ s/\b[^ \t]\b//g;
    19    $line[$i] =~ s/[^a-z\n ]//g;
    20
    21    # Create a hash called "words", each key of which is a word.
    22    # The value is an arrayref, the first element of which is the word count
    23    # and the remaining elements are the line numbers where the words occur.
    24    foreach my $j (split/[ ]+/, $line[$i]) {
    25      if ($j ne "") {
    26        # increment the word count
    27        $words{$j}->[0]++;
    28        # ensure that the line numbers are unique
    29        if ($#{$words{$j}} == 0 or ($i+1) != $words{$j}->[$#{$words{$j}}]) {
    30          push @{$words{$j}}, ($i+1);
    31        }
    32      }
    33    }
    34  }
    35
    36  # finally, simply print the hash in the desired format
    37  foreach my $k (sort keys %words) {
    38    my $count = shift @{$words{$k}};
    39    printf ("%10s: %5d times, lines: %s\n", $k, $count, join(",", @{$words{$k}}));
    40  }
    41
$
$
$ # The execution of the Perl program
$ perl process1.pl input.pl
       bin:     1 times, lines: 1
      exit:     1 times, lines: 8
     hello:     2 times, lines: 5,6
      line:     2 times, lines: 5
      perl:     1 times, lines: 1
     print:     2 times, lines: 5,6
    strict:     1 times, lines: 3
      this:     1 times, lines: 5
       use:     1 times, lines: 3
       usr:     1 times, lines: 1
      will:     1 times, lines: 5
     world:     2 times, lines: 5,6
$
$
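
A variant of the same idea, offered only as a sketch and not as the code above: it keeps the counts and the line numbers in two separate hashes and uses `split ' '` so that no empty field can become a key. The helper name `index_words` is hypothetical, and the cleaning step is simplified (every run of non-letters is treated as a separator), so results can differ from the program above for words that contain digits or underscores.

```perl
use strict;
use warnings;

# Hypothetical helper: count words and collect the distinct line numbers
# on which each word occurs.
sub index_words {
    my @lines = @_;
    my (%count, %lines_seen);
    for my $i (0 .. $#lines) {
        my $text = lc $lines[$i];
        $text =~ s/[^a-z]+/ /g;                 # any run of non-letters becomes one space
        for my $word (grep { length($_) > 1 } split ' ', $text) {
            $count{$word}++;
            $lines_seen{$word}{ $i + 1 } = 1;   # hash keys keep line numbers unique
        }
    }
    return (\%count, \%lines_seen);
}

# Two sample lines taken from the data file shown above.
my ($count, $seen) = index_words(
    "# This line will print a hello world line.\n",
    "print \"Hello world!\\n\";\n",
);

for my $word (sort keys %$count) {
    my $where = join ",", sort { $a <=> $b } keys %{ $seen->{$word} };
    printf "%10s: %5d times, lines: %s\n", $word, $count->{$word}, $where;
}
```

Because only two lines are passed in here, the reported line numbers are 1 and 2 rather than 5 and 6.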

tyler_durden
This User Gave Thanks to durden_tyler For This Post:
# 7  
Old 10-27-2012
Quote:
Originally Posted by durden_tyler
Not sure if your problem is solved, but here's a solution:
Thanks for the reply. I fixed the issue earlier. My follow-up, with the code I used, must have been merged into my earlier post.