Regular Expression matching in PERL Post: 302320300

Post 302320300 by Feliix1956 on Wednesday 27th of May 2009 03:16:25 PM

I know it's quite late to reply, but this is how I would do what is described here:

Code:

use strict;
use warnings;

# open the input file read-only, and one output file per section type
open(my $in,         '<', 'filename.txt') or die "Cannot open filename.txt: $!";
open(my $subject_fh, '>', 'subject.txt')  or die "Cannot open subject.txt: $!";
open(my $comment_fh, '>', 'comment.txt')  or die "Cannot open comment.txt: $!";
open(my $length_fh,  '>', 'length.txt')   or die "Cannot open length.txt: $!";

# remembers which section the current line belongs to ("" = no keyword seen yet)
my $filetoprint = "";

# start a run through the file, one line at a time
while (my $line = <$in>)
{
 # trim the trailing line break from $line
 chomp($line);

 # check the start of the line for a section keyword
 if ($line =~ m/^SUBJECT(.+)/)
 {
  # set variable indicator to Subject
  $filetoprint = "Subject";
  # remove the keyword from $line by assigning the captured remainder back to it
  $line = $1;
 }
 elsif ($line =~ m/^LENGTH(.+)/)
 {
  # set variable indicator to Length
  $filetoprint = "Length";
  $line = $1;
 }
 elsif ($line =~ m/^COMMENT(.+)/)
 {
  # set variable indicator to Comment
  $filetoprint = "Comment";
  $line = $1;
 }

 # if a keyword matched on this line or any earlier one, print to the appropriate file
 if ($filetoprint eq "Subject") {print $subject_fh "$line\n";}
 if ($filetoprint eq "Comment") {print $comment_fh "$line\n";}
 if ($filetoprint eq "Length")  {print $length_fh  "$line\n";}
}

close $in;
close $subject_fh;
close $comment_fh;
close $length_fh;

Hope this helps anyone with a similar problem. You can also add a "terminating" string: write a regular expression match for the desired character/string, print anything on the line leading up to the match into the output file so it isn't lost, and then set $filetoprint back to "".
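A rough sketch of the terminating-string idea, in case it helps. The terminator "END" and the sub name split_sections are my own assumptions, not anything from the original question, and the sub collects lines into arrays instead of writing files so it is easy to try out:

```perl
use strict;
use warnings;

# Hypothetical terminator; swap in whatever string ends a section in your data.
my $terminator = qr/END/;

# Takes a list of (already chomped) lines and returns a hash ref of
# section name => array ref of the lines that belong to that section.
sub split_sections {
    my @lines = @_;
    my %out = (Subject => [], Comment => [], Length => []);
    my $filetoprint = "";

    for my $line (@lines) {
        # a keyword at the start of the line switches the current section
        if ($line =~ m/^(SUBJECT|COMMENT|LENGTH)(.+)/) {
            $filetoprint = ucfirst lc $1;   # "SUBJECT" becomes "Subject"
            $line = $2;                     # keep only the text after the keyword
        }

        # terminator found inside a section: keep the text before it, then stop
        if ($filetoprint ne "" && $line =~ m/^(.*?)$terminator/) {
            my $before = $1;
            push @{ $out{$filetoprint} }, $before if length $before;
            $filetoprint = "";
            next;
        }

        push @{ $out{$filetoprint} }, $line if $filetoprint ne "";
    }
    return \%out;
}
```

Call it as split_sections(@lines) after reading and chomping the file. Lines after the terminator are skipped until the next keyword line turns output back on.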

To discern between one block and another, you could add a variable that you increase by 1 each time you match a new chunk indicator (for example, a subject line); then you could add that number to the beginning of each line in the output file.
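Something along these lines would do the numbering. It is only a sketch: it assumes a SUBJECT line is what starts a new block, and number_blocks is a made-up name. As before it returns the lines rather than printing them to files:

```perl
use strict;
use warnings;

# Takes a list of (already chomped) lines and returns a hash ref of
# section name => array ref of lines, each prefixed with its block number.
sub number_blocks {
    my @lines = @_;
    my %out = (Subject => [], Comment => [], Length => []);
    my $filetoprint = "";
    my $block = 0;   # increases by 1 for every SUBJECT line seen

    for my $line (@lines) {
        if ($line =~ m/^(SUBJECT|COMMENT|LENGTH)(.+)/) {
            my ($keyword, $rest) = ($1, $2);
            $block++ if $keyword eq "SUBJECT";   # assumed chunk indicator
            $filetoprint = ucfirst lc $keyword;
            $line = $rest;
        }
        # put the block number at the beginning of the output line
        push @{ $out{$filetoprint} }, "${block}:$line" if $filetoprint ne "";
    }
    return \%out;
}
```

With the number in front of every line you can later match up which comment and length belong to which subject, even though they end up in separate files.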

An advanced version might be to store the data in an array of hashes: index the array with a counter that increases as you read the file, and store the data from each line under the hash key that corresponds to its data type, e.g. in pseudocode:

Code:

if ($filetoprint eq "detail")
{
 # append the detail content to the detail element of the current hash in the array
 $arrayofhashes[$i]{detail} .= "$line\n";
}
# etc. for the other data types

Then you can count the array elements and print them out in the format you want for webmail or forum software.
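Here is a fuller sketch of the array-of-hashes version with the SUBJECT, LENGTH and COMMENT keywords from the first script. The sub names parse_records and format_records are made up for illustration, a SUBJECT line is assumed to start a new record, and the output layout is only an example:

```perl
use strict;
use warnings;

# Builds one hash per record; a SUBJECT line is assumed to start a new record.
sub parse_records {
    my @lines = @_;
    my @arrayofhashes;
    my $i = -1;              # index of the current record, -1 = none yet
    my $filetoprint = "";

    for my $line (@lines) {
        if ($line =~ m/^(SUBJECT|COMMENT|LENGTH)(.+)/) {
            my ($keyword, $rest) = ($1, $2);
            $i++ if $keyword eq "SUBJECT";
            $filetoprint = lc $keyword;     # hash key: subject, comment, length
            $line = $rest;
        }
        next if $filetoprint eq "" or $i < 0;
        # append the line to the matching element of the current hash
        $arrayofhashes[$i]{$filetoprint} .= "$line\n";
    }
    return @arrayofhashes;
}

# Counts the records and lays them out as plain text.
sub format_records {
    my @records = @_;
    my $text = "Records: " . scalar(@records) . "\n";
    for my $n (0 .. $#records) {
        for my $field (qw(subject length comment)) {
            next unless defined $records[$n]{$field};
            $text .= ($n + 1) . " " . $field . ":" . $records[$n]{$field};
        }
    }
    return $text;
}
```

Read and chomp the file into @lines, then print format_records(parse_records(@lines)). Changing format_records is all it takes to get a different layout.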
