perl basic multiple pattern matching

11-01-2010


Hi everyone, and thank you for your help with this. I am VERY new to Perl, so all of your help is appreciated. I have tried Google, but I don't know the proper terms to search for, and the results can be daunting for a newbie scripter. I know this is very easy for most of you! Thanks!

I have a multi-gig file of the repeated format:

Code:

<form name="profileForm" action="/profile.php" method="post">		

<style type="text/css">

.required {
	font-size: small;
	color: #f00;
}

</style>

<table border="0" cellpadding="3" cellspacing="1">
<tr>
	<td>&nbsp;</td>
	<td class="label">First Name</td>
	<td class="label">Last Name</td>
</tr>
<tr valign="top">
	<td class="label">Name: <span class="required">*</span></td>
	<td class="row"><input type="text" maxlength="20" name="firstName" size="20" value="su"  /></td>
	<td class="row"><input type="text" maxlength="20" name="lastName" size="20" value="chingping"   />
	<input type="hidden" name="customerNumber" value=""  /></td>
</tr>
<tr valign="top">
	<td class="label">Job Title: <span class="required">*</span></td>
	<td colspan="2" class="row"><input type="text" maxlength="30" name="jobTitleOther" size="30" value="miss"  /></td>
</tr>

<tr valign="top">
	<td class="label">Company: <span class="required">*</span></td>
	<td colspan="2" class="row"><input type="text" maxlength="30" name="company" size="30" value="omd"  /></td>
</tr>

I want to use Perl to read in this text file, "out1.txt", parse out the values (first name, last name, job title, company, etc.), and output them to a CSV file.

I know that each of these values occurs within a specific pattern, e.g. the "Company" value I want will always be in <td colspan="2" class="row"><input type="text" maxlength="30" name="company" size="30" value="HERE" /></td>. The other values occur in the same place in similar strings. I know ALL fields will exist for each "person".

Is there a good existing script that comes close, OR can someone help me translate this into Perl:

Code:

Open file
Read in each line 
While new line exists,
If pattern is (for example) 	<td class="row"><input type="text" maxlength="20" name="firstName" size="20" value="su"  /></td>  
 output "firstName" to the first csv field, or if pattern 
is <td colspan="2" class="row"><input type="text" maxlength="30" name="company" size="30" value="HERE" output value "HERE"  to the third csv field,

I am just looking for a basic framework for one or two sequential patterns, the while loop, etc.

The problem for me is matching values at a specific location in multiple known strings, in sequential order, and putting them into a CSV file.

Thanks for your help!

Last edited by sinusoid; 11-01-2010 at 03:50 PM.. Reason: making a little more clear

sinusoid


11-02-2010


Just one of a million Perl solutions:

Code:

#!/usr/bin/perl
use strict;
use warnings;

open(my $in, '<', "out1.txt") || die("Could not open infile: $!");
open(my $out, '>', "extract.txt") || die("Could not open outfile: $!");
my @splitLine;
# Read one line at a time so the multi-gig file is never held in memory.
while (my $line = <$in>) {
  if (rindex($line,"firstName") > -1) {
    @splitLine = split(/"/, $line);
    print {$out} $splitLine[11].",";
  } elsif (rindex($line,"lastName") > -1) {
    @splitLine = split(/"/, $line);
    print {$out} $splitLine[11].",";
  } elsif (rindex($line,"jobTitleOther") > -1) {
    @splitLine = split(/"/, $line);
    print {$out} $splitLine[13].",";
  } elsif (rindex($line,"company") > -1) {
    # company is the last field of a record, so end the CSV row here
    @splitLine = split(/"/, $line);
    print {$out} $splitLine[13]."\n";
  }
}
close($in);
close($out) || die("Could not close outfile: $!");

turk451

11-02-2010


trying now ---- you are a LIFE saver.

---------- Post updated at 08:32 AM ---------- Previous update was at 08:00 AM ----------

okay -- so quick question so I can modify

Can someone quickly explain

for

Code:

if (rindex($line,"firstName") > -1) {
   @splitLine = split(/"/, $line);

Is it indexing the last character position in "firstName" and then splitting on that, or is the split(/"/, ...) a regular expression? Not sure.

sinusoid


11-02-2010


Yet another Perl solution:

Code:

$
$ # show the content of the input data file "f0"
$ cat f0
<form name="profileForm" action="/profile.php" method="post">
<style type="text/css">
.required {
        font-size: small;
        color: #f00;
}
</style>
<table border="0" cellpadding="3" cellspacing="1">
<tr>
        <td>&nbsp;</td>
        <td class="label">First Name</td>
        <td class="label">Last Name</td>
</tr>
<tr valign="top">
        <td class="label">Name: <span class="required">*</span></td>
        <td class="row"><input type="text" maxlength="20" name="firstName" size="20" value="su"  /></td>
        <td class="row"><input type="text" maxlength="20" name="lastName" size="20" value="chingping"   />
        <input type="hidden" name="customerNumber" value=""  /></td>
</tr>
<tr valign="top">
        <td class="label">Job Title: <span class="required">*</span></td>
        <td colspan="2" class="row"><input type="text" maxlength="30" name="jobTitleOther" size="30" value="miss"  /></td>
</tr>
<tr valign="top">
        <td class="label">Company: <span class="required">*</span></td>
        <td colspan="2" class="row"><input type="text" maxlength="30" name="company" size="30" value="omd"  /></td>
</tr>
$
$ # run the Perl one-liner on the file "f0"
$ perl -lne 'if (/.*name="(firstName|lastName|jobTitleOther|company)".*?value="(.*?)"/) {
               $x .= ",$2";
               if ($1 eq "company") {print substr($x,1); $x=""}
             }' f0
su,chingping,miss,omd
$
$
$
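For anyone who prefers a script to a one-liner, the same approach might be written out roughly as below. This is only a sketch: the sub name `extract_records`, the sample string, and the in-memory filehandle are illustrative assumptions and not part of the posts above. It relies on the original poster's statement that every record contains all four fields, and it assumes `company` is always the last of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads HTML lines from a filehandle and returns one
# comma-joined string per record. Assumes "company" closes each record.
sub extract_records {
    my ($fh) = @_;
    my (@fields, @records);
    while (my $line = <$fh>) {
        # Capture the name attribute and the value attribute on the same line.
        if ($line =~ /name="(firstName|lastName|jobTitleOther|company)".*?value="(.*?)"/) {
            my ($name, $value) = ($1, $2);
            push @fields, $value;
            if ($name eq 'company') {
                push @records, join(',', @fields);
                @fields = ();
            }
        }
    }
    return @records;
}

# Sample data kept in a string so the sketch runs without an input file.
my $sample = <<'END';
<td class="row"><input type="text" maxlength="20" name="firstName" size="20" value="su"  /></td>
<td class="row"><input type="text" maxlength="20" name="lastName" size="20" value="chingping"   />
<input type="hidden" name="customerNumber" value=""  /></td>
<td colspan="2" class="row"><input type="text" maxlength="30" name="jobTitleOther" size="30" value="miss"  /></td>
<td colspan="2" class="row"><input type="text" maxlength="30" name="company" size="30" value="omd"  /></td>
END

open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
print "$_\n" for extract_records($fh);
close($fh);
```

For the real file, the in-memory handle would be replaced by one opened on "out1.txt". Note that a plain join(',') does no quoting, so values that themselves contain commas or double quotes would need proper CSV handling (the CPAN module Text::CSV is one option).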

tyler_durden

---------- Post updated at 09:00 AM ---------- Previous update was at 08:39 AM ----------

Quote:

Originally Posted by sinusoid

...
Can someone quickly explain

for

Code:

if (rindex($line,"firstName") > -1) {
   @splitLine = split(/"/, $line);

Is it indexing the last character position in "firstName" and then splitting on that, or is the split(/"/, ...) a regular expression...

The "rindex" function in this expression:

Code:

rindex (str, substr)

returns the position of the last (i.e. rightmost) occurrence of substr in str.
If substr doesn't exist in str, then it returns -1.

So the condition -

Code:

if (rindex($line,"firstName") > -1) {

checks if the rightmost index of "firstName" in $line is greater than -1. In other words, it checks if "firstName" exists in $line.
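A small sketch of that behaviour (the sample string is made up for illustration): `index` and `rindex` both return -1 when the substring is absent, so either one works as an existence test. They differ only in which occurrence they report.

```perl
use strict;
use warnings;

# Made-up line in which "firstName" appears twice.
my $line = 'name="firstName" id="firstName"';

my $leftmost  = index($line,  "firstName");  # position of the first occurrence
my $rightmost = rindex($line, "firstName");  # position of the last occurrence
my $missing   = rindex($line, "company");    # substring not present

print "index: $leftmost, rindex: $rightmost, missing: $missing\n";

# The existence test used in the script above:
print "line mentions firstName\n" if rindex($line, "firstName") > -1;
```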

If it does, then this statement -

Code:

   @splitLine = split(/"/, $line);

splits $line on the literal double-quotes character and assigns the tokens (or split elements) to the array "@splitLine".

As an example:

Code:

@x = split (/:/, "abc:def:ghijk:l")

will split the string "abc:def:ghijk:l" on the literal colon character (":") and assign the split elements to the array @x. So, after that operation, @x will have-

"abc" at index 0,
"def" at index 1,
"ghijk" at index 2 and
"l" at index 3.
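The same example can be run with the indices printed next to each element; a minimal sketch:

```perl
use strict;
use warnings;

my @x = split(/:/, "abc:def:ghijk:l");

# $#x is the index of the last element of @x.
for my $i (0 .. $#x) {
    print "$i: $x[$i]\n";
}
```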

The "//" delimiters in the split function hold a regex, so the separator does not have to be a single literal character. For instance, if the string you want to split is "a b c d e", and the number of spaces between the elements is variable, then you can use a regex in the split condition like so:

Code:

$
$
$ perl -le '@x = split(/[ ]+/, "a       b  c    d      e"); print $_ foreach (@x)'
a
b
c
d
e
$
$

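You could also pass a quoted string instead of "//", e.g. split('"', $line). Perl still treats that string as a regex pattern; the one special case is the single-space string " ", which splits on runs of whitespace.

After $line is split on double-quotes and assigned to @splitLine, the value of "firstName" is the element at index 11 of that array (the twelfth element, since indices start at 0).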
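To see where the indices 11 and 13 in the script come from, the tokens can be printed with their positions. A small sketch using two lines copied from the sample data; the helper name `value_at` is hypothetical.

```perl
use strict;
use warnings;

my $first = '<td class="row"><input type="text" maxlength="20" name="firstName" size="20" value="su"  /></td>';
my $job   = '<td colspan="2" class="row"><input type="text" maxlength="30" name="jobTitleOther" size="30" value="miss"  /></td>';

# Hypothetical helper: split a line on double quotes and return one token.
sub value_at {
    my ($line, $idx) = @_;
    my @splitLine = split(/"/, $line);
    return $splitLine[$idx];
}

# List every token of the firstName line with its index.
my @tokens = split(/"/, $first);
print "$_: [$tokens[$_]]\n" for 0 .. $#tokens;

print "firstName value: ", value_at($first, 11), "\n";
print "jobTitleOther value: ", value_at($job, 13), "\n";
```

The jobTitleOther and company lines carry an extra colspan="2" attribute, which adds two tokens in front of the value and moves it from index 11 to index 13. Positional indexing like this breaks if the attribute order ever changes, which is one reason to prefer capturing value="..." with a regex.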

HTH,
tyler_durden

durden_tyler


11-03-2010


I like Tyler's solution better

turk451
