perl basic multiple pattern matching


 
# 1  
Old 11-01-2010

Hi everyone, and thank you for your help with this. I am VERY new to Perl, so all of your help is appreciated. I have tried Google, but I don't know the proper terms to search for, and the results can be daunting for a newbie scripter. I know this is very easy for most of you! Thanks!

I have a multi-gig file of the repeated format:

Code:
<form name="profileForm" action="/profile.php" method="post">		

<style type="text/css">

.required {
	font-size: small;
	color: #f00;
}

</style>

<table border="0" cellpadding="3" cellspacing="1">
<tr>
	<td>&nbsp;</td>
	<td class="label">First Name</td>
	<td class="label">Last Name</td>
</tr>
<tr valign="top">
	<td class="label">Name: <span class="required">*</span></td>
	<td class="row"><input type="text" maxlength="20" name="firstName" size="20" value="su"  /></td>
	<td class="row"><input type="text" maxlength="20" name="lastName" size="20" value="chingping"   />
	<input type="hidden" name="customerNumber" value=""  /></td>
</tr>
<tr valign="top">
	<td class="label">Job Title: <span class="required">*</span></td>
	<td colspan="2" class="row"><input type="text" maxlength="30" name="jobTitleOther" size="30" value="miss"  /></td>
</tr>

<tr valign="top">
	<td class="label">Company: <span class="required">*</span></td>
	<td colspan="2" class="row"><input type="text" maxlength="30" name="company" size="30" value="omd"  /></td>
</tr>


I want to use Perl to read in this text file, "out1.txt", parse out the values (first name, last name, job title, company, etc.) and output them to a CSV file.

I know that each of these values occurs within a specific pattern, e.g. the "Company" value I want will always be in <td colspan="2" class="row"><input type="text" maxlength="30" name="company" size="30" value="HERE" /></td>. The other values occur in the same place in similar strings. I know ALL fields will exist for each "person".

Is there a good script already written that is close to this, OR can someone help me translate the following into Perl:

Code:
Open file
Read in each line 
While new line exists,
If pattern is (for example) 	<td class="row"><input type="text" maxlength="20" name="firstName" size="20" value="su"  /></td>  
 output "firstName" to the first csv field, or if pattern 
is <td colspan="2" class="row"><input type="text" maxlength="30" name="company" size="30" value="HERE" output value "HERE"  to the third csv field,

I am just looking for a basic framework for one or two sequential patterns, the while loop, etc.

The problem for me is matching values at a specific location in multiple known strings, in sequential order, and putting them into a CSV file.

Thanks for your help!

Last edited by sinusoid; 11-01-2010 at 03:50 PM.. Reason: making a little more clear
# 2  
Old 11-02-2010
Just one of a million Perl solutions:
Code:
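#!/usr/bin/perl -w

open(IN,"out1.txt") || die("Could not open infile!");
open(OUT,">extract.txt") || die("Could not open outfile!");
# Read one line at a time; foreach (<IN>) would load the whole
# multi-gig file into memory before the loop starts.
while ($line = <IN>) {
  # rindex returns -1 when the field name is not on this line
  if (rindex($line,"firstName") > -1) {
    @splitLine = split(/"/, $line);
    print(OUT $splitLine[11].",");    # the value sits at index 11
  } elsif (rindex($line,"lastName") > -1) {
    @splitLine = split(/"/, $line);
    print(OUT $splitLine[11].",");
  } elsif (rindex($line,"jobTitleOther") > -1) {
    @splitLine = split(/"/, $line);
    print(OUT $splitLine[13].",");    # colspan="2" shifts the value to index 13
  } elsif (rindex($line,"company") > -1) {
    @splitLine = split(/"/, $line);
    print(OUT $splitLine[13]."\n");   # last field, so end the CSV row
  }
}
close(IN);
close(OUT);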
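
One caveat when joining raw values with commas: a value that itself contains a comma or a double quote would break the CSV row. A minimal sketch of quoting a field first, assuming the usual CSV convention of doubling embedded quotes. The helper name csv_field and the sample value "omd, inc" are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a field in quotes only when it needs it.
sub csv_field {
    my ($value) = @_;
    if ($value =~ /[",\r\n]/) {
        # Double any embedded quotes, then wrap the whole field in quotes
        (my $escaped = $value) =~ s/"/""/g;
        return qq{"$escaped"};
    }
    return $value;
}

# Illustrative values only; the last one contains a comma
print join(",", map { csv_field($_) } ("su", "chingping", "miss", "omd, inc")), "\n";
```

For anything beyond simple data, the Text::CSV module from CPAN is probably the safer route.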

This User Gave Thanks to turk451 For This Post:
# 3  
Old 11-02-2010
trying now ---- you are a LIFE saver.

---------- Post updated at 08:32 AM ---------- Previous update was at 08:00 AM ----------

Okay, quick question so I can modify this. Can someone quickly explain this part:
Code:
if (rindex($line,"firstName") > -1) {
   @splitLine = split(/"/, $line);

Is it indexing the last character position of "firstName" and then splitting on that, or is the split(/"/, ...) a regular expression? I'm not sure.
# 4  
Old 11-02-2010
Yet another Perl solution:

Code:
$
$ # show the content of the input data file "f0"
$ cat f0
<form name="profileForm" action="/profile.php" method="post">
<style type="text/css">
.required {
        font-size: small;
        color: #f00;
}
</style>
<table border="0" cellpadding="3" cellspacing="1">
<tr>
        <td>&nbsp;</td>
        <td class="label">First Name</td>
        <td class="label">Last Name</td>
</tr>
<tr valign="top">
        <td class="label">Name: <span class="required">*</span></td>
        <td class="row"><input type="text" maxlength="20" name="firstName" size="20" value="su"  /></td>
        <td class="row"><input type="text" maxlength="20" name="lastName" size="20" value="chingping"   />
        <input type="hidden" name="customerNumber" value=""  /></td>
</tr>
<tr valign="top">
        <td class="label">Job Title: <span class="required">*</span></td>
        <td colspan="2" class="row"><input type="text" maxlength="30" name="jobTitleOther" size="30" value="miss"  /></td>
</tr>
<tr valign="top">
        <td class="label">Company: <span class="required">*</span></td>
        <td colspan="2" class="row"><input type="text" maxlength="30" name="company" size="30" value="omd"  /></td>
</tr>
$
$ # run the Perl one-liner on the file "f0"
$ perl -lne 'if (/.*name="(firstName|lastName|jobTitleOther|company)".*?value="(.*?)"/) {
               $x .= ",$2";
               if ($1 eq "company") {print substr($x,1); $x=""}
             }' f0
su,chingping,miss,omd
$
$
$
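
For anyone who finds the one-liner dense, here is roughly the same logic written out as a script. This is only a sketch: the sub name extract_records and passing the lines in as a list are assumptions for illustration, not part of the one-liner.

```perl
use strict;
use warnings;

# Hypothetical helper: takes lines of HTML, returns one CSV row per person.
sub extract_records {
    my @lines = @_;
    my (@row, @records);
    for my $line (@lines) {
        # $1 = field name, $2 = the text inside value="..."
        next unless $line =~ /name="(firstName|lastName|jobTitleOther|company)".*?value="(.*?)"/;
        push @row, $2;
        # "company" is the last field for each person, so emit the row there
        if ($1 eq "company") {
            push @records, join(",", @row);
            @row = ();
        }
    }
    return @records;
}

my @sample = (
    '<td class="row"><input type="text" maxlength="20" name="firstName" size="20" value="su"  /></td>',
    '<td class="row"><input type="text" maxlength="20" name="lastName" size="20" value="chingping"   />',
    '<input type="hidden" name="customerNumber" value=""  /></td>',
    '<td colspan="2" class="row"><input type="text" maxlength="30" name="jobTitleOther" size="30" value="miss"  /></td>',
    '<td colspan="2" class="row"><input type="text" maxlength="30" name="company" size="30" value="omd"  /></td>',
);
print "$_\n" for extract_records(@sample);   # su,chingping,miss,omd
```

For the real multi-gig file you would run the same match inside a while loop that reads one line at a time, rather than passing every line in at once.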

tyler_durden

---------- Post updated at 09:00 AM ---------- Previous update was at 08:39 AM ----------

Quote:
Originally Posted by sinusoid
...
Can someone quickly explain


for
Code:
if (rindex($line,"firstName") > -1) {
   @splitLine = split(/"/, $line);

Is it indexing the last character position of "firstName" and then splitting on that, or is the split(/"/, ...) a regular expression?

The "rindex" function in this expression:

Code:
rindex (str, substr)

returns the position of the last (i.e. rightmost) occurrence of substr in str.
If substr doesn't exist in str, then it returns -1.

So the condition -

Code:
if (rindex($line,"firstName") > -1) {

checks if the rightmost index of "firstName" in $line is greater than -1. In other words, it checks if "firstName" exists in $line.
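
A quick way to see this for yourself, using a shortened sample line (the line below is a trimmed-down stand-in for the real input, just for illustration):

```perl
use strict;
use warnings;

my $line = '<input type="text" name="firstName" value="su" />';

# Found: prints the position of the match, which is 0 or greater
print rindex($line, "firstName"), "\n";

# Not found: prints -1
print rindex($line, "lastName"), "\n";

# index() searches from the left instead; for a pure "is it there?"
# test either function gives the same yes/no answer
print index($line, "firstName"), "\n";
```

So the "> -1" test is true exactly when "firstName" appears somewhere in $line.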

If it does, then this statement -

Code:
   @splitLine = split(/"/, $line);

splits $line on the literal double-quotes character and assigns the tokens (or split elements) to the array "@splitLine".

As an example:

Code:
@x = split (/:/, "abc:def:ghijk:l")

will split the string "abc:def:ghijk:l" on the literal colon character (":") and assign the split elements to the array @x. So, after that operation, @x will have-

"abc" at index 0,
"def" at index 1,
"ghijk" at index 2 and
"l" at index 3.

The "//" in the split function allows regexes to be used, instead of literal characters. So, for instance, if the string you want to split is "a b c d e", and the number of spaces between the elements is variable, then you can use a regex in the split condition like so:

Code:
$
$
$ perl -le '@x = split(/[ ]+/, "a       b  c    d      e"); print $_ foreach (@x)'
a
b
c
d
e
$
$

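You could also write the pattern as a quoted string instead of between slashes; split still treats it as a regex. The one exception is a string holding a single space (' '), which splits on any run of whitespace and skips leading whitespace.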
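
The split variants discussed above can be put side by side in a small script. This is just a sketch for experimenting; the variable names are arbitrary.

```perl
use strict;
use warnings;

# Literal single-character separator
my @x = split(/:/, "abc:def:ghijk:l");

# Regex separator: one or more spaces
my @y = split(/[ ]+/, "a       b  c    d      e");

# Same pattern given as a quoted string; split still compiles it as a regex
my @z = split("[ ]+", "a       b  c    d      e");

print join("|", @x), "\n";   # abc|def|ghijk|l
print join("|", @y), "\n";   # a|b|c|d|e
print join("|", @z), "\n";   # a|b|c|d|e
```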

After $line is split on double-quotes and assigned to @splitLine, the value of "firstName" is at index 11 of that array (the twelfth element, since indexes start at 0).
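
To check where each piece lands, you can print every element with its index. A small sketch using the firstName line from the sample data:

```perl
use strict;
use warnings;

my $line = qq{\t<td class="row"><input type="text" maxlength="20" name="firstName" size="20" value="su"  /></td>};

# Split on the double-quote character and list each piece with its index
my @splitLine = split(/"/, $line);
print "$_: $splitLine[$_]\n" for 0 .. $#splitLine;
```

The line for index 11 shows "su". The jobTitleOther and company lines carry an extra colspan="2" attribute at the front, which adds two more pieces and moves the value to index 13.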

HTH,
tyler_durden
# 5  
Old 11-03-2010
I like Tyler's solution better :)