PERL pattern matching in a file


 
# 1  
Old 05-19-2011

Hi Gurus,

I have a file like the one below. I have to match each field against a predefined pattern. If it matches, I have to write the entire record to a separate file. If not, I have to set the value to NULL and write the entire record to another file.

The delimiter is |

Code:
ravi123|2344|M
R123Vi|2345|F
R_'345|278|M
ra12|*&|F

Pattern is [0-9a-zA-Z] | [0-9] | [MF]

If any value in a record does not match the pattern above, write the record to the reject file.
Otherwise, if all the values in a record match, write it to the accept file.

Reject file
Code:
R_'345|278|M
ra12|*&|F

Accept file (non-matching column values set to NULL)
Code:
ravi123|2344|M
R123Vi|2345|F
|278|M
ra12||F

Please help me out.

Last edited by radoulov; 05-19-2011 at 11:23 AM.. Reason: Code tags.
# 2  
Old 05-19-2011
OK, so you need to:
  1. Open the file containing your records for reading, and open the accept and reject files for writing.
  2. While there are records in the file:
    1. If the line matches a regex like the following
      Code:
      /^[0-9a-zA-Z]+\|[0-9]+\|[MF]$/

      1. write it to the accept file.
    2. If, on the other hand, it didn't match,
      1. write it to the reject file.
  3. Finally, close your open files and let the user know you're done once all the records have been processed.
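
One detail matters with a regex of this shape: in Perl an unescaped | is the alternation operator, not a literal delimiter, so it has to be written as \| to match the pipe in the data. The following is a minimal sketch of the difference, not a definitive implementation. The helper names loose_match and strict_match are hypothetical, and the sample records are taken from the first post.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.

# Unescaped pipes: three separate alternatives, so a line passes if it merely
# starts with an alphanumeric, contains a digit anywhere, or ends in M or F.
sub loose_match  { return $_[0] =~ /^[0-9a-zA-Z]+|[0-9]+|[MF]$/ ? 1 : 0 }

# Escaped pipes: one pattern that describes the whole record.
sub strict_match { return $_[0] =~ /^[0-9a-zA-Z]+\|[0-9]+\|[MF]$/ ? 1 : 0 }

for my $rec ("ravi123|2344|M", "R_'345|278|M", "ra12|*&|F") {
    printf "%-16s loose=%d strict=%d\n", $rec, loose_match($rec), strict_match($rec);
}
```

With the unescaped form every sample record passes, including the two that should be rejected. Only the escaped form tells them apart.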

Last edited by Skrynesaver; 05-19-2011 at 11:09 AM..
# 3  
Old 05-19-2011
Yes. We need to check the entire record, whether it matches or not. If at least one column does not match, we have to write the record to the reject file, and build the accept file as below.

Original: Ravi*123 | 234 | M

Reject file: at least one column does not match, so the entire record is written as it is to the reject file: Ravi*123 | 234 | M

Accept file: any column that does not match the pattern is made NULL, and the record is moved to the accept file: |234|M
# 4  
Old 05-19-2011
So how far have you got?
# 5  
Old 05-19-2011
I tried reading the file line by line.
For each line, I split on "|" and store the column values in an array.
Then, for each column value, I check whether it matches its pattern.
If it does not match, I write the entire record to the reject file.

But I am stuck on the accept file: I don't know how to set the unmatched column value to NULL and then write the record to the accept file.

Code:
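use strict;
use warnings;

# One anchored pattern per column, so the whole value has to match
my @pat = (qr/^[0-9A-Za-z]+$/, qr/^[0-9]+$/, qr/^[MF]$/);

open(my $in, '<', 'input.txt') or die "Cannot open input.txt: $!";
open(my $reject, '>>', 'reject.txt') or die "Cannot open reject.txt: $!";
while (my $input_rec = <$in>)
{
    chomp $input_rec;
    # The -1 limit keeps trailing empty columns
    my @arr = split(/\|/, $input_rec, -1);
    my $valid = (@arr == @pat);    # a wrong column count is invalid too
    my $count = 0;
    foreach my $i (@arr)
    {
        if ( $count >= @pat or $i !~ $pat[$count] )
        {
            $valid = 0;
            #      $accept[$count] = NULL;   # still to do for the accept file
        }
        $count++;
    }
    # Write the whole record to the reject file if any column failed
    print {$reject} "$input_rec\n" unless $valid;
}
close($reject);
close($in);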
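
The split step described in this post has two details that are easy to trip over: the pipe must be escaped in the split pattern, and split drops trailing empty fields unless it is given a negative limit. A small sketch, assuming a hypothetical helper named split_record:

```perl
use strict;
use warnings;

# split_record is a hypothetical helper name used only for this sketch.
sub split_record {
    my ($line) = @_;
    chomp $line;
    # Escape the pipe: an unescaped | is an empty alternation, which matches
    # between every character. The -1 limit keeps trailing empty fields.
    return split /\|/, $line, -1;
}

my @fields = split_record("ra12|*&|\n");
print scalar(@fields), " fields\n";   # prints "3 fields"; the last one is empty
```

Keeping the empty trailing field means a record with a missing last column still has three positions to validate, rather than silently shrinking to two.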

Moderator's comment: Please start using [CODE] tags for source listings, console output, ...

Last edited by pvksandeep; 05-19-2011 at 12:49 PM..
# 6  
Old 05-19-2011
A couple of things:
  • You don't need to split the line if you already know what each field and the separator should look like.
  • Did you read the link to "perldoc -f open" above? Please read it before using the code below.
  • In any Perl script that you would consider saving as a tool or utility, use strict and warnings.
  • OK, with that out of the way, try the following:
Code:
#!/usr/bin/perl

use strict;
use warnings;

open(my $records_file, '<', 'input.txt') or die "Couldn't open input.txt:\n\t$!";
open(my $accepted, '>', 'accepted.txt') or die "Couldn't open accepted.txt\n\t$!";
open(my $rejects, '>', 'rejects.txt') or die "Couldn't open rejects.txt\n\t$!";
while (<$records_file>){
   if (/^[0-9a-zA-Z]+\|[0-9]+\|[MF]$/ ){
      print $accepted $_;
   }
   else {
      print $rejects $_;
   }
}
close($accepted);
close($rejects);
close($records_file);
print "The valid records from input.txt are in the file accepted.txt and the invalid are in rejects.txt\n";
exit;

# 7  
Old 05-19-2011
Thanks Skrynesaver.
My only question is this: if at least one column value in a record does not match, I have to replace it with NULL.
That is, in accepted.txt, a column value that does not match should come out like this:

Code:

Original -- Ravi123|23AA|M
Pattern --[0-9a-zA-Z]|[0-9]|[MF]

Accepted -- Ravi123||M

I thought this would be possible only by splitting each record and replacing a column value with NULL if it does not match.


Thanks
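
The split-and-replace idea described here can be sketched as follows. This is only one possible reading of the requirement, under stated assumptions: the + quantifiers on the first two patterns are assumed (the thread gives only the character classes), NULL is taken to mean an empty field as in the sample accept file, and clean_record and process_records are hypothetical names. The demo uses in-memory filehandles with the sample data from the first post, so it runs without touching any files.

```perl
use strict;
use warnings;

# One anchored pattern per column. The + quantifiers are an assumption.
my @PATTERNS = (qr/^[0-9a-zA-Z]+$/, qr/^[0-9]+$/, qr/^[MF]$/);

# Hypothetical helper: returns the record with every non-matching column
# emptied, plus a flag that is true if the record had any problem.
sub clean_record {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\|/, $line, -1;          # -1 keeps trailing empty fields
    my $bad = (@fields != @PATTERNS) ? 1 : 0;    # wrong column count counts as bad
    for my $i (0 .. $#PATTERNS) {
        my $value = defined $fields[$i] ? $fields[$i] : '';
        if ($value !~ $PATTERNS[$i]) {
            $value = '';                         # "NULL" is written as an empty field
            $bad   = 1;
        }
        $fields[$i] = $value;
    }
    return (join('|', @fields[0 .. $#PATTERNS]), $bad);
}

# Hypothetical driver: cleaned records go to $accept, and the untouched
# original of every bad record goes to $reject.
sub process_records {
    my ($in, $accept, $reject) = @_;
    while (my $line = <$in>) {
        next unless $line =~ /\S/;               # skip blank lines
        my ($cleaned, $bad) = clean_record($line);
        print {$accept} "$cleaned\n";
        if ($bad) {
            chomp $line;
            print {$reject} "$line\n";
        }
    }
    return;
}

# Demo on the sample data from the first post, using in-memory handles.
my $sample = "ravi123|2344|M\nR123Vi|2345|F\nR_'345|278|M\nra12|*&|F\n";
open(my $in,  '<', \$sample)      or die "Cannot open in-memory input: $!";
open(my $acc, '>', \my $accepted) or die "Cannot open in-memory accept buffer: $!";
open(my $rej, '>', \my $rejected) or die "Cannot open in-memory reject buffer: $!";
process_records($in, $acc, $rej);
close $_ for ($in, $acc, $rej);
print "Accept file:\n$accepted\nReject file:\n$rejected";
```

To run it against real files, the three handles would instead be opened on input.txt, accepted.txt and rejects.txt in the same way as in post #6. Validating field by field is what makes it possible to blank a single column, which a single whole-record regex cannot do on its own.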
 