Post 303035077 by newbie_01 on Tuesday 14th of May 2019 01:00:28 AM
Help with understanding this regex in a Perl script parsing a 'complex' string

Hi,

I need some guidance understanding the Perl script below. I am not the author of the script, and the author has not left any documentation. I suppose it is meant to be 'easy' if you're a Perl or regex guru, but I am having trouble understanding what regex to use. The script does warn about tweaking the regex to suit the ever-changing string.

This is the script:

Code:
[host01]$ cat x.pl
#!/usr/bin/perl
#
# ./logparse.pl <logfile> <service_name_to_search> | sort | uniq
#

$log = $ARGV[0];
$service_name = $ARGV[1];
$found = 0;
open LOG, $log || die "cannot open logfile $!";
while ($line = <LOG>){
        if ( $line =~ /\(SERVICE_NAME=$service_name\).*\(HOST=([\d.\w]+)\)\(USER=(\w+)\)/ ) {
                print $service_name . "\t" . $1 . "\t" . $2 . "\t" . $3 . "\n";
                $found = 1;
           }
        elsif ( $line =~ /\(USER=(\w+)\).*\(SERVICE_NAME=$service_name.*\).*\(HOST=([\d.\w]+)\)*/ ) {
                print $service_name . "\t" . $1 . "\t" . $2 . "\t" . $3 . "\n";
                $found = 1;
           }
        elsif ( $line =~ /\(CONNECT_DATA=\((\w+).*\(SERVICE_NAME=$service_name.*\).*\(HOST=([\d.\w]+)\)*/ ) {
                print $service_name . "\t" . $1 . "\t" . $2 . "\t" . $3 . "\n";
                $found = 1;
           }
        }
close LOG;

if ( $found == "0" ) {
   print "\n" ;
   print "There is no nothing found for " . $service_name . "\n" ;
   print "Maybe the regex needs changing " . "\n" ;
   print "The string format has been known to change " . "\n" ;
   print "\n" ;
}

Here are some sample files to run the script against.

Code:
#==> test1.log <==
#2018-07-23 13:19:38 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickey))(SERVER=DEDICATED)(SERVICE_NAME=work_app.com.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=12.123.11.123)(PORT=53102)) * establish * work_app.com.ph * 0
#2018-07-23 09:12:12 * (CONNECT_DATA=(CID=(PROGRAM=SQL Developer)(HOST=__jdbc__)(USER=minnie))(SERVICE_NAME=work_app.com.ph)(SERVER=dedicated)(INSTANCE_NAME=testp11)) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.214.14.29)(PORT=53548)) * establish * work_app.com.ph * 0
#
#==> test2.log <==
#2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62625)) * establish * fail_app.com.ph * 0
#2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec02.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62627)) * establish * fail_app.com.ph * 0
#2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec03.exe)(HOST=MNLAPP01)(USER=!sysadmin01))(INSTANCE_NAME=xxxt23)) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62626)) * establish * fail_app.com.ph * 0
#2019-05-12 04:17:11 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62629)) * establish * fail_app.com.ph * 0

A sample run of the script is below:

Code:
[host01]$ ./x.pl test1.log work_app
work_app        mickey  12.123.11.123
work_app        minnie  10.214.14.29
[host01]$ ./x.pl test2.log fail_app
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123

Using awk and paste, this is what I am hoping to get from the Perl script:

Code:
   awk '{ print $4 }' test2.log | awk -F"(" '{ print $6 }' | awk -F")" '{ print $1 }' > program.tmp.99
   awk '{ print $4 }' test2.log | awk -F"(" '{ print $7 }' | awk -F")" '{ print $1 }' > host.tmp.99
   awk '{ print $4 }' test2.log | awk -F"(" '{ print $8 }' | awk -F")" '{ print $1 }' > user.tmp.99
   awk '{ print $6 }' test2.log | awk -F"(" '{ print $4 }' | awk -F")" '{ print $1 }' > host_ip.tmp.99

   paste program.tmp.99 host.tmp.99 user.tmp.99 host_ip.tmp.99 | sort | uniq

[host01]$ paste program.tmp.99 host.tmp.99 user.tmp.99 host_ip.tmp.99 | sort | uniq
PROGRAM=C:\Windows\system32\exec01.exe  HOST=MNLAPP01   USER=!sysadmin01        HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec02.exe  HOST=MNLAPP01   USER=!sysadmin01        HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec03.exe  HOST=MNLAPP01   USER=!sysadmin01        HOST=10.11.11.123
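The awk+paste pipeline above could be collapsed into a single Perl pass. What follows is only a minimal sketch, not the original author's method: it assumes every line carries a CID block with PROGRAM, HOST and USER, that no value contains a closing parenthesis, and it hard-codes three lines from test2.log instead of reading the file. The variable names (`@rows`, `%seen`) are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three lines copied from test2.log; a real script would read them from the file.
my @lines = (
    '2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62625)) * establish * fail_app.com.ph * 0',
    '2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec02.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62627)) * establish * fail_app.com.ph * 0',
    '2019-05-12 04:17:11 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62629)) * establish * fail_app.com.ph * 0',
);

my (%seen, @rows);
for my $line (@lines) {
    # One small regex per field; [^)]+ means "everything up to the next ')'".
    my ($program) = $line =~ /\((PROGRAM=[^)]+)\)/;
    # .*? is non-greedy, so this picks the first HOST after "(CID=".
    my ($host)    = $line =~ /\(CID=.*?\((HOST=[^)]+)\)/;
    my ($user)    = $line =~ /\((USER=[^)]+)\)/;
    # Same idea for the client address: the first HOST after "(ADDRESS=".
    my ($host_ip) = $line =~ /\(ADDRESS=.*?\((HOST=[^)]+)\)/;

    # Skip lines where any field is missing rather than print a partial row.
    next unless defined $program && defined $host
             && defined $user    && defined $host_ip;

    my $row = join "\t", $program, $host, $user, $host_ip;
    push @rows, $row unless $seen{$row}++;   # keep each distinct row once
}

print "$_\n" for sort @rows;
```

For these three lines the sketch prints two tab-separated rows, one for exec01.exe and one for exec02.exe, because the first and third lines differ only in timestamp and port. The hash does the job of `sort | uniq`, so no temporary files are needed.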

May I please ask someone to kindly explain how the regex is parsing the string? I've been pulling out whatever is left of my hair all day and still can't figure out how it does what it is meant to do. At the moment, I use awk to write tmp files and paste to get what I want. It is not the best solution, I know, sorry.

The first run of x.pl looks alright, but I am hoping to get the PROGRAM value as well. I was expecting it to be $1.

Code:
[host01]$ ./x.pl test1.log work_app
work_app        mickey  12.123.11.123
work_app        minnie  10.214.14.29

For the second run of x.pl, I was hoping to get the same output as from awk+paste.

Code:
[host01]$ ./x.pl test2.log fail_app
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123
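One plausible reading of why this run prints SERVER can be checked branch by branch. The sketch below is an illustration under assumptions, not a fix: it tests simplified pieces of the three patterns from x.pl against the first test2.log line, and the `[^)]+` alternative at the end is a suggestion of mine rather than something from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First line of test2.log, copied verbatim.
my $line = '2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62625)) * establish * fail_app.com.ph * 0';
my $service_name = 'fail_app';

# Branch 1 wants ")" straight after the service name, but the log has ".com.ph" there.
my $branch1 = $line =~ /\(SERVICE_NAME=$service_name\)/ ? 1 : 0;

# Branch 2 wants a user made only of \w characters (letters, digits, "_"); "!" is not one.
my $branch2 = $line =~ /\(USER=(\w+)\)/ ? 1 : 0;

# Branch 3 captures the first word after "(CONNECT_DATA=(", which here is SERVER.
my ($first_key) = $line =~ /\(CONNECT_DATA=\((\w+)/;

# A greedy .* in front of \(HOST= runs as far right as it can, so it finds the last HOST.
my ($last_host) = $line =~ /.*\(HOST=([\d.\w]+)\)/;

# Possible alternative: [^)]+ accepts anything up to the next ")", including "!".
my ($user) = $line =~ /\(USER=([^)]+)\)/;

print "branch 1 matches: $branch1\n";
print "branch 2 matches: $branch2\n";
print "branch 3 group 1: $first_key\n";
print "last HOST:        $last_host\n";
print "user via [^)]+:   $user\n";
```

This prints 0, 0, SERVER, 10.11.11.123 and !sysadmin01 in that order, which is consistent with the `fail_app SERVER 10.11.11.123` rows above: only the third branch matches, and its first capture group happens to be the word SERVER.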

I believe the answer to my problem lies in figuring out how the Perl regex dissects the string into several fields. I can understand that the line below searches for and matches the search string, but how does it break the string down into several fields?

Code:
$line =~ /\(SERVICE_NAME=$service_name\).*\(HOST=([\d.\w]+)\)\(USER=(\w+)\)/
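In a Perl regex, `\(` and `\)` match literal parentheses, while an unescaped `( ... )` is a capture group whose matched text lands in `$1`, `$2`, and so on, numbered by the position of the opening parenthesis. The sketch below runs the pattern quoted above unchanged. The input is a hypothetical line modelled on test2.log with the `!` dropped from the user name, and the full service name is passed in, both purely so that this particular pattern matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line modelled on test2.log ("!" removed from the user name).
my $line = '(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)'
         . '(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=sysadmin01)))'
         . ' * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62625))';
my $service_name = 'fail_app.com.ph';

# In list context a successful match returns the captured groups ($1, $2, ...).
# Group 1 is ([\d.\w]+) after "HOST=", group 2 is (\w+) after "USER=".
my @fields = $line =~ /\(SERVICE_NAME=$service_name\).*\(HOST=([\d.\w]+)\)\(USER=(\w+)\)/;

if (@fields) {
    print "group 1 (HOST): $fields[0]\n";
    print "group 2 (USER): $fields[1]\n";
    print "number of groups: ", scalar(@fields), "\n";
} else {
    print "no match\n";
}
```

This prints MNLAPP01 for group 1, sysadmin01 for group 2, and a group count of 2. The pattern has only two capture groups, so `$3` is never set by it, which would explain why the third column printed by x.pl is always empty. HOST is captured as MNLAPP01 rather than the IP address because the pattern insists that `(USER=...)` follows `(HOST=...)` immediately.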

The connection string also changes depending on the program, so sometimes I need information that comes before SERVICE_NAME, sometimes information that comes after it, and sometimes both.
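When the order of the fields varies like this, one option is to stop encoding the order in a single pattern and instead collect every `(KEY=VALUE)` pair into a hash. This is a sketch under assumptions, not a drop-in replacement for x.pl: it assumes the sections of a line are separated by ` * `, that values contain no parentheses, and the names `%connect`, `%address` and `@out` are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One line from each sample file; the fields appear in a different order in each.
my @lines = (
    '2018-07-23 13:19:38 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickey))(SERVER=DEDICATED)(SERVICE_NAME=work_app.com.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=12.123.11.123)(PORT=53102)) * establish * work_app.com.ph * 0',
    '2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62625)) * establish * fail_app.com.ph * 0',
);

my @out;
for my $line (@lines) {
    # Section 0 is the timestamp, 1 is CONNECT_DATA, 2 is ADDRESS.
    my @sections = split /\s\*\s/, $line;
    next unless @sections >= 3;

    # With /g in list context the match returns all captures as a flat
    # key, value, key, value ... list, which can be assigned to a hash.
    # [^()]+ only matches innermost pairs, so CONNECT_DATA and CID are skipped.
    my %connect = $sections[1] =~ /\((\w+)=([^()]+)\)/g;
    my %address = $sections[2] =~ /\((\w+)=([^()]+)\)/g;

    # Pick the wanted fields by name; a missing field is shown as "-".
    my @fields = (@connect{qw(SERVICE_NAME PROGRAM HOST USER)}, $address{HOST});
    push @out, join "\t", map { defined($_) ? $_ : '-' } @fields;
}

print "$_\n" for @out;
```

For the two lines above this prints the service name, program, client host, user and client IP for each connection, tab-separated, regardless of whether the CID block comes before or after SERVICE_NAME. Keeping CONNECT_DATA and ADDRESS in separate hashes avoids the two HOST keys overwriting each other.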

A regex tutorial would be much appreciated :-)
Please advise. Thanks.
 
