Help with understanding this regex in a Perl script parsing a 'complex' string


 
# 1  
Old 05-14-2019

Hi,

I need some guidance with understanding the Perl script below. I am not the author of the script, and the author has not left any documentation. I suppose it is meant to be 'easy' if you're a Perl or regex guru. I am having trouble understanding what regex to use. The script does warn about tweaking the regex to suit the ever-changing string.

This is the script

Code:
[host01]$ cat x.pl
#!/usr/bin/perl
#
# ./logparse.pl <logfile> <service_name_to_search> | sort | uniq
#

$log = $ARGV[0];
$service_name = $ARGV[1];
$found = 0;
open LOG, $log or die "cannot open logfile $!";
while ($line = <LOG>){
        if ( $line =~ /\(SERVICE_NAME=$service_name\).*\(HOST=([\d.\w]+)\)\(USER=(\w+)\)/ ) {
                print $service_name . "\t" . $1 . "\t" . $2 . "\t" . $3 . "\n";
                $found = 1;
           }
        elsif ( $line =~ /\(USER=(\w+)\).*\(SERVICE_NAME=$service_name.*\).*\(HOST=([\d.\w]+)\)*/ ) {
                print $service_name . "\t" . $1 . "\t" . $2 . "\t" . $3 . "\n";
                $found = 1;
           }
        elsif ( $line =~ /\(CONNECT_DATA=\((\w+).*\(SERVICE_NAME=$service_name.*\).*\(HOST=([\d.\w]+)\)*/ ) {
                print $service_name . "\t" . $1 . "\t" . $2 . "\t" . $3 . "\n";
                $found = 1;
           }
        }
close LOG;

if ( $found == "0" ) {
   print "\n" ;
   print "There is no nothing found for " . $service_name . "\n" ;
   print "Maybe the regex needs changing " . "\n" ;
   print "The string format has been known to change " . "\n" ;
   print "\n" ;
}

Here are some sample files to run the script against.

Code:
#==> test1.log <==
#2018-07-23 13:19:38 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickey))(SERVER=DEDICATED)(SERVICE_NAME=work_app.com.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=12.123.11.123)(PORT=53102)) * establish * work_app.com.ph * 0
#2018-07-23 09:12:12 * (CONNECT_DATA=(CID=(PROGRAM=SQL Developer)(HOST=__jdbc__)(USER=minnie))(SERVICE_NAME=work_app.com.ph)(SERVER=dedicated)(INSTANCE_NAME=testp11)) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.214.14.29)(PORT=53548)) * establish * work_app.com.ph * 0
#
#==> test2.log <==
#2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62625)) * establish * fail_app.com.ph * 0
#2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec02.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62627)) * establish * fail_app.com.ph * 0
#2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec03.exe)(HOST=MNLAPP01)(USER=!sysadmin01))(INSTANCE_NAME=xxxt23)) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62626)) * establish * fail_app.com.ph * 0
#2019-05-12 04:17:11 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62629)) * establish * fail_app.com.ph * 0

Sample run of the script is as below:

Code:
[host01]$ ./x.pl test1.log work_app
work_app        mickey  12.123.11.123
work_app        minnie  10.214.14.29
[host01]$ ./x.pl test2.log fail_app
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123

Using awk and paste, this is what I am hoping to get with the Perl script

Code:
   awk '{ print $4 }' test2.log | awk -F"(" '{ print $6 }' | awk -F")" '{ print $1 }' > program.tmp.99
   awk '{ print $4 }' test2.log | awk -F"(" '{ print $7 }' | awk -F")" '{ print $1 }' > host.tmp.99
   awk '{ print $4 }' test2.log | awk -F"(" '{ print $8 }' | awk -F")" '{ print $1 }' > user.tmp.99
   awk '{ print $6 }' test2.log | awk -F"(" '{ print $4 }' | awk -F")" '{ print $1 }' > host_ip.tmp.99

   paste program.tmp.99 host.tmp.99 user.tmp.99 host_ip.tmp.99 | sort | uniq

[host01]$ paste program.tmp.99 host.tmp.99 user.tmp.99 host_ip.tmp.99 | sort | uniq
PROGRAM=C:\Windows\system32\exec01.exe  HOST=MNLAPP01   USER=!sysadmin01        HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec02.exe  HOST=MNLAPP01   USER=!sysadmin01        HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec03.exe  HOST=MNLAPP01   USER=!sysadmin01        HOST=10.11.11.123

May I please ask someone to kindly explain how the regex is parsing the string? I've been pulling out whatever is left of my hair all day and still can't figure out how it is doing what it is meant to do. At the moment, I use awk to write tmp files and paste to get what I want. It is not the best solution, I know, sorry.

For the first run of x.pl, the output looks alright, but I am hoping to get the PROGRAM value as well. I am hoping it would be $1.

Code:
[host01]$ ./x.pl test1.log work_app
work_app        mickey  12.123.11.123
work_app        minnie  10.214.14.29
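
One way to get the PROGRAM value for the test1.log layout is to put an extra capture group in front of the USER group. The following is only a sketch: the sub name parse_cid_first and the modified pattern are assumptions, not part of the original script. It also assumes PROGRAM appears before USER and SERVICE_NAME, so it will not match the test2.log layout.

```perl
use strict;
use warnings;

# Hypothetical variant of the script's second pattern with a PROGRAM group in front.
# [^)]+ means "everything up to the next closing parenthesis", so spaces are allowed.
sub parse_cid_first {
    my ($line, $svc) = @_;
    if ($line =~ /\(PROGRAM=([^)]+)\).*\(USER=(\w+)\).*\(SERVICE_NAME=$svc.*\).*\(HOST=([\d.\w]+)\)/) {
        return ($1, $2, $3);    # program, user, client address
    }
    return;                     # empty list when the line does not match
}

# First line of test1.log from the post above.
my $line = '2018-07-23 13:19:38 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickey))(SERVER=DEDICATED)(SERVICE_NAME=work_app.com.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=12.123.11.123)(PORT=53102)) * establish * work_app.com.ph * 0';
my ($program, $user, $ip) = parse_cid_first($line, 'work_app');
print join("\t", 'work_app', $program, $user, $ip), "\n";
```

With this ordering PROGRAM becomes $1, USER becomes $2 and the client address becomes $3, because groups are numbered by the position of their opening parenthesis.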

For the second run of x.pl, I was hoping to get the output from using awk+paste.

Code:
[host01]$ ./x.pl test2.log fail_app
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123
fail_app        SERVER  10.11.11.123
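
The SERVER in that output can be traced by trying the script's three patterns one at a time. The sketch below copies the three regexes from x.pl into a helper (the name which_branch is an assumption made for illustration) and reports which branch matched and what it captured.

```perl
use strict;
use warnings;

# Returns (branch_number, $1, $2) for the first of the script's patterns that matches.
sub which_branch {
    my ($line, $svc) = @_;
    return (1, $1, $2) if $line =~ /\(SERVICE_NAME=$svc\).*\(HOST=([\d.\w]+)\)\(USER=(\w+)\)/;
    return (2, $1, $2) if $line =~ /\(USER=(\w+)\).*\(SERVICE_NAME=$svc.*\).*\(HOST=([\d.\w]+)\)*/;
    return (3, $1, $2) if $line =~ /\(CONNECT_DATA=\((\w+).*\(SERVICE_NAME=$svc.*\).*\(HOST=([\d.\w]+)\)*/;
    return;
}

# First line of test2.log from the post above.
my $test2 = '2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62625)) * establish * fail_app.com.ph * 0';
my ($branch, $first, $second) = which_branch($test2, 'fail_app');
print "branch $branch: \$1=$first \$2=$second\n";
```

For this line, branch 1 fails because fail_app is followed by .com.ph rather than a closing parenthesis, and branch 2 fails because \w+ cannot match the ! in !sysadmin01. Branch 3 matches, and its first group (\w+) sits right after (CONNECT_DATA=(, which is the word SERVER. The greedy .* then pushes the HOST group to the last HOST= on the line, the client address.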

I believe the answer to my problem lies in figuring out how the Perl regex dissects the string into several fields. I can understand that this line does the search/match for the search string, but how does it break the string down into several fields?

Code:
$line =~ /\(SERVICE_NAME=$service_name\).*\(HOST=([\d.\w]+)\)\(USER=(\w+)\)/
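
In that pattern the escaped \( and \) match literal parentheses in the log line, while each bare ( ... ) is a capture group whose matched text Perl stores in $1, $2 and so on, numbered left to right. A minimal sketch, using a made-up line shaped so that this pattern matches (the sample line and the sub name captures are assumptions):

```perl
use strict;
use warnings;

# Apply the first pattern from the script and return whatever it captured.
sub captures {
    my ($line, $service_name) = @_;
    # \( and \) match literal parentheses; bare ( ) are capture groups.
    if ($line =~ /\(SERVICE_NAME=$service_name\).*\(HOST=([\d.\w]+)\)\(USER=(\w+)\)/) {
        return ($1, $2, $3);    # $3 is undef: the pattern has only two groups
    }
    return;
}

# Made-up line (an assumption, not taken from the real logs).
my $line = '(SERVICE_NAME=demo)(CID=(HOST=web01)(USER=mickey))';
my ($host, $user, $third) = captures($line, 'demo');
print "host=$host user=$user third=", defined $third ? $third : 'undef', "\n";
```

Two things follow from this. The script prints $3, but none of its patterns has a third group, so that column is always empty. Also, in this first pattern $1 is the HOST and $2 is the USER, whereas the other two patterns capture in a different order, so the printed columns depend on which branch matched.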

The connection string also changes based on the program, so sometimes I need information before SERVICE_NAME, sometimes after it, and sometimes both.

A regex tutorial would be much appreciated :-)
Please advise. Thanks.
# 2  
Old 05-14-2019
My Perl is non-existent, so I can't help there. But why not a simple awk solution, like
Code:
awk '
match ($4, "SERVICE_NAME=" SRV) {if (match ($4, /PROGRAM=[^)]*/)) P  = substr ($4, RSTART, RLENGTH)
                                 if (match ($4, /USER=[^)]*/))    U  = substr ($4, RSTART, RLENGTH)
                                 if (match ($4, /HOST=[^)]*/))    H  = substr ($4, RSTART, RLENGTH)
                                 if (match ($6, /HOST=[^)]*/))    IP = substr ($6, RSTART, RLENGTH)
                                 print P, H, U, IP
                                }
' SRV="fail_app" OFS="\t" file2
PROGRAM=C:\Windows\system32\exec01.exe    HOST=MNLAPP01    USER=!sysadmin01    HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec02.exe    HOST=MNLAPP01    USER=!sysadmin01    HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec03.exe    HOST=MNLAPP01    USER=!sysadmin01    HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec01.exe    HOST=MNLAPP01    USER=!sysadmin01    HOST=10.11.11.123

EDIT: or


Code:
awk '
function chop(FLD, STR)          {if (match ($FLD, STR "=[^)]*")) return substr ($FLD, RSTART, RLENGTH)
                                 }
match ($4, "SERVICE_NAME=" SRV)  {print chop(4, "PROGRAM"), chop(4, "USER"), chop(4,"HOST"), chop(6, "HOST")
                                 }
' SRV="fail_app" OFS="\t" file2


Last edited by RudiC; 05-14-2019 at 05:31 AM..
# 3  
Old 05-29-2019
Hi RudiC

I tried both of your suggestions and they both work fine with test2.log but not with test1.log. Is there any way to get it to work for both, or do I need to use different awk code for each?

Code:
$ head -100 test*log
==> test1.log <==
2018-07-23 13:19:38 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickey))(SERVER=DEDICATED)(SERVICE_NAME=work_app.com.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=12.123.11.123)(PORT=53102)) * establish * work_app.com.ph * 0
2018-07-23 09:12:12 * (CONNECT_DATA=(CID=(PROGRAM=SQL Developer)(HOST=__jdbc__)(USER=minnie))(SERVICE_NAME=work_app.com.ph)(SERVER=dedicated)(INSTANCE_NAME=testp11)) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.214.14.29)(PORT=53548)) * establish * work_app.com.ph * 0

==> test2.log <==
2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62625)) * establish * fail_app.com.ph * 0
2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec02.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62627)) * establish * fail_app.com.ph * 0
2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec03.exe)(HOST=MNLAPP01)(USER=!sysadmin01))(INSTANCE_NAME=xxxt23)) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62626)) * establish * fail_app.com.ph * 0
2019-05-12 04:17:11 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62629)) * establish * fail_app.com.ph * 0

These connect strings are from the Oracle DB listener logs, which contain several versions of these connection strings. So far, these are the only two formats that I've seen; hopefully there is not another one.

What I am currently doing is grep and redirect all of them to a file, then break that file down further into two files based on (CONNECT_DATA=(CID= and (CONNECT_DATA=(SERVER=DEDICATED), then run those four (4) awk commands and paste for each set, and then combine both results. If I find another version of how CONNECT_DATA looks, I suppose I will create another case for it. Not sure if there is any other way around it.

It would have been nice if Oracle themselves had provided their own parser.
# 4  
Old 05-29-2019
OK, let's use the star (*) as the field separator, and don't forget to adapt the SRV variable:
Code:
awk '
function chop(FLD, STR)          {if (match ($FLD, STR "=[^)]*")) return substr ($FLD, RSTART, RLENGTH)
                                 }
match ($0, "SERVICE_NAME=" SRV)  {print chop(2, "PROGRAM"), chop(2, "USER"), chop(2,"HOST"), chop(3, "HOST")
                                 }
' SRV="work_app" FS="\*" OFS="\t" file1
PROGRAM=JDBC Thin Client    USER=mickey    HOST=__jdbc__    HOST=12.123.11.123
PROGRAM=SQL Developer    USER=minnie    HOST=__jdbc__    HOST=10.214.14.29
PROGRAM=JDBC Thin Client    USER=mickey    HOST=__jdbc__    HOST=12.123.11.123
PROGRAM=SQL Developer    USER=minnie    HOST=__jdbc__    HOST=10.214.14.29
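
The reason the star separator helps can be seen by comparing the two ways of splitting a test1.log line. With whitespace splitting, the spaces inside PROGRAM=JDBC Thin Client cut the CONNECT_DATA section into pieces, so the fourth field no longer contains SERVICE_NAME. A rough sketch in Perl (the sub names field4_whitespace and connect_data are assumptions made for illustration):

```perl
use strict;
use warnings;

# Return what awk would call $4 when fields are separated by whitespace.
sub field4_whitespace {
    my ($line) = @_;
    my @f = split ' ', $line;
    return $f[3];
}

# Return the CONNECT_DATA section when the line is split on '*' instead.
sub connect_data {
    my ($line) = @_;
    my @f = split /\s*\*\s*/, $line;
    return $f[1];
}

# First line of test1.log from the thread.
my $line = '2018-07-23 13:19:38 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickey))(SERVER=DEDICATED)(SERVICE_NAME=work_app.com.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=12.123.11.123)(PORT=53102)) * establish * work_app.com.ph * 0';
print "whitespace: ", field4_whitespace($line), "\n";
print "star:       ", connect_data($line), "\n";
```

The whitespace version stops at (CONNECT_DATA=(CID=(PROGRAM=JDBC, while the star version returns the whole CONNECT_DATA section regardless of spaces in the program name.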

Code:
awk '
function chop(FLD, STR)          {if (match ($FLD, STR "=[^)]*")) return substr ($FLD, RSTART, RLENGTH)
                                 }
match ($0, "SERVICE_NAME=" SRV)  {print chop(2, "PROGRAM"), chop(2, "USER"), chop(2,"HOST"), chop(3, "HOST")
                                 }
' SRV="fail_app" FS="\*" OFS="\t" file2
PROGRAM=C:\Windows\system32\exec01.exe    USER=!sysadmin01    HOST=MNLAPP01    HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec02.exe    USER=!sysadmin01    HOST=MNLAPP01    HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec03.exe    USER=!sysadmin01    HOST=MNLAPP01    HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec01.exe    USER=!sysadmin01    HOST=MNLAPP01    HOST=10.11.11.123
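
For anyone who wants to stay in Perl, the same idea (split on the star, then pick each KEY=value out by name instead of by position) might look like the sketch below. The sub names chop_key and extract are assumptions that mirror the awk chop() function above; this is not the original script's method.

```perl
use strict;
use warnings;

# Return "KEY=value" for the first KEY found in $text, or '' if it is absent.
# Like the awk chop(): match KEY= followed by everything up to the next ')'.
sub chop_key {
    my ($text, $key) = @_;
    return '' unless defined $text;
    return $text =~ /(\Q$key\E=[^)]*)/ ? $1 : '';
}

# Split a listener log line on '*' and pull out PROGRAM, USER, HOST and client HOST.
sub extract {
    my ($line) = @_;
    my @f = split /\*/, $line;    # $f[1] is CONNECT_DATA, $f[2] is ADDRESS
    return join "\t",
        chop_key($f[1], 'PROGRAM'), chop_key($f[1], 'USER'),
        chop_key($f[1], 'HOST'),    chop_key($f[2], 'HOST');
}

# First line of test2.log from the thread.
my $line = '2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62625)) * establish * fail_app.com.ph * 0';
print extract($line), "\n" if $line =~ /SERVICE_NAME=fail_app/;
```

Because each key is looked up by name, the order of CID, SERVER and SERVICE_NAME inside CONNECT_DATA does not matter, which is why one piece of code can handle both log layouts.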
