Using forward slash in search pattern in perl script

06-14-2016


I have an existing pattern in my Perl script:

Code:

my $pattern = "^Line.*?:|^Errors*: [^0]|^SEVERE:.*?:|^Null pointer exception occurred";

and I want to add the keywords below to my search pattern:

Code:

 "I/O exception" and "FileNotFoundException"

The problem arises when I extend my pattern like this:

Code:

my $pattern = "^Line.*?:|^Errors*: [^0]|^SEVERE:.*?:|^Null pointer exception occurred|I/O exception|FileNotFoundException";

I am not sure whether the forward slash in "I/O exception" will still be valid, since the forward slash is a special character.

Similarly, I have existing code like this:

Code:

@fails = grep /fail$/, ( grep !/^\./, readdir LOGDIR );

Here I also want to include "I/O exception" and "FileNotFoundException", like this:

Code:

@fails = grep /I/O exception\|FileNotFoundException\|fail$/, ( grep !/^\./, readdir LOGDIR );

Is this the right way?

I really do not understand regexes and am struggling with this, so please help.

ambarginni


06-14-2016


Quote:

Originally Posted by ambarginni

and I wanted to include below keywords in my search pattern

Code:

 "I/O exception" and "FileNotFoundException"

It is relatively easy: the forward slash has a special meaning. Whenever you want to use a character with a special meaning literally, you need to "escape" it. Escaping is done by putting a backslash in front of it:

Code:

my $pattern = "^Line.*?:|^Errors*: [^0]|[...]|I\/O exception";

Notice that "|" separates the different patterns to search for and acts like a logical OR. The expression

Code:

"ab|cd|ef|gh"

searches for any occurrence of "ab" or "cd" or "ef" or "gh". Therefore, to add a pattern, just append a "|" and then the pattern you want to search for.
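A minimal Perl sketch of the alternation described above. The sample strings in @samples are hypothetical and only serve to show which inputs match:

```perl
use strict;
use warnings;

# Hypothetical sample strings, not taken from the thread.
my @samples = ("xxabxx", "cdef", "nothing here", "gh");

# "|" means OR: a string matches if it contains any one of the alternatives.
my $pattern = "ab|cd|ef|gh";
my @hits = grep { /$pattern/ } @samples;

print "$_\n" for @hits;
# prints:
# xxabxx
# cdef
# gh
```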

I hope this helps.

bakunin


06-15-2016


Thank you Bakunin.

I have appended the keywords as you suggested.

I would also like to understand the code better. Can you please help me? In the code below, what does this part mean?

Quote:

grep !/^\./

Code:

@fails = grep /FileNotFoundException\|I\/O exception\|fail$/, ( grep !/^\./, readdir LOGDIR );

ambarginni


06-15-2016


Quote:

Originally Posted by ambarginni

Further I wanted to understand the code better.. can you please help me..
in the below code what does that mean....

Code:

@fails = grep /FileNotFoundException\|I\/O exception\|fail$/, ( grep !/^\./, readdir LOGDIR );

To be honest, I can only guess, because I habitually avoid Perl like the plague. It seems to me that !/^\./ is the regular expression, where "!" means: invert the search, i.e. give every line NOT matched by the following expression.

The regular expression that follows is /^\./: /.../ is only the (traditional) delimiter (like the double quotes for strings: ".."), "^" at the beginning means "beginning of line" and "\." is an escaped (we had that already) literal full stop.

So the whole expression means "every line NOT starting with a full stop as first character".
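As a small illustration of that reading, here is a sketch in Perl. The list @names is a made-up stand-in for a directory listing, not data from the thread:

```perl
use strict;
use warnings;

# Hypothetical directory entries, as readdir might return them.
my @names = (".", "..", ".hidden", "build.log", "test.fail");

# Keep only the entries that do NOT start with a literal dot.
my @visible = grep { !/^\./ } @names;

print "@visible\n";   # prints: build.log test.fail
```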

To understand regexes better, here is a little introduction to the concept:

As I already said, regular expressions are delimited by "/", so we will write them like that here, regardless of whether the tool we test them with needs this delimiter. The simplest form of regular expression is a search for a fixed string:

Code:

/aBcde/

searches for the string "aBcde". Note that it searches for "aBcde" BUT NOT for "abcde"! Every character you put into the search string represents itself and itself only!

Now, this is of rather limited use, and to make regexes more flexible there are so-called metacharacters. Metacharacters do not match themselves but modify the way other characters are matched. Suppose we would like to search for "aBcde" or "abcde" and do not want to care about the capitalisation of the "b". This can be done with a "range" (more formally called a character class or bracket expression):

Code:

/a[Bb]cde/

The [Bb] means: either "B" or "b" - but not both! The input string aBbcde would NOT be matched! You can specify ranges of characters using the dash:

Code:

/a[a-z]cde/         # any lowercase letter
/a[A-Za-z]cde/      # any uppercase or lowercase letter
/a[a-z0-9]cde/      # any lowercase letter or digit
/x[0-9][0-9][0-9]x/ # a three-digit number surrounded by "x"

It is also possible to "invert" these ranges by using a caret "^" as the first character inside the range:

Code:

/a[^0-9]cde/      # an "a" followed by any single character except a digit, followed by "cde"

Note that inside these ranges most metacharacters LOSE their special meaning. The exceptions are the characters that structure the range itself: the leading "^", the "-" between two characters and the closing "]" (in Perl, the backslash also stays special inside them).
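The ranges above can be tried out in Perl with a short sketch like the following. The test words are hypothetical and chosen only to show which ones match:

```perl
use strict;
use warnings;

# Hypothetical test words.
my @words = ("aBcde", "abcde", "aBbcde", "a7cde");

# [Bb] matches exactly one character, either "B" or "b".
my @either_b = grep { /a[Bb]cde/ } @words;

# [^0-9] matches exactly one character that is not a digit.
my @no_digit = grep { /a[^0-9]cde/ } @words;

# Inside the brackets the dot is literal, so only "a.b" matches.
my @literal_dot = grep { /a[.]b/ } ("a.b", "axb");

print "@either_b\n";     # prints: aBcde abcde
print "@no_digit\n";     # prints: aBcde abcde
print "@literal_dot\n";  # prints: a.b
```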

The next metacharacter is similar to these ranges but even more general: it is the full stop ".". It matches any single character:

Code:

/ab.de/    # any 5-character string starting with "ab" and ending with "de"

All metacharacters revert to their literal meaning only if they are preceded by a backslash - even the backslash itself:

Code:

/\..\./    # a dot followed by any single character followed by a dot

Compare that with the following, where the backslash itself is escaped:

Code:

/\\..\./   # a backslash followed by two characters followed by a dot

Because the first backslash escapes the second one the second one counts as a simple backslash without the escaping ability. Therefore the previously escaped dot becomes a metacharacter again.
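Both expressions can be checked in Perl with a sketch like this one. The input strings are hypothetical; single quotes are used so that Perl's string parsing does not interfere with the backslashes:

```perl
use strict;
use warnings;

# Hypothetical inputs; '\\ab.' is the four characters  \ a b .
my @inputs = ('.a.', 'xax', '\\ab.', '\\abc');

# Escaped dot, any character, escaped dot.
my @dotted = grep { /\..\./ } @inputs;

# Escaped backslash, any two characters, escaped dot.
my @slashed = grep { /\\..\./ } @inputs;

print "@dotted\n";    # prints: .a.
print "@slashed\n";   # prints: \ab.
```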

Besides metacharacters that match certain characters (or groups of them), there are also metacharacters that modify other expressions. The first one you need is the asterisk "*". It means zero or more of the preceding expression.

Code:

/ab*c/  # a followed by any number of b's (even none) followed by c

Notice that the last example matches "abc" and "abbbc" but also "ac"! If you want to match at least one "b", so that "ac" is not matched at all, you need to double the "b":

Code:

/abb*c/  # matches "abc" and "abbc", "abbbc", etc. but not "ac"

There is also another construct to count the number of expressions to match: "\{m,n\}", where "m" and "n" are numbers. This works similarly to the asterisk, but limits the number of allowed occurrences to between m and n.

Code:

/ab\{1,3\}c/  # matches "abc" and "abbc" and "abbbc", but not "ac" or "abbbbc", etc.

You may have noticed that I talked about "expressions" rather than "characters" in the last part. In its simplest form an "expression" is a single character or metacharacter:

Code:

/@.\{2,3\}@/   # any two or three characters, surrounded by ats

But a range followed by a count is an "expression" too:

Code:

/![0-5]\{2,3\}!/   # any two or three digits 0-5, surrounded by exclamation marks

And characters or other expressions can be further grouped with (escaped) parentheses:

Code:

/\([0-9]\{3\}\.\)\{2\}/   # two groups, each consisting of 3 digits followed by a literal dot

This would match "123.456." or "913.756.", etc.
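The counting and grouping examples above are written in the backslashed style "\{m,n\}" and "\(...\)". In Perl's own regex syntax the same constructs are written without the backslashes (there, a backslashed brace or parenthesis is a literal character). A hedged sketch with hypothetical test strings:

```perl
use strict;
use warnings;

# Hypothetical test strings.
my @strings = ("ac", "abc", "abbbc", "abbbbc");

my @star  = grep { /ab*c/ }     @strings;  # zero or more "b"
my @range = grep { /ab{1,3}c/ } @strings;  # one to three "b"

# Two groups, each of 3 digits followed by a literal dot.
my @grouped = grep { /([0-9]{3}\.){2}/ } ("123.456.", "123.45.");

print "@star\n";     # prints: ac abc abbbc abbbbc
print "@range\n";    # prints: abc abbbc
print "@grouped\n";  # prints: 123.456.
```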

Greediness

The extent of a possible match is a cause of endless misunderstandings. Consider the following input:

Code:

abcdXfdgdkjXsfdsdX2387X

Now suppose we have the following regular expression. Which part of the above string would it match?

Code:

/a.*X/   # an "a" followed by any number of any characters followed by "X"

Possible answers (the matched part is shown in square brackets):
[abcdX]fdgdkjXsfdsdX2387X
[abcdXfdgdkjX]sfdsdX2387X
[abcdXfdgdkjXsfdsdX]2387X
[abcdXfdgdkjXsfdsdX2387X]

The right answer, in all Unix-like regex engines, is: the last one. Because "any character" includes the X, the longest possible match is used. This is called the "greediness" of regular expressions: a match that takes the longest possible string is called "greedy", one that takes the shortest possible string is called "non-greedy". Unix regexes are usually greedy.

That raises the question of how we would match the non-greedy variant. This is usually done with inverted character classes. The first answer above can be built this way:

Code:

/a[^X]*X/   # an "a" followed by any number of non-X followed by "X"
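A short Perl sketch of the difference, using the input string from above. Perl additionally offers a non-greedy quantifier "*?" (the "^Line.*?:" pattern at the top of this thread already uses it), shown here next to the inverted-class approach:

```perl
use strict;
use warnings;

my $input = "abcdXfdgdkjXsfdsdX2387X";

# Greedy: ".*" grabs as much as possible, up to the last "X".
my ($greedy) = $input =~ /(a.*X)/;

# Inverted class: stops at the first "X" because [^X] cannot cross it.
my ($negated) = $input =~ /(a[^X]*X)/;

# Perl's non-greedy quantifier: ".*?" grabs as little as possible.
my ($lazy) = $input =~ /(a.*?X)/;

print "$greedy\n";    # prints: abcdXfdgdkjXsfdsdX2387X
print "$negated\n";   # prints: abcdX
print "$lazy\n";      # prints: abcdX
```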

Now this is only a short introduction. If you want to know more about regexes you might want to read Dale Dougherty's fantastic book "sed & awk", published by O'Reilly.

Further pointers: understand the (slight) differences between "extended regular expressions" (EREs) and "basic regular expressions" (BREs); by the way, I have introduced BREs here. Note also that there are UNIX EREs and UNIX BREs as well as GNU EREs and GNU BREs (used in the Linux counterparts of Unix utilities like sed and awk). There are also Perl REs, which use yet another, slightly different regexp engine. They are all quite similar, though, and the basic workings are always the same, so if you know one you know about 90% of all the others too.

I hope this helps.

bakunin


06-15-2016


The slash is not a special character in an RE, but it is the delimiter for an RE in Perl.
I am not a Perl expert, but it looks like you need to escape a literal / within the delimiting slashes:

Code:

grep /I\/O exception|FileNotFoundException/, ...;

but not in a variable

Code:

my $pattern = "I/O exception|FileNotFoundException";
grep /$pattern/, ...;

Because grep takes any Perl expression, one can use the m operator, which allows other RE delimiters, for example the #:

Code:

grep m#I/O exception|FileNotFoundException#, ...;
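The three variants can be compared side by side in a small sketch. The log lines in @lines are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical log lines.
my @lines = (
    "I/O exception in reader",
    "FileNotFoundException: x",
    "all good",
);

# 1. Default delimiters: the literal slash must be escaped.
my @escaped = grep { /I\/O exception|FileNotFoundException/ } @lines;

# 2. Alternative delimiter via the m operator: no escaping needed.
my @hashed = grep { m#I/O exception|FileNotFoundException# } @lines;

# 3. Pattern held in a variable: no escaping needed either.
my $pattern = "I/O exception|FileNotFoundException";
my @from_var = grep { /$pattern/ } @lines;

print scalar(@escaped), " ", scalar(@hashed), " ", scalar(@from_var), "\n";
# prints: 2 2 2
```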

Last edited by MadeInGermany; 06-15-2016 at 02:27 PM..

MadeInGermany


06-15-2016


Quote:

Originally Posted by ambarginni

I have appended the keywords as you suggested.

Further I wanted to understand the code better.. can you please help me..
in the below code what does that mean....

Quote:

grep !/^\./

Code:

@fails = grep /FileNotFoundException\|I\/O exception\|fail$/, ( grep !/^\./, readdir LOGDIR );

Actually, grep !/^\./ should not be read the way it was described above. grep in Perl is not like the grep family of utilities in the Unix world. Although it often uses a regular expression as part of the block or expression, that is the only similarity.
The proper way of thinking about grep is this:

Code:

grep BLOCK LIST
grep EXPRESSION,LIST

Notice the comma.

The first will be seen as:

Code:

my @list_result = grep {!/^\./} @given_list;

Inside the block {} you can put a lot of normal code; in this case it is a regular expression whose result is negated. If the block evaluates to true, the content of $_ (which holds the current element of the list @given_list) is appended to @list_result.

The second will be seen as:

Code:

my @list_result = grep !/^\./, @given_list;

or, with parentheses around the arguments:

Code:

my @list_result = grep (!/^\./, @given_list);

The difference between BLOCK and EXPRESSION is that the latter accepts only a single expression instead of a whole block of code. For this purpose, they are the same.
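A hedged sketch of the two forms. The contents of @given_list and the extra length check are made up for illustration; the length check is only there to show that a BLOCK can hold more than one statement:

```perl
use strict;
use warnings;

# Hypothetical list of names.
my @given_list = (".profile", "notes.txt", "..", "a.c", "run.log");

# EXPRESSION form: a single expression, then a comma, then the list.
my @from_expr = grep !/^\./, @given_list;

# BLOCK form: same test, no comma after the block.
my @from_block = grep { !/^\./ } @given_list;

# A BLOCK may contain several statements; the last one decides.
my @long_names = grep {
    my $length = length $_;
    !/^\./ && $length > 4;
} @given_list;

print "@from_expr\n";    # prints: notes.txt a.c run.log
print "@from_block\n";   # prints: notes.txt a.c run.log
print "@long_names\n";   # prints: notes.txt run.log
```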

Going back to this portion:

Code:

@fails = grep /FileNotFoundException\|I\/O exception\|fail$/, ( grep !/^\./, readdir LOGDIR );

First, you need to remove the backslashes in front of the | characters; otherwise each | becomes a literal pipe character instead of the regex alternation that indicates OR.
The / in I\/O needs to be escaped only because you are using the default delimiters m//: when the parser sees the first /, it reads up to I/, thinks the pattern is done, and does not know what to do with the rest.
If the default delimiter is used, the m is optional, but it is not optional if another character is used as the delimiter.
Can you now see what that line of code is? Can you tell if it is a grep BLOCK LIST or a grep EXPRESSION,LIST?
It is a nested set of grep EXPRESSION,LIST.
The first one is grep !/^\./, readdir LOGDIR, which returns a LIST of the filenames in LOGDIR that do not start with a period. That LIST is in turn the argument passed to the second one, grep /FileNotFoundException|I\/O exception|fail$/, (...), where the three dots stand for the first grep.

As a note, the last grep is not reading the content of the files and searching them for the strings FileNotFoundException, I/O exception or fail. It is applying those regexes to the filenames: if a filename matches, that element gets appended to @fails.
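To make this concrete, here is a sketch of the corrected line run against a hypothetical list of names (@names stands in for what readdir LOGDIR might return; no directory is actually read):

```perl
use strict;
use warnings;

# Hypothetical names standing in for the result of "readdir LOGDIR".
my @names = (".", "..", ".old.fail", "job1.fail", "job2.ok",
             "FileNotFoundException.txt");

# Inner grep: drop names starting with a dot.
# Outer grep: keep names matching any alternative (note the unescaped "|").
my @fails = grep /FileNotFoundException|I\/O exception|fail$/,
            ( grep !/^\./, @names );

print "$_\n" for @fails;
# prints:
# job1.fail
# FileNotFoundException.txt
```

On Unix-like systems a filename cannot contain a "/", so the "I/O exception" alternative can never match a name returned by readdir; it would only be useful when matching against the lines inside the files.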

Last edited by Aia; 06-15-2016 at 08:33 PM..

Aia
