Using forward slash in search pattern in perl script


 
# 1  
Old 06-14-2016

I have an existing pattern in a Perl script:
Code:
my $pattern = "^Line.*?:|^Errors*: [^0]|^SEVERE:.*?:|^Null pointer exception occurred";

and I want to add the following keywords to my search pattern:
Code:
 "I/O exception" and "FileNotFoundException"

The problem arises when I extend my pattern like this:
Code:
my $pattern = "^Line.*?:|^Errors*: [^0]|^SEVERE:.*?:|^Null pointer exception occurred|I/O exception|FileNotFoundException";

I am not sure whether the forward slash in "I/O exception" is still valid, since the forward slash is a special character.

Similarly, I have this existing code:

Code:
@fails = grep /fail$/, ( grep !/^\./, readdir LOGDIR );

Here, too, I want to include "I/O exception" and "FileNotFoundException".

Is this the right way to include them?
Code:
@fails = grep /I/O exception\|FileNotFoundException\|fail$/, ( grep !/^\./, readdir LOGDIR );

I really do not understand regexes, so I am struggling with this. Please help.
# 2  
Old 06-14-2016
Quote:
Originally Posted by ambarginni
and I wanted to include below keywords in my search pattern
Code:
 "I/O exception" and "FileNotFoundException"

It is relatively easy: the forward slash has a special meaning. Whenever you want to use a character with a special meaning literally, you need to "escape" it. Escaping is done by putting a backslash in front of it:

Code:
my $pattern = "^Line.*?:|^Errors*: [^0]|[...]|I\/O exception";

Notice that "|" separates the different patterns to search for and acts like a logical OR. The expression

Code:
"ab|cd|ef|gh"

searches for any occurrence of "ab" or "cd" or "ef" or "gh". Therefore to add a pattern just add a "|" at the end and then the pattern you intend to search for.
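A minimal Perl sketch of this alternation, assuming a made-up word list (the list and the variable names are hypothetical and not part of the original script):

```perl
use strict;
use warnings;

# Hypothetical sample words, made up for illustration.
my @words = ("cabbage", "acdc", "efficient", "high", "xyz");

# "|" separates the alternatives: match "ab" OR "cd" OR "ef" OR "gh".
my $pattern = "ab|cd|ef|gh";

# Keep every word that contains at least one of the alternatives.
my @hits = grep { /$pattern/ } @words;

print "$_\n" for @hits;
```

Only "xyz" is dropped here, because it contains none of the four alternatives.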

I hope this helps.

bakunin
# 3  
Old 06-15-2016
Thank you Bakunin.

I have appended the keywords as you suggested.

I would also like to understand the code better. Can you please help me?
In the code below, what does this part mean:
Quote:
grep !/^\./

Code:
@fails = grep /FileNotFoundException\|I\/O exception\|fail$/, ( grep !/^\./, readdir LOGDIR );

# 4  
Old 06-15-2016
Quote:
Originally Posted by ambarginni
Further I wanted to understand the code better.. can you please help me..
in the below code what does that mean....

Code:
@fails = grep /FileNotFoundException\|I\/O exception\|fail$/, ( grep !/^\./, readdir LOGDIR );

To be honest, i can only guess, because i habitually avoid perl like the plague. It seems to me that: !/^\./ is the regular expression, where "!" means: reverse the search, give every line NOT matched by the following expression.

The following regular expression is /^\./: /.../ is only the (traditional) delimiter (like the double quotes for strings: ".."), "^" at the beginning means "beginning of line" and "\." is an escaped (we had that already) literal full stop.

So the whole expression means "every line NOT starting with a full stop as first character".
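This reading can be checked with a small Perl sketch. The entry names below are invented for illustration; the original code gets its list from readdir instead:

```perl
use strict;
use warnings;

# Hypothetical directory entries, made up for illustration.
my @entries = (".", "..", ".hidden", "app.log", "job.fail");

# !/^\./ is true for every entry that does NOT begin with a literal dot.
my @visible = grep { !/^\./ } @entries;

print "@visible\n";   # prints: app.log job.fail
```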

To understand regexes better, here is a little introduction into the concept:

As i already said, regular expressions are delimited by "/", so we will write them like that here, regardless of whether the tool we use to test them needs this delimiter or not. The simplest form of regular expression is a search for a fixed string:

Code:
/aBcde/

searches for the string "aBcde". Note that it searches for "aBcde" BUT NOT for "abcde"! Every character you put into the search string represents itself and itself only!

Now, this is of rather limited use, and to make regexes more flexible there are so-called metacharacters. Metacharacters do not match themselves but modify the way other characters are matched. Suppose we would like to search for "aBcde" or "abcde" and do not want to care about the capitalisation of the "b". This can be done with a character class (often loosely called a "range"):

Code:
/a[Bb]cde/

The [Bb] means: either "B" or "b" - but not both! The input string aBbcde would NOT be matched! You can specify ranges of characters using the dash:

Code:
/a[a-z]cde/         # any single lowercase letter
/a[A-Za-z]cde/      # any single uppercase or lowercase letter
/a[a-z0-9]cde/      # any single lowercase letter or digit
/x[0-9][0-9][0-9]x/ # a three-digit number surrounded by "x"

It is also possible to "invert" these ranges by using a caret "^" as the first character inside the brackets:

Code:
/a[^0-9]cde/      # an "a" followed by any single character except a digit, followed by "cde"

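Note that inside these ranges nearly all metacharacters LOSE their special meaning: a "." or "*" inside the brackets matches only itself. The exceptions are the leading caret, the dash between two characters and the closing bracket.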
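The behaviour of these bracket expressions can be tried out in Perl, where they work the same way. This is only a sketch; the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample strings, made up for illustration.
my @samples = ("aBcde", "abcde", "aBbcde", "a7cde", "x123x", "x12x");

# [Bb] matches exactly one character, either "B" or "b".
my @either = grep { /a[Bb]cde/ } @samples;

# [^0-9] matches exactly one character that is not a digit.
my @non_digit = grep { /a[^0-9]cde/ } @samples;

# Three consecutive digit classes match a three-digit number between two "x".
my @three_digits = grep { /x[0-9][0-9][0-9]x/ } @samples;

print "either:       @either\n";
print "non-digit:    @non_digit\n";
print "three digits: @three_digits\n";
```

Note that "aBbcde" appears in none of the result lists, because each bracket expression stands for exactly one character.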

The next metacharacter is similar to these ranges but even more general: it is the full stop ".". It matches any single character:

Code:
/ab.de/    # any 5-character string starting with "ab" and ending with "de"

Metacharacters revert to their literal meaning only if they are preceded by a backslash - and this applies even to the backslash itself:

Code:
/\..\./    # a dot followed by any single character followed by a dot
/\\..\./   # a backslash followed by two characters followed by a dot

Because the first backslash escapes the second one the second one counts as a simple backslash without the escaping ability. Therefore the previously escaped dot becomes a metacharacter again.
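The two expressions can be compared in a short Perl sketch. The test strings are hypothetical; they are written in single quotes so that Perl keeps the backslash as it is:

```perl
use strict;
use warnings;

# Hypothetical test strings, made up for illustration.
my $dots      = '.a.';     # dot, "a", dot
my $backslash = '\ab.';    # backslash, "a", "b", dot

# /\..\./ wants: literal dot, any character, literal dot.
my $dots_vs_first      = $dots      =~ /\..\./ ? 1 : 0;
my $backslash_vs_first = $backslash =~ /\..\./ ? 1 : 0;

# /\\..\./ wants: literal backslash, any two characters, literal dot.
my $dots_vs_second      = $dots      =~ /\\..\./ ? 1 : 0;
my $backslash_vs_second = $backslash =~ /\\..\./ ? 1 : 0;

print "first regex:  $dots_vs_first $backslash_vs_first\n";
print "second regex: $dots_vs_second $backslash_vs_second\n";
```

Each string is matched only by the expression that was written for it.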

Now there are not only metacharacters that match certain other characters (or groups thereof) but also ones that modify other expressions. The first you need is the asterisk "*". It means zero or more of the expression before it.

Code:
/ab*c/  # a followed by any number of b's (even none) followed by c

Notice that the last example matches "abc" and "abbbc" but also "ac"! If you want to match at least one "b", so that "ac" is not matched at all, you need to double the "b":

Code:
/abb*c/  # matches "abc" and "abbc", "abbbc", etc. but not "ac"

There is also another construct to count the number of expressions to match: "\{m,n\}", where "m" and "n" are numbers. This works similarly to the asterisk, but limits the number of allowed occurrences to between m and n.

Code:
/ab\{1,3\}c/  # matches "abc" and "abbc" and "abbbc", but not "ac" or "abbbbc", etc.

You may have noticed that i talked about "expressions" rather than "characters" in the last part. In the simplest form an "expression" is a single character or metacharacter:

Code:
/@.\{2,3\}@/   # any two or three characters, surrounded by @ signs

But this is also an "expression":

Code:
/![0-5]\{2,3\}!/   # any two or three digits 0-5, surrounded by exclamation marks

And characters or other expressions can be further grouped by (escaped) parentheses:

Code:
/\([0-9]\{3\}\.\)\{2\}/   # two groups, each consisting of 3 digits followed by a literal dot

This would match "123.456." or "913.756.", etc.
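The examples above use the backslashed \{ \} and \( \) of basic regular expressions. In Perl the same quantifiers and groups are written without the backslashes. A small sketch with made-up sample strings, anchored with "^" and "$" so that the whole string has to match:

```perl
use strict;
use warnings;

# Hypothetical sample strings, made up for illustration.
my @samples = ("ac", "abc", "abbbc", "abbbbc");

# Zero or more "b".
my @star = grep { /^ab*c$/ } @samples;

# At least one "b" (the doubled form from the text).
my @one_or_more = grep { /^abb*c$/ } @samples;

# Between one and three "b"; Perl writes {1,3} without backslashes.
my @one_to_three = grep { /^ab{1,3}c$/ } @samples;

# Two groups of three digits plus a literal dot; Perl writes ( ) unescaped.
my $grouped_ok  = "123.456." =~ /^([0-9]{3}\.){2}$/ ? 1 : 0;
my $grouped_bad = "123.45."  =~ /^([0-9]{3}\.){2}$/ ? 1 : 0;

print "star:         @star\n";
print "one or more:  @one_or_more\n";
print "one to three: @one_to_three\n";
print "grouped:      $grouped_ok $grouped_bad\n";
```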

Greediness

The extent of a possible match is a cause of endless misunderstandings. Consider the following input:

Code:
abcdXfdgdkjXsfdsdX2387X

Now suppose we have the following regular expression. Which part of the above string would it match?

Code:
/a.*X/   # an "a" followed by any number of any characters followed by "X"

Possible answers (the matched part is shown in square brackets):
[abcdX]fdgdkjXsfdsdX2387X
[abcdXfdgdkjX]sfdsdX2387X
[abcdXfdgdkjXsfdsdX]2387X
[abcdXfdgdkjXsfdsdX2387X]

The right answer, in all Unix-like regex engines, is the last one: because "any character" includes the X, the longest possible match is used. This is called the "greediness" of regular expressions. A match for the longest possible string is called "greedy"; a match for the shortest possible string is called "non-greedy". Unix regexes are usually greedy.

That raises the question of how we would match the non-greedy variant. This is usually done with inverted character classes. The first answer above can be built this way:

Code:
/a[^X]*X/   # an "a" followed by any number of non-X followed by "X"
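Both variants can be compared in Perl using the input string from above. The sketch also shows Perl's own non-greedy quantifier ".*?" (the form used in the ".*?:" parts of the pattern in the first post); it is not available in basic regular expressions:

```perl
use strict;
use warnings;

my $input = "abcdXfdgdkjXsfdsdX2387X";

# Greedy: ".*" grabs as much as possible, so the match runs to the last X.
my ($greedy) = $input =~ /(a.*X)/;

# Inverted character class: "[^X]*" cannot run past the first X.
my ($negated) = $input =~ /(a[^X]*X)/;

# Perl-specific non-greedy quantifier: ".*?" grabs as little as possible.
my ($lazy) = $input =~ /(a.*?X)/;

print "greedy:  $greedy\n";    # prints: greedy:  abcdXfdgdkjXsfdsdX2387X
print "negated: $negated\n";   # prints: negated: abcdX
print "lazy:    $lazy\n";      # prints: lazy:    abcdX
```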

Now this is only a short introduction. If you want to know more about regexes you might want to read Dale Dougherty's fantastic book "sed & awk", published by O'Reilly.

Further pointers: understand the (slight) differences between "extended regular expressions" (EREs) and "basic regular expressions" (BREs) - btw., i have introduced BREs here - and note that there are UNIX-EREs and UNIX-BREs and also GNU-EREs and GNU-BREs (used in the Linux counterparts of Unix utilities like sed and awk). There are also perl-REs, which use yet another, slightly different regexp engine. They are all quite similar, though, and the basic workings are always the same, so if you know one you know about 90% of all the others too.

I hope this helps.

bakunin
# 5  
Old 06-15-2016
The slash is not a special character in an RE, but it is a delimiter for an RE in perl.
I am not a perl expert, but it looks like you need to escape a literal / within the delimiting slashes
Code:
grep /I\/O exception|FileNotFoundException/, ...;

but not in a variable
Code:
my $pattern = "I/O exception|FileNotFoundException";
grep /$pattern/, ...;

Because grep takes any perl expression, one can use the m operator, which allows other RE delimiters, for example the #
Code:
grep m#I/O exception|FileNotFoundException#, ...;
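The three spellings can be checked side by side. This is a sketch with invented sample lines; it only shows that all three forms select the same elements:

```perl
use strict;
use warnings;

# Hypothetical sample lines, made up for illustration.
my @lines = (
    "I/O exception in reader",
    "java.io.FileNotFoundException: missing.txt",
    "everything fine",
);

# 1. Default delimiters: a literal slash inside /.../ must be escaped.
my @escaped = grep { /I\/O exception|FileNotFoundException/ } @lines;

# 2. Pattern in a variable: the slash in the string needs no escaping.
my $pattern = "I/O exception|FileNotFoundException";
my @from_variable = grep { /$pattern/ } @lines;

# 3. m with another delimiter: the slash is an ordinary character.
my @other_delimiter = grep { m#I/O exception|FileNotFoundException# } @lines;

print scalar(@escaped), " ", scalar(@from_variable), " ", scalar(@other_delimiter), "\n";
```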


Last edited by MadeInGermany; 06-15-2016 at 02:27 PM..
# 6  
Old 06-15-2016
Quote:
Originally Posted by ambarginni
I have appended the keywords as you suggested.

Further I wanted to understand the code better.. can you please help me..
in the below code what does that mean....
Quote:
grep !/^\./
Code:
@fails = grep /FileNotFoundException\|I\/O exception\|fail$/, ( grep !/^\./, readdir LOGDIR );

Actually, grep !/^\./ should not be read as a single unit. grep in Perl is not the same as the grep family of utilities in the Unix world. It often uses a regular expression as part of the block or expression, but that is the only similarity.
The proper way of thinking of grep is this:
Code:
grep BLOCK LIST
grep EXPRESSION,LIST

Notice the comma.

The first will be seen as:
Code:
my @list_result = grep {!/^\./} @given_list;

Inside that block {} you can put a lot of normal code; in this case it is a regular expression match, and the result of that match is negated. If the block evaluates to true, the content of $_ (which holds the current element of the list @given_list) is appended to @list_result.

The second will be seen as:
Code:
my @list_result = grep !/^\./, @given_list;

or
Code:
my @list_result = grep (!/^\./, @given_list);

The difference between BLOCK and EXPRESSION is that the latter accepts only one expression instead of a whole block of code. For this purpose, they are the same.
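A short sketch comparing the two forms, assuming a made-up @given_list. The last line is a hypothetical example of a block containing more than one statement:

```perl
use strict;
use warnings;

# Hypothetical list, made up for illustration.
my @given_list = (".profile", "notes.txt", ".cache", "run.log");

# grep BLOCK LIST: no comma after the block.
my @from_block = grep { !/^\./ } @given_list;

# grep EXPRESSION, LIST: a comma separates the expression from the list.
my @from_expression = grep !/^\./, @given_list;

# A block may hold several statements; the value of the last one decides.
my @long_names = grep { my $len = length $_; $len > 8 && !/^\./ } @given_list;

print "@from_block\n";        # prints: notes.txt run.log
print "@from_expression\n";   # prints: notes.txt run.log
print "@long_names\n";        # prints: notes.txt
```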

Going back to this portion:
Code:
@fails = grep /FileNotFoundException\|I\/O exception\|fail$/, ( grep !/^\./, readdir LOGDIR );

First, you need to remove the backslashes in front of the | characters (\|); otherwise you make each | a literal pipe character instead of the regex alternation operator that means OR.
The slash in I\/O needs to be escaped only because you are using the default delimiters of m//: when the code is parsed, the regex starts at the first /, ends at the next one (right after the I), and the parser does not know what to do with the rest.
If the default delimiter is used, the m is optional, but it is not optional if another character is used as the delimiter.
Can you now see what that line of code is? Can you tell if it is a grep BLOCK LIST or a grep EXPRESSION,LIST?
It is a nested set of grep EXPRESSION,LIST.
The first one to run is grep !/^\./, readdir LOGDIR, which returns a LIST of the filenames in LOGDIR that do not start with a period. That list is in turn the LIST argument (represented by the three dots) passed to the second grep: grep /FileNotFoundException|I\/O exception|fail$/, (...)

As a note, the last grep is not reading the content of the files and searching it for the strings FileNotFoundException, I/O exception or fail$. It applies the regex to the filenames: if a filename matches, that element gets appended to @fails.
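The whole line can be tried without a real directory. In this sketch a hypothetical list of names stands in for readdir LOGDIR, and the block form of grep is used for readability:

```perl
use strict;
use warnings;

# Hypothetical filenames standing in for "readdir LOGDIR".
my @names = (
    ".", "..", ".old.fail", "backup.fail",
    "FileNotFoundException.txt", "report.log", "failover.cfg",
);

# Inner grep drops dot-names; outer grep keeps names matching any alternative.
my @fails = grep { /FileNotFoundException|I\/O exception|fail$/ }
            grep { !/^\./ } @names;

# With \| each pipe is a literal character, so no filename here matches.
my @broken = grep { /FileNotFoundException\|I\/O exception\|fail$/ }
             grep { !/^\./ } @names;

print "fails:  @fails\n";                 # prints: fails:  backup.fail FileNotFoundException.txt
print "broken: ", scalar(@broken), "\n";  # prints: broken: 0
```

The name ".old.fail" ends in "fail" but is still excluded, because the inner grep has already removed every name that starts with a period.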

Last edited by Aia; 06-15-2016 at 08:33 PM..