Using forward slash in search pattern in perl script
Post 302975545 by bakunin on Wednesday 15th of June 2016 05:59:50 AM
Quote:
Originally Posted by ambarginni
Further I wanted to understand the code better.. can you please help me..
in the below code what does that mean....

Code:
@fails = grep /FileNotFoundException\|I\/O exception\|fail$/, ( grep !/^\./, readdir LOGDIR );

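To be honest, i can only guess, because i habitually avoid perl like the plague. It seems to me that in !/^\./ the regular expression proper is /^\./ and the "!" in front of it means: reverse the search, give every line NOT matched by the following expression. Since the inner grep reads from readdir, the "lines" tested here are the names of the entries in the directory LOGDIR.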

The following regular expression is /^\./: /.../ is only the (traditional) delimiter (like the double quotes for strings: ".."), "^" at the beginning means "beginning of line" and "\." is an escaped (we had that already) literal full stop.

So the whole expression means "every line (here: every file name returned by readdir) NOT starting with a full stop as first character".
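
For readers who avoid perl too, the same two-step filter can be sketched in Python. This is only an illustration under assumptions: the directory entries are made up, and it is assumed that the outer pattern was meant as an alternation. Python's re module uses a perl-like dialect in which alternation is written as a plain "|" (there, "\|" is a literal pipe character), so the outer pattern is rewritten accordingly.

```python
import re

# Made-up directory entries standing in for what "readdir LOGDIR" returns.
entries = [".", "..", ".profile", "app.log", "nightly.fail",
           "FileNotFoundException.txt"]

# Inner grep: keep every entry NOT starting with a full stop.
visible = [name for name in entries if not re.search(r"^\.", name)]

# Outer grep: keep entries matching one of the alternatives.
# Assumption: alternation was intended, so a plain "|" is used here.
pattern = re.compile(r"FileNotFoundException|I/O exception|fail$")
fails = [name for name in visible if pattern.search(name)]

print(visible)  # ['app.log', 'nightly.fail', 'FileNotFoundException.txt']
print(fails)    # ['nightly.fail', 'FileNotFoundException.txt']
```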

To understand regexes better, here is a little introduction into the concept:

As i already said, regular expressions are delimited by "/", so we will write them like that here, regardless of whether the tool we use to test them needs this delimiter or not. The simplest form of regular expression is to search for fixed strings:

Code:
/aBcde/

searches for the string "aBcde". Note that it searches for "aBcde" BUT NOT for "abcde"! Every character you put into the search string represents itself and itself only!

Now, this is of rather limited use, and to make regexes more flexible there are so-called metacharacters. Metacharacters do not match themselves but modify the way other characters are matched. Suppose we would like to search for "aBcde" or "abcde" and do not want to care about capitalisation of the "b". This can be done by a "range" (also known as a character class or bracket expression):

Code:
/a[Bb]cde/

The [Bb] means: either "B" or "b" - but not both! The input string aBbcde would NOT be matched! You can signify ranges of characters using the dash:

Code:
/a[a-z]cde/         # any single lowercase letter between "a" and "cde"
/a[A-Za-z]cde/      # any single uppercase or lowercase letter
/a[a-z0-9]cde/      # any single lowercase letter or digit
/x[0-9][0-9][0-9]x/ # a three-digit number surrounded by "x"

It is also possible to "invert" these ranges, by using a caret "^" as the first character inside the range:

Code:
/a[^0-9]cde/      # an "a" followed by any single character except a digit, followed by "cde"

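Note that inside these ranges all metacharacters LOSE their special meaning. Only the caret as first character, the dash between two characters and the closing bracket are special there.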
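
To try the ranges out, here is a small sketch in Python. It assumes Python's re module, whose syntax for bracket ranges is the same as shown above; the helper name matches() is made up for this illustration.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches(r"a[Bb]cde", "aBcde"))    # True
print(matches(r"a[Bb]cde", "abcde"))    # True
print(matches(r"a[Bb]cde", "aBbcde"))   # False: only ONE of B or b is allowed
print(matches(r"a[a-z]cde", "aXcde"))   # False: X is not a lowercase letter
print(matches(r"x[0-9][0-9][0-9]x", "x123x"))  # True
print(matches(r"a[^0-9]cde", "a-cde"))  # True: "-" is not a digit
print(matches(r"a[^0-9]cde", "a7cde"))  # False
# Inside a range the full stop is literal, not "any character":
print(matches(r"a[.]c", "a.c"))         # True
print(matches(r"a[.]c", "abc"))         # False
```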

The next metacharacter is similar to those ranges but even more general: it is the full stop ".". It matches any single character:

Code:
/ab.de/    # any 5-character string starting with "ab" and ending with "de"

Metacharacters revert to their literal meaning only if they are preceded by a backslash - and that goes even for the backslash itself:

Code:
/\..\./    # a dot followed by any single character followed by a dot
/\\..\./   # a backslash followed by two characters followed by a dot

Because the first backslash escapes the second one, the second one counts as a simple backslash without the escaping ability. Therefore the dot that was escaped in the first expression becomes a metacharacter again in the second.
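
The two expressions can be checked with a short Python sketch. It assumes Python's re module, which treats the backslash the same way for these two patterns; the test strings are made up. Raw strings (r"...") are used so that Python itself does not interpret the backslashes in the patterns.

```python
import re

dot_any_dot = re.compile(r"\..\.")        # literal dot, any character, literal dot
backslash_two_dot = re.compile(r"\\..\.")  # literal backslash, two characters, literal dot

print(bool(dot_any_dot.search(".x.")))          # True
print(bool(dot_any_dot.search("axb")))          # False: no literal dots
print(bool(backslash_two_dot.search("\\ab.")))  # True: the text is \ab.
print(bool(backslash_two_dot.search(".ab.")))   # False: no backslash in the text
```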

Now there are not only metacharacters which match certain other characters (or groups thereof) but also ones which modify other expressions. The first you need is the asterisk "*". It means zero or more of the expression before.

Code:
/ab*c/  # a followed by any number of b's (even none) followed by c

Notice that the last example matches "abc" and "abbbc" but also "ac"! If you want to match at least one "b", so that "ac" is not matched at all, you need to double the "b":

Code:
/abb*c/  # matches "abc" and "abbc", "abbbc", etc. but not "ac"
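
A quick way to see the difference, sketched in Python (an assumption: Python's re module, where the asterisk behaves the same way). re.fullmatch is used so that the whole sample string has to match, and the sample strings are made up.

```python
import re

samples = ["ac", "abc", "abbc", "abbbc"]

# /ab*c/ : zero or more b's between a and c
star = [s for s in samples if re.fullmatch(r"ab*c", s)]

# /abb*c/ : one b, then zero or more further b's
doubled = [s for s in samples if re.fullmatch(r"abb*c", s)]

print(star)     # ['ac', 'abc', 'abbc', 'abbbc']
print(doubled)  # ['abc', 'abbc', 'abbbc']
```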

There is also another construct to count the number of expressions to match: "\{m,n\}" where "m" and "n" are numbers. This works similarly to the asterisk, but limits the number of allowed occurrences to be between m and n.

Code:
/ab\{1,3\}c/  # matches "abc" and "abbc" and "abbbc", but not "ac" or "abbbbc", etc.
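
The same count can be tried in Python, with one caveat that is an assumption about the tool, not about the expression above: Python's re module uses a perl-like dialect in which the braces are written WITHOUT backslashes. There, an escaped brace is just a literal brace, as the last line of this sketch shows.

```python
import re

interval = re.compile(r"ab{1,3}c")   # perl-like spelling of /ab\{1,3\}c/
samples = ["ac", "abc", "abbc", "abbbc", "abbbbc"]

accepted = [s for s in samples if interval.fullmatch(s)]
print(accepted)  # ['abc', 'abbc', 'abbbc']

# With backslashes, Python matches the braces literally:
literal = re.fullmatch(r"ab\{1,3\}c", "ab{1,3}c") is not None
print(literal)   # True
```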

You may have noticed that i talked about "expressions" rather than "characters" in the last part. In the simplest form an "expression" is a single character or metacharacter:

Code:
/@.\{2,3\}@/   # any two or three characters, surrounded by ats

But a range is an "expression" too:

Code:
/![0-5]\{2,3\}!/   # any two or three digits 0-5, surrounded by exclamation marks

And characters or other expressions can be further grouped by (escaped) parentheses:

Code:
/\([0-9]\{3\}\.\)\{2\}/   # two groups, each consisting of 3 digits followed by a literal dot

This would match "123.456." or "913.756.", etc.
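
As a sketch, here is the grouped expression in Python. It assumes Python's re module, where parentheses and braces are written without backslashes; the sample strings other than the two mentioned above are made up.

```python
import re

# perl-like spelling of /\([0-9]\{3\}\.\)\{2\}/
group = re.compile(r"([0-9]{3}\.){2}")
samples = ["123.456.", "913.756.", "123.45.", "123456."]

# fullmatch: the whole sample has to consist of exactly two such groups
accepted = [s for s in samples if group.fullmatch(s)]
print(accepted)  # ['123.456.', '913.756.']
```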

Greediness

The extent of a possible match is a cause of endless misunderstandings. Consider the following input:

Code:
abcdXfdgdkjXsfdsdX2387X

Now suppose we have the following regular expression. Which part of the above string would it match?

Code:
/a.*X/   # an "a" followed by any number of any characters followed by "X"

Possible answers (only the matched part is shown):
abcdX
abcdXfdgdkjX
abcdXfdgdkjXsfdsdX
abcdXfdgdkjXsfdsdX2387X

The right answer, in all Unix-like regex machines, is: the last one. Because "any character" includes the X, the longest possible match is used. This is called the "greediness" of regular expressions. If an expression matches the longest possible string it is called "greedy", if it matches the shortest possible string it is called "non-greedy". Unix regexes are usually greedy.

That raises the question of how we would match the non-greedy variant. This is usually done with inverted character classes. The first answer above can be built this way:

Code:
/a[^X]*X/   # an "a" followed by any number of non-X followed by "X"
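
Both variants can be compared in a few lines of Python. This is a sketch assuming Python's re module, which is greedy by default as well. Like perl, it additionally offers a non-greedy quantifier "*?", which is shown for comparison and is not part of the BREs discussed here.

```python
import re

text = "abcdXfdgdkjXsfdsdX2387X"

greedy = re.search(r"a.*X", text).group()         # longest possible match
inverted = re.search(r"a[^X]*X", text).group()    # stops at the first X
non_greedy = re.search(r"a.*?X", text).group()    # perl-style non-greedy "*?"

print(greedy)      # abcdXfdgdkjXsfdsdX2387X
print(inverted)    # abcdX
print(non_greedy)  # abcdX
```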

Now this is only a short introduction. If you want to know more about regexes you might want to read Dale Dougherty's fantastic book "sed & awk", published by O'Reilly.

Further pointers: understand the (slight) differences between "extended regular expressions" (EREs) and "basic regular expressions" (BREs). Btw., i have introduced BREs here. Also be aware that there are UNIX-EREs and UNIX-BREs as well as GNU-EREs and GNU-BREs (used in the Linux counterparts of Unix utilities like sed and awk). There are also perl-REs, which are handled by yet another, slightly different regexp engine. They are all quite similar, though, and the basic workings are always the same, so if you know one you know about 90% of all the others too.

I hope this helps.

bakunin