Pattern Matching


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Pattern Matching
# 8  
Old 04-12-2009
Quote:
Originally Posted by Aveltium
Hi, I'm very new to Linux and I'm sorry if this question is too dumb.
There are no dumb questions. Welcome on board. ;-)

Quote:
Originally Posted by Aveltium
If it is ok, please link me to some beginner guides for questions like this one.
We have a special "Tips and Tutorials" board here and if you use the search feature on "book recommendation" you will find a lot of threads.

My personal favourite regarding regular expressions is "sed & awk" by Dale Dougherty published by O'Reilly. It is well written with a good dose of humor and it covers everything there is to know about these two regex-based programs. There is a specialized book about regular expressions too from the same publisher. It is well written but i didn't like it as much as the aforementioned book.

Ok, having said this, here is a

Short (very short!) Introduction to Regular Expressions

As soon as you deal with text documents invariably you need to search for some content sooner or later. It is easy to search for strings, but in most cases (fixed) strings match not everything they are supposed to match or match things they are not supposed to match. Regexps are not searching for strings but searching for patterns and the regexp language is about describing these patterns.

Suppose you have a long text and want to find the word "colour".

(We will use a small Unix program called "grep" for the examples. It is given an expression to search for and a file in which it carries out the search. It will return all the lines containing the expression. The calling convention is "grep <expr> <file>".)

Ok, here is your first regular expression:

Code:
grep "colour" /path/to/file

That wasn't too hard, was it? Well, yes, but it isn't too useful either. We are just searching for a fixed string. Anyway, a fixed string is the simpliest, most basic form of a regular expression.

Now suppose that the text was written by several people, some speak english and some are american *) and therefore "colour" is sometimes written "colour" and sometimes "color". Of course we would like to find both versions ad we have to tell the program somehow that the "u" we are looking for is optional. We want to find "color" as well as "colour" but we wouldn't want to find words like "colonel-major", where something else then a "u" is between the "colo-" and the "-r". Here we go:

Code:
grep "colou*r" /path/to/file

The asterisk ("*") tells the regexp-program that the character preceeding it is optional.

We call this a "metacharacter". Most characters only match themselves: an "a" will match an "a" and nothing else (not even the "A", because regexps are case-sensitive). But some special characters do not match anything directly but change the way other characters are matched. A regular expression is usually a mixture of characters and metacharacters.

Looking at the output of the last command we see that it did match also the word "colourful" or "water-color". We might want to match only "colour" (however it is written) but not any conglomerate words.

We do this by matching only whitespace (blanks and tabs) before and after the word but exclude any other character. We use "character set" for this. It says "one of the following" characters (note that i use "<b>" for a blank and "<tab>" for a tab here because they are non-printing characters. Enter literal spaces and tabs instead when you type that in):

Code:
grep "[<b><tab>]colou*r[<b><tab>]" /path/to/file

Any ONE character inside "[...]" is matched, but not several! Therefore "d[ae]n" will match "dan" and "den" but not "dean".

-*-

Ok, so far. My time is limited today and i can't explain something in a few words others write books about. I hope you got an impression about how regular expressions work and upon request i might expand this text a little.



____________________
*) sorry - i just can't resist these opportunities ;-))






Quote:
Originally Posted by Aveltium
I want to check if the entered string is a number and has 4 digits.
The regex - without further explanation, but parts of it you will recognize - is:

"^[0-9]\{4\}$"

"^" used this way is the begin of a line, so the expression will only be found if it starts at the beginning"

"$" analogously end of line - we make sure the string contains only 4 digits

"[0-9]" is short for "[0123456789]", it is possible to use ranges instead of single characters to form sets

"\{n\}" match the previous expression (the brackets) exactly n times

Here is the whole script:

Code:
echo -n "Input: " ; read x
if [ $(echo "$x" | grep -c "^[0-9]\{4\}$") -eq 1 ] ; then
     echo "Valid"
else
     echo "Invalid"
fi

I hope this helps.

bakunin

Last edited by bakunin; 04-12-2009 at 09:25 AM..
This User Gave Thanks to bakunin For This Post:
# 9  
Old 04-12-2009
Wow holycow! Thanks a bunch!! I never got such nice answer like that! Very informative and helpful! Thanks again!!
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Big pattern file matching within another pattern file in awk or shell

Hi I need to do a patten match between files . I am new to shell scripting and have come up with this so far. It take 50 seconds to process files of 2mb size . I need to tune this code as file size will be around 50mb and need to save time. Main issue is that I need to search the pattern from... (2 Replies)
Discussion started by: nitin_daharwal
2 Replies

2. UNIX for Dummies Questions & Answers

Grep -v lines starting with pattern 1 and not matching pattern 2

Hi all! Thanks for taking the time to view this! I want to grep out all lines of a file that starts with pattern 1 but also does not match with the second pattern. Example: Drink a soda Eat a banana Eat multiple bananas Drink an apple juice Eat an apple Eat multiple apples I... (8 Replies)
Discussion started by: demmel
8 Replies

3. Shell Programming and Scripting

PHP - Regex for matching string containing pattern but without pattern itself

The sample file: dept1: user1,user2,user3 dept2: user4,user5,user6 dept3: user7,user8,user9 I want to match by '/^dept2.*/' but don't want to have substring 'dept2:' in output. How to compose such regex? (8 Replies)
Discussion started by: urello
8 Replies

4. Shell Programming and Scripting

Sed: printing lines AFTER pattern matching EXCLUDING the line containing the pattern

'Hi I'm using the following code to extract the lines(and redirect them to a txt file) after the pattern match. But the output is inclusive of the line with pattern match. Which option is to be used to exclude the line containing the pattern? sed -n '/Conn.*User/,$p' > consumers.txt (11 Replies)
Discussion started by: essem
11 Replies

5. UNIX for Dummies Questions & Answers

Find pattern suffix matching pattern

Hi, I am trying to get a result out of this but fails please help. Have two files /tmp/1 & /tmp/hosts. /tmp/1 IP=123.456.789.01 WAS_HOSTNAME=abcdefgh.was.tb.dsdc /tmp/hosts 123.456.789.01 I want this result in /tmp/hosts if hostname is already there dont want duplicate entry. ... (5 Replies)
Discussion started by: rajeshwebspere
5 Replies

6. Shell Programming and Scripting

sed - matching pattern one but not pattern two

All, I have the following file: -------------------------------------- # # /etc/pam.d/common-password - password-related modules common to all services # # This file is included from other service-specific PAM config files, # and should contain a list of modules that define the services... (2 Replies)
Discussion started by: RobertBerrie
2 Replies

7. Shell Programming and Scripting

counting the lines matching a pattern, in between two pattern, and generate a tab

Hi all, I'm looking for some help. I have a file (very long) that is organized like below: >Cluster 0 0 283nt, >01_FRYJ6ZM12HMXZS... at +/99% 1 279nt, >01_FRYJ6ZM12HN12A... at +/99% 2 281nt, >01_FRYJ6ZM12HM4TS... at +/99% 3 283nt, >01_FRYJ6ZM12HM946... at +/99% 4 279nt,... (4 Replies)
Discussion started by: d.chauliac
4 Replies

8. Shell Programming and Scripting

Pattern Matching

Hi Folks, I have the following requirement: I have a file that is containing numerous queries. The tables name mentioned in the queries are in the following format : SchemaName.Tablename. e.g COPDB.TableName. I need to take out all the COPDB.TableName pattern and write it to a different... (6 Replies)
Discussion started by: Siv_Pat
6 Replies

9. UNIX for Dummies Questions & Answers

Pattern Matching

Hi Folks, I have the following requirement: I have a file that is containing numerous queries. The tables name mentioned in the queries are in the following format : SchemaName.Tablename. e.g COPDB.TableName. I need to take out all the COPDB.TableName pattern and write it to a different... (0 Replies)
Discussion started by: Siv_Pat
0 Replies

10. Shell Programming and Scripting

comment/delete a particular pattern starting from second line of the matching pattern

Hi, I have file 1.txt with following entries as shown: 0152364|134444|10.20.30.40|015236433 0233654|122555|10.20.30.50|023365433 ** ** ** In file 2.txt I have the following entries as shown: 0152364|134444|10.20.30.40|015236433 0233654|122555|10.20.30.50|023365433... (4 Replies)
Discussion started by: imas
4 Replies
Login or Register to Ask a Question