Regular expression match


 
# 1  
07-02-2015

Code:
echo 20110101 | awk '{ print match($0,/^((17||18||19||20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$/))

I am getting a match for the above, whereas it shouldn't match, as there is no hyphen in the echoed date.

Another question: what is the difference between || and | in the above statement?
# 2  
07-02-2015
The string || in an ERE (outside of a bracket expression) produces undefined results. Otherwise, a | in an ERE (outside of a bracket expression) separates alternatives to be matched by the ERE.
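A quick way to see single-| alternation at work is the sketch below. It assumes a POSIX sh and any POSIX awk; century_match is a hypothetical helper name, and only the century part of the thread's ERE is used. The || form is deliberately not exercised, because its behaviour is undefined and can differ between awk implementations.

```shell
#!/bin/sh
# century_match is a hypothetical helper used only for illustration.
# The group (17|18|19|20) uses single | characters, so it matches exactly
# one of the alternatives 17, 18, 19 or 20, followed here by two digits.
century_match() {
    printf '%s\n' "$1" |
        awk '{ print (($0 ~ /^(17|18|19|20)[0-9][0-9]$/) ? "match" : "no match") }'
}

for y in 1799 1999 2011 2111 99; do
    printf '%s: %s\n' "$y" "$(century_match "$y")"
done
```

This reports a match for 1799, 1999 and 2011, and no match for 2111 (21 is not one of the alternatives) or 99 (too short for this anchored pattern).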

In an ERE, the expression -* matches zero or more hyphens.
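The effect of -* can be checked in isolation with a cut-down pattern. This is a minimal sketch assuming a POSIX sh and awk; hyphen_check is a hypothetical helper name, and the fixed date 2011/01/01 is used only to keep the pattern short.

```shell
#!/bin/sh
# hyphen_check is a hypothetical helper used only for illustration.
# In "2011-*01-*01" each "-*" means zero or more hyphens, so the hyphens
# are optional and may also be repeated; other separators are not allowed.
hyphen_check() {
    printf '%s\n' "$1" |
        awk '{ print (($0 ~ /^2011-*01-*01$/) ? "match" : "no match") }'
}

for s in 20110101 2011-01-01 2011--01-01 2011/01/01; do
    printf '%s: %s\n' "$s" "$(hyphen_check "$s")"
done
```

The first three inputs match (including 20110101, with no hyphens at all); only 2011/01/01 is rejected, because / is not part of the pattern.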

In your awk statement, you not only have a few undefined terms in your ERE; you also have an extra ), a missing }, and a missing '. So there is no way that that awk statement produced any output other than a diagnostic message.

With the awk script:
Code:
echo 20110101 | awk '
{match($0,/^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$/)
 print RSTART, RLENGTH
}'

the output is:
Code:
1 8
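For comparison, here is what match() reports when the ERE does not match. In POSIX awk, match() returns the 1-based start of the match and sets RSTART and RLENGTH; with no match it returns 0 and sets RSTART to 0 and RLENGTH to -1. The sketch wraps the same ERE in a hypothetical date_match helper and assumes a POSIX sh and awk.

```shell
#!/bin/sh
# date_match is a hypothetical helper used only for illustration.
# It prints "RSTART RLENGTH" for the corrected ERE from this post.
date_match() {
    printf '%s\n' "$1" | awk '
    {
        match($0, /^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$/)
        print RSTART, RLENGTH
    }'
}

date_match 20110101     # the input from the thread: 1 8
date_match 2011-01-01   # hyphens present, also matches: 1 10
date_match 2011/01/01   # "/" is not in the ERE, no match: 0 -1
```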

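In the original post, the matching parts of the ERE were marked in red. For 20110101, 2011 is matched by (17|18|19|20)[0-9][0-9], both -* terms match the empty string, and the month 01 and the day 01 are each matched by 0[1-9].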
# 3  
07-02-2015
-* means any number of hyphens, from zero to many, so a string with no hyphen at all still matches.
# 4  
07-02-2015
I think what is missing here is the follow-up comment that, if you want to ensure that a hyphen is there, you should keep it as a literal character without any special meaning, i.e. drop the metacharacter *.


You would change this expression:-
Code:
....^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$....

...to this:-
Code:
....^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01])$....
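To see the difference between the two expressions side by side, the sketch below tests both against the same input. It assumes a POSIX sh and awk; compare_eres and the shell variables loose and strict are hypothetical names, and the "...." context around the expressions above is not reproduced.

```shell
#!/bin/sh
# compare_eres is a hypothetical helper used only for illustration.
# loose is the original ERE (each "-*" allows zero or more hyphens);
# strict is the changed version, where each "-" is a literal, required hyphen.
loose='^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$'
strict='^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01])$'

compare_eres() {
    # The EREs are passed in with -v and used as dynamic regexps via ~.
    printf '%s\n' "$1" | awk -v loose="$loose" -v strict="$strict" '
    {
        l = ($0 ~ loose)  ? "yes" : "no"
        s = ($0 ~ strict) ? "yes" : "no"
        print "loose=" l, "strict=" s
    }'
}

for d in 20110101 2011-01-01; do
    printf '%s: %s\n' "$d" "$(compare_eres "$d")"
done
```

With the original ERE, 20110101 is accepted; with the changed one it is rejected, while 2011-01-01 is accepted by both.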

Does this help clarify what you need to do?


Robin
# 5  
07-02-2015
The part that I found strange about the given ERE is that it will accept dates like 2015115 (which is ambiguous) as well as 2015-1-15 and 2015-11-5 (both of which are clear as to where the break is between the month and day). And, although 20151-15 and 201511-5 might be unambiguous, I'm not sure that I would want to accept them as "valid" date input (and both of these are also accepted by the given ERE).
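One way to rule out the ambiguous forms is to require two-digit months and days and to allow either no hyphens or a hyphen in both places. This is a sketch of one possible policy, not something proposed in the thread; is_date is a hypothetical helper, and a POSIX sh and awk are assumed. Because POSIX EREs have no back-references, the two layouts are checked as two separate patterns.

```shell
#!/bin/sh
# is_date is a hypothetical helper; the policy it enforces is an assumption:
#   - month and day are always two digits (01-12, 01-31), and
#   - either there are no hyphens at all, or there is one in both places.
# The exit status is 0 for accepted input and 1 for rejected input.
is_date() {
    printf '%s\n' "$1" | awk '
    {
        y = "((17|18|19|20)[0-9][0-9]|[0-9][0-9])"
        m = "(0[1-9]|1[012])"
        d = "(0[1-9]|[12][0-9]|3[01])"
        if ($0 ~ ("^" y m d "$") || $0 ~ ("^" y "-" m "-" d "$"))
            exit 0
        exit 1
    }'
}

for s in 20150115 2015-01-15 2015115 2015-1-15 20151-15 201511-5; do
    if is_date "$s"; then echo "$s: accepted"; else echo "$s: rejected"; fi
done
```

Only 20150115 and 2015-01-15 are accepted; 2015115, 2015-1-15, 20151-15 and 201511-5 are all rejected. Note that this is still only a format check: a string such as 20150231 passes even though February has no 31st day.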