Home Man
Search
Today's Posts
Register

Linux & Unix Commands - Search Man Pages

OpenSolaris 2009.06 - man page for pcrepartial (opensolaris section 3)

PCREPARTIAL(3)			     Library Functions Manual			   PCREPARTIAL(3)

NAME
       PCRE - Perl-compatible regular expressions

PARTIAL MATCHING IN PCRE

       In  normal  use	of  PCRE,  if  the  subject  string  that  is  passed  to  pcre_exec() or
       pcre_dfa_exec() matches as far as it goes, but is too short to match the  entire  pattern,
       PCRE_ERROR_NOMATCH  is returned. There are circumstances where it might be helpful to dis-
       tinguish this case from other cases in which there is no match.

       Consider, for example, an application where a human is required to  type  in  data  for	a
       field  with specific formatting requirements. An example might be a date in the form ddmm-
       myy, defined by this pattern:

	 ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$

       If the application sees the user's keystrokes one by one, and can check that what has been
       typed  so  far  is potentially valid, it is able to raise an error as soon as a mistake is
       made, possibly beeping and not reflecting the character that has been typed. This  immedi-
       ate  feedback  is  likely to be a better user interface than a check that is delayed until
       the entire string has been entered.

       PCRE supports the concept of partial matching by means of the PCRE_PARTIAL  option,  which
       can  be	set  when  calling  pcre_exec()  or  pcre_dfa_exec().  When  this flag is set for
       pcre_exec(), the return code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at
       any  time  during the matching process the last part of the subject string matched part of
       the pattern. Unfortunately, for non-anchored matching, it is not possible  to  obtain  the
       position  of  the start of the partial match. No captured data is set when PCRE_ERROR_PAR-
       TIAL is returned.

       When PCRE_PARTIAL is set for pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH  is  con-
       verted  into  PCRE_ERROR_PARTIAL  if the end of the subject is reached, there have been no
       complete matches, but there is still at least one matching possibility. The portion of the
       string that provided the partial match is set as the first matching string.

       Using  PCRE_PARTIAL  disables one of PCRE's optimizations. PCRE remembers the last literal
       byte in a pattern, and abandons matching immediately if such a byte is not present in  the
       subject	string.  This  optimization  cannot be used for a subject string that might match
       only partially.

RESTRICTED PATTERNS FOR PCRE_PARTIAL

       Because of the way certain internal optimizations are implemented in the pcre_exec() func-
       tion,  the PCRE_PARTIAL option cannot be used with all patterns. These restrictions do not
       apply when pcre_dfa_exec() is used.  For pcre_exec(), repeated single characters such as

	 a{2,4}

       and repeated single metasequences such as

	 \d+

       are not permitted if the maximum number of occurrences  is  greater  than  one.	 Optional
       items  such  as \d? (where the maximum is one) are permitted.  Quantifiers with any values
       are permitted after parentheses, so the invalid examples above can be coded thus:

	 (a){2,4}
	 (\d)+

       These constructions run more slowly, but for the kinds of application that  are	envisaged
       for this facility, this is not felt to be a major restriction.

       If  PCRE_PARTIAL  is  set  for  a  pattern  that  does  not  conform  to the restrictions,
       pcre_exec()  returns  the  error  code  PCRE_ERROR_BADPARTIAL  (-13).   You  can  use  the
       PCRE_INFO_OKPARTIAL  call to pcre_fullinfo() to find out if a compiled pattern can be used
       for partial matching.

EXAMPLE OF PARTIAL MATCHING USING PCRETEST

       If the escape sequence \P is present in a pcretest data line,  the  PCRE_PARTIAL  flag  is
       used for the match. Here is a run of pcretest that uses the date example quoted above:

	   re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
	 data> 25jun04\P
	  0: 25jun04
	  1: jun
	 data> 25dec3\P
	 Partial match
	 data> 3ju\P
	 Partial match
	 data> 3juj\P
	 No match
	 data> j\P
	 No match

       The first data string is matched completely, so pcretest shows the matched substrings. The
       remaining four strings do not match the complete pattern, but the first	two  are  partial
       matches.  The  same  test,  using  pcre_dfa_exec()  matching  (by  means  of the \D escape
       sequence), produces the following output:

	   re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
	 data> 25jun04\P\D
	  0: 25jun04
	 data> 23dec3\P\D
	 Partial match: 23dec3
	 data> 3ju\P\D
	 Partial match: 3ju
	 data> 3juj\P\D
	 No match
	 data> j\P\D
	 No match

       Notice that in this case the portion of the string that was matched is made available.

MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()

       When a partial match has been found using pcre_dfa_exec(), it is possible to continue  the
       match by providing additional subject data and calling pcre_dfa_exec() again with the same
       compiled regular expression, this time setting the PCRE_DFA_RESTART option. You must  also
       pass  the same working space as before, because this is where details of the previous par-
       tial match are stored. Here is an example using pcretest, using the \R escape sequence  to
       set the PCRE_DFA_RESTART option (\P and \D are as above):

	   re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
	 data> 23ja\P\D
	 Partial match: 23ja
	 data> n05\R\D
	  0: n05

       The  first  call has "23ja" as the subject, and requests partial matching; the second call
       has "n05" as the subject for the continued (restarted) match.  Notice that when the  match
       is  complete,  only the last part is shown; PCRE does not retain the previously partially-
       matched string. It is up to the calling program to do that if it needs to.

       You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial matching over  multiple
       segments.  This facility can be used to pass very long subject strings to pcre_dfa_exec().
       However, some care is needed for certain types of pattern.

       1. If the pattern contains tests for the beginning or end of a line, you need to pass  the
       PCRE_NOTBOL  or	PCRE_NOTEOL options, as appropriate, when the subject string for any call
       does not contain the beginning or end of a line.

       2. If the pattern contains backward assertions (including \b or \B), you need  to  arrange
       for some overlap in the subject strings to allow for this. For example, you could pass the
       subject in chunks that are 500 bytes long, but in a buffer of 700 bytes, with the starting
       offset set to 200 and the previous 200 bytes at the start of the buffer.

       3.  Matching a subject string that is split into multiple segments does not always produce
       exactly the same result as matching over one single long string.   The  difference  arises
       when  there  are  multiple matching possibilities, because a partial match result is given
       only when there are no completed matches in a call to pcre_dfa_exec(). This means that  as
       soon  as  the  shortest	match has been found, continuation to a new subject segment is no
       longer possible.  Consider this pcretest example:

	   re> /dog(sbody)?/
	 data> do\P\D
	 Partial match: do
	 data> gsb\R\P\D
	  0: g
	 data> dogsbody\D
	  0: dogsbody
	  1: dog

       The pattern matches the words "dog" or "dogsbody". When the subject is presented  in  sev-
       eral parts ("do" and "gsb" being the first two) the match stops when "dog" has been found,
       and it is not possible to continue. On the other hand, if "dogsbody"  is  presented  as	a
       single string, both matches are found.

       Because	of this phenomenon, it does not usually make sense to end a pattern that is going
       to be matched in this way with a variable repeat.

       4. Patterns that contain alternatives at the top level which do not  all  start	with  the
       same pattern item may not work as expected. For example, consider this pattern:

	 1234|3789

       If  the first part of the subject is "ABC123", a partial match of the first alternative is
       found at offset 3. There is no partial match for the second alternative,  because  such	a
       match  does not start at the same point in the subject string. Attempting to continue with
       the string "789" does not yield a match because only those alternatives that match at  one
       point  in  the  subject are remembered. The problem arises because the start of the second
       alternative matches within the first alternative. There is no problem with  anchored  pat-
       terns or patterns such as:

	 1234|ABCD

       where no string can be a partial match for both alternatives.

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 04 June 2007
       Copyright (c) 1997-2007 University of Cambridge.

ATTRIBUTES
       See attributes(5) for descriptions of the following attributes:

       +--------------------+-----------------+
       |  ATTRIBUTE TYPE    | ATTRIBUTE VALUE |
       +--------------------+-----------------+
       |Availability	    | SUNWpcre	      |
       +--------------------+-----------------+
       |Interface Stability | Uncommitted     |
       +--------------------+-----------------+
NOTES
       Source for PCRE is available on http://opensolaris.org.

										   PCREPARTIAL(3)


All times are GMT -4. The time now is 03:57 PM.

Unix & Linux Forums Content Copyrightę1993-2018. All Rights Reserved.
UNIX.COM Login
Username:
Password:  
Show Password