Parse::RecDescent(3)	       User Contributed Perl Documentation	     Parse::RecDescent(3)

NAME
       Parse::RecDescent - Generate Recursive-Descent Parsers

VERSION
       This document describes version 1.967009 of Parse::RecDescent released March 16th, 2012.

SYNOPSIS
	use Parse::RecDescent;

	# Generate a parser from the specification in $grammar:

	    $parser = new Parse::RecDescent ($grammar);

	# Generate a parser from the specification in $othergrammar

	    $anotherparser = new Parse::RecDescent ($othergrammar);

	# Parse $text using rule 'startrule' (which must be
	# defined in $grammar):

	   $parser->startrule($text);

	# Parse $text using rule 'otherrule' (which must also
	# be defined in $grammar):

	    $parser->otherrule($text);

	# Change the universal token prefix pattern
	# before building a grammar
	# (the default is: '\s*'):

	   $Parse::RecDescent::skip = '[ \t]+';

	# Replace productions of existing rules (or create new ones)
	# with the productions defined in $newgrammar:

	   $parser->Replace($newgrammar);

	# Extend existing rules (or create new ones)
	# by adding extra productions defined in $moregrammar:

	   $parser->Extend($moregrammar);

	# Global flags (useful as command line arguments under -s):

	   $::RD_ERRORS       # unless undefined, report fatal errors
	   $::RD_WARN	      # unless undefined, also report non-fatal problems
	   $::RD_HINT	      # if defined, also suggest remedies
	   $::RD_TRACE	      # if defined, also trace parsers' behaviour
	   $::RD_AUTOSTUB     # if defined, generates "stubs" for undefined rules
	   $::RD_AUTOACTION   # if defined, appends specified action to productions

DESCRIPTION
   Overview
       Parse::RecDescent incrementally generates top-down recursive-descent text parsers from
       simple yacc-like grammar specifications. It provides:

       o   Regular expressions or literal strings as terminals (tokens),

       o   Multiple (non-contiguous) productions for any rule,

       o   Repeated and optional subrules within productions,

       o   Full access to Perl within actions specified as part of the grammar,

       o   Simple automated error reporting during parser generation and parsing,

       o   The ability to commit to, uncommit to, or reject particular productions during a
	   parse,

       o   The ability to pass data up and down the parse tree ("down" via subrule argument
	   lists, "up" via subrule return values),

       o   Incremental extension of the parsing grammar (even during a parse),

       o   Precompilation of parser objects,

       o   User-definable reduce-reduce conflict resolution via "scoring" of matching
	   productions.

   Using "Parse::RecDescent"
       Parser objects are created by calling "Parse::RecDescent::new", passing in a grammar
       specification (see the following subsections). If the grammar is correct, "new" returns a
       blessed reference which can then be used to initiate parsing through any rule specified in
       the original grammar. A typical sequence looks like this:

	   $grammar = q {
	       # GRAMMAR SPECIFICATION HERE
		};

	   $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n";

	   # acquire $text

	   defined $parser->startrule($text) or print "Bad text!\n";

       The rule through which parsing is initiated must be explicitly defined in the grammar
       (i.e. for the above example, the grammar must include a rule of the form: "startrule:
       <subrules>").

       If the starting rule succeeds, its value (see below) is returned. Failure to generate the
       original parser or failure to match a text is indicated by returning "undef". Note that
       it's easy to set up grammars that can succeed, but which return a value of 0, "0", or "".
       So don't be tempted to write:

	   $parser->startrule($text) or print "Bad text!\n";

       Normally, the parser has no effect on the original text. So in the previous example the
       value of $text would be unchanged after having been parsed.

       If, however, the text to be matched is passed by reference:

	   $parser->startrule(\$text)

       then any text which was consumed during the match will be removed from the start of $text.
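
       As a minimal sketch of the points above (the "number" rule and the sample text are
       illustrative assumptions, not part of the module), the following checks the result with
       "defined" and contrasts passing the text by value with passing it by reference:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical one-rule grammar; its value for "0" is a false (but defined) string.
my $parser = Parse::RecDescent->new(q{
    number: /\d+/
}) or die "Bad grammar!\n";

# By value: the caller's copy of the text is left untouched.
my $text  = '0 apples';
my $value = $parser->number($text);
print "matched '$value'; text is still '$text'\n" if defined $value;

# By reference: the consumed prefix is removed from the caller's string.
my $remaining = '0 apples';
$parser->number(\$remaining);
print "unparsed remainder: '$remaining'\n";
```

       Testing the return value for truth instead of definedness would wrongly report this
       successful match as a failure.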

   Rules
       In the grammar from which the parser is built, rules are specified by giving an identifier
       (which must satisfy /[A-Za-z]\w*/), followed by a colon on the same line, followed by one
       or more productions, separated by single vertical bars. The layout of the productions is
       entirely free-format:

	   rule1:  production1
	    |  production2 |
	   production3 | production4

       At any point in the grammar previously defined rules may be extended with additional
       productions. This is achieved by redeclaring the rule with the new productions. Thus:

	   rule1: a | b | c
	   rule2: d | e | f
	   rule1: g | h

       is exactly equivalent to:

	   rule1: a | b | c | g | h
	   rule2: d | e | f
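
       The effect of redeclaring a rule can be checked with a small sketch such as the
       following, in which the rule names and literals are invented for illustration:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: 'greeting' is declared twice, so the second
# declaration adds a third production to the first two.
my $parser = Parse::RecDescent->new(q{
    greeting: 'hello' | 'howdy'
    farewell: 'bye'
    greeting: 'hi'
}) or die "Bad grammar!\n";

my $original = $parser->greeting('hello');   # matched by the first declaration
my $extended = $parser->greeting('hi');      # matched by the added production

print "original: $original, extended: $extended\n";
```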

       Each production in a rule consists of zero or more items, each of which may be either: the
       name of another rule to be matched (a "subrule"), a pattern or string literal to be
       matched directly (a "token"), a block of Perl code to be executed (an "action"), a special
       instruction to the parser (a "directive"), or a standard Perl comment (which is ignored).

       A rule matches a text if one of its productions matches. A production matches if each of
       its items match consecutive substrings of the text. The productions of a rule being
       matched are tried in the same order that they appear in the original grammar, and the
       first matching production terminates the match attempt (successfully). If all productions
       are tried and none matches, the match attempt fails.

       Note that this behaviour is quite different from the "prefer the longer match" behaviour
       of yacc. For example, if yacc were parsing the rule:

	   seq : 'A' 'B'
	   | 'A' 'B' 'C'

       upon matching "AB" it would look ahead to see if a 'C' is next and, if so, would match the
       second production in preference to the first. In other words, yacc effectively tries all
       the productions of a rule breadth-first in parallel, and selects the "best" match, where
       "best" means longest (note that this is a gross simplification of the true behaviour of
       yacc but it will do for our purposes).

       In contrast, "Parse::RecDescent" tries each production depth-first in sequence, and
       selects the "best" match, where "best" means first. This is the fundamental difference
       between "bottom-up" and "recursive descent" parsing.
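
       A short sketch of this first-match behaviour, using the "seq" rule above with added
       actions (the returned labels are assumptions made for the demonstration):

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The same two productions as above, each labelled by an action.
my $parser = Parse::RecDescent->new(q{
    seq: 'A' 'B'      { $return = 'first production' }
       | 'A' 'B' 'C'  { $return = 'second production' }
}) or die "Bad grammar!\n";

# Pass the text by reference so that the unconsumed remainder is visible.
my $text   = 'ABC';
my $winner = $parser->seq(\$text);

print "$winner matched, leaving '$text' unparsed\n";
```

       Because the first production already succeeds on "AB", the second is never tried. Listing
       the longer production first is the usual way to make it reachable.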

       Each successfully matched item in a production is assigned a value, which can be accessed
       in subsequent actions within the same production (or, in some cases, as the return value
       of a successful subrule call). Unsuccessful items don't have an associated value, since
       the failure of an item causes the entire surrounding production to immediately fail. The
       following sections describe the various types of items and their success values.

   Subrules
       A subrule which appears in a production is an instruction to the parser to attempt to
       match the named rule at that point in the text being parsed. If the named subrule is not
       defined when requested the production containing it immediately fails (unless it was
       "autostubbed" - see Autostubbing).

       A rule may (recursively) call itself as a subrule, but not as the left-most item in any of
       its productions (since such recursions are usually non-terminating).

       The value associated with a subrule is the value associated with its $return variable (see
       "Actions" below), or with the last successfully matched item in the subrule match.

       Subrules may also be specified with a trailing repetition specifier, indicating that they
       are to be (greedily) matched the specified number of times. The available specifiers are:

	   subrule(?)  # Match one-or-zero times
	   subrule(s)  # Match one-or-more times
	   subrule(s?) # Match zero-or-more times
	   subrule(N)  # Match exactly N times for integer N > 0
	   subrule(N..M)   # Match between N and M times
	   subrule(..M)    # Match between 1 and M times
	   subrule(N..)    # Match at least N times

       Repeated subrules keep matching until either the subrule fails to match, or it has matched
       the minimal number of times but fails to consume any of the parsed text (this second
       condition prevents the subrule matching forever in some cases).

       Since a repeated subrule may match many instances of the subrule itself, the value
       associated with it is not a simple scalar, but rather a reference to a list of scalars,
       each of which is the value associated with one of the individual subrule matches. In other
       words in the rule:

	   program: statement(s)

       the value associated with the repeated subrule "statement(s)" is a reference to an array
       containing the values matched by each call to the individual subrule "statement".

       Repetition modifiers may include a separator pattern:

	   program: statement(s /;/)

       specifying some sequence of characters to be skipped between each repetition.  This is
       really just a shorthand for the <leftop:...> directive (see below).
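
       The following sketch shows the array references produced by repeated subrules, with and
       without a separator pattern. The rule names and inputs are hypothetical:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    words:   word(s)
    numbers: number(s /,/)
    word:    /[a-z]+/
    number:  /\d+/
}) or die "Bad grammar!\n";

# Each call returns a reference to an array of the individual subrule values.
my $words   = $parser->words('tick tock');
my $numbers = $parser->numbers('1, 2, 3');

print 'words:   ', join('|', @$words),   "\n";
print 'numbers: ', join('|', @$numbers), "\n";
```

       The separator pattern here has no capturing parentheses, so the commas are skipped
       rather than stored alongside the numbers.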

   Tokens
       If a quote-delimited string or a Perl regex appears in a production, the parser attempts
       to match that string or pattern at that point in the text. For example:

	   typedef: "typedef" typename identifier ';'

	   identifier: /[A-Za-z_][A-Za-z0-9_]*/

       As in regular Perl, a single quoted string is uninterpolated, whilst a double-quoted
       string or a pattern is interpolated (at the time of matching, not when the parser is
       constructed). Hence, it is possible to define rules in which tokens can be set at run-
       time:

	   typedef: "$::typedefkeyword" typename identifier ';'

	   identifier: /$::identpat/

       Note that, since each rule is implemented inside a special namespace belonging to its
       parser, it is necessary to explicitly qualify variables from the main package.
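
       As an illustration of match-time interpolation, the sketch below changes a keyword after
       the parser has been built. The variable $keyword and the rule "decl" are assumptions
       introduced for this example:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical run-time setting; it lives in package main, hence $::keyword below.
our $keyword = 'typedef';

my $parser = Parse::RecDescent->new(q{
    decl: "$::keyword" /\w+/ ';'  { $return = $item[2] }
}) or die "Bad grammar!\n";

my $first = $parser->decl('typedef word_t;');   # keyword is 'typedef'

$keyword = 'alias';                             # takes effect without rebuilding
my $second = $parser->decl('alias word_t;');
my $third  = $parser->decl('typedef word_t;');  # no longer matches

print "first: $first, second: $second, third: ",
      (defined $third ? $third : 'undef'), "\n";
```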

       Regex tokens can be specified using just slashes as delimiters or with the explicit
       "m<delimiter>......<delimiter>" syntax:

	   typedef: "typedef" typename identifier ';'

	   typename: /[A-Za-z_][A-Za-z0-9_]*/

	   identifier: m{[A-Za-z_][A-Za-z0-9_]*}

       A regex of either type can also have any valid trailing parameter(s) (that is, any of
       [cgimsox]):

	   typedef: "typedef" typename identifier ';'

	   identifier: / [a-z_]        # LEADING ALPHA OR UNDERSCORE
		 [a-z0-9_]*    # THEN DIGITS ALSO ALLOWED
	       /ix     # CASE/SPACE/COMMENT INSENSITIVE

       The value associated with any successfully matched token is a string containing the actual
       text which was matched by the token.

       It is important to remember that, since each grammar is specified in a Perl string, all
       instances of the universal escape character '\' within a grammar must be "doubled", so
       that they interpolate to single '\'s when the string is compiled. For example, to use the
       grammar:

	   word:       /\S+/ | backslash
	   line:       prefix word(s) "\n"
	   backslash:  '\\'

       the following code is required:

	   $parser = new Parse::RecDescent (q{

	       word:   /\\S+/ | backslash
	       line:   prefix word(s) "\\n"
	       backslash:  '\\\\'

	   });

   Anonymous subrules
       Parentheses introduce a nested scope that is very like a call to an anonymous subrule.
       Hence they are useful for "in-lining" subrule calls, and other kinds of grouping
       behaviour. For example, instead of:

	   word:       /\S+/ | backslash
	   line:       prefix word(s) "\n"

       you could write:

	   line:       prefix ( /\S+/ | backslash )(s) "\n"

       and get exactly the same effects.

       Parentheses are also used for collecting unrepeated alternations within a single
       production.

	   secret_identity: "Mr" ("Incredible"|"Fantastic"|"Sheen") ", Esq."
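
       A minimal sketch of how the value of such a parenthesised alternation can be retrieved.
       The trailing action is an addition made for this example:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The alternation behaves like an anonymous subrule, so its value
# (the literal that matched) is available as the second item.
my $parser = Parse::RecDescent->new(q{
    secret_identity: "Mr" ("Incredible"|"Fantastic"|"Sheen") ", Esq."
        { $return = $item[2] }
}) or die "Bad grammar!\n";

my $name = $parser->secret_identity('Mr Fantastic, Esq.');
print "surname: $name\n";
```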

   Terminal Separators
       For the purpose of matching, each terminal in a production is considered to be preceded by
       a "prefix" - a pattern which must be matched before a token match is attempted. By
       default, the prefix is optional whitespace (which always matches, at least trivially), but
       this default may be reset in any production.

       The variable $Parse::RecDescent::skip stores the universal prefix, which is the default
       for all terminal matches in all parsers built with "Parse::RecDescent".

       If you want to change the universal prefix using $Parse::RecDescent::skip, be careful to
       set it before creating the grammar object, because it is applied statically (when a
       grammar is built) rather than dynamically (when the grammar is used).  Alternatively you
       can provide a global "<skip:...>" directive in your grammar before any rules (described
       later).

       The prefix for an individual production can be altered by using the "<skip:...>" directive
       (described later).  Setting this directive in the top-level rule is an alternative
       approach to setting $Parse::RecDescent::skip before creating the object, but in this case
       you don't get the intended skipping behaviour if you directly invoke methods different
       from the top-level rule.

   Actions
       An action is a block of Perl code which is to be executed (as the block of a "do"
       statement) when the parser reaches that point in a production. The action executes within
       a special namespace belonging to the active parser, so care must be taken in correctly
       qualifying variable names (see also "Start-up Actions" below).

       The action is considered to succeed if the final value of the block is defined (that is,
       if the implied "do" statement evaluates to a defined value - even one which would be
       treated as "false"). Note that the value associated with a successful action is also the
       final value in the block.

       An action will fail if its last evaluated value is "undef". This is surprisingly easy to
       accomplish by accident. For instance, here's an infuriating case of an action that makes
       its production fail, but only when debugging isn't activated:

	   description: name rank serial_number
	       { print "Got $item[2] $item[1] ($item[3])\n"
	       if $::debugging
	       }

       If $::debugging is false, no statement in the block is executed, so the final value is
       "undef", and the entire production fails. The solution is:

	   description: name rank serial_number
	       { print "Got $item[2] $item[1] ($item[3])\n"
	       if $::debugging;
		 1;
	       }
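
       The difference can be observed with a reduced sketch along these lines, in which the
       rules "fragile" and "robust" are hypothetical and $::debugging is deliberately left
       undefined:

```perl
use strict;
use warnings;
use Parse::RecDescent;

our $debugging;   # deliberately left undefined for this demonstration

my $parser = Parse::RecDescent->new(q{
    # Last evaluated value is the undefined condition, so the action fails
    fragile: /\w+/ { print "Got $item[1]\n" if $::debugging }

    # Trailing 1 guarantees a defined final value
    robust:  /\w+/ { print "Got $item[1]\n" if $::debugging; 1 }
}) or die "Bad grammar!\n";

my $fragile = $parser->fragile('sergeant');   # undef: the production failed
my $robust  = $parser->robust('sergeant');    # 1: the value of the final action

print 'fragile: ', (defined $fragile ? $fragile : 'undef'), "\n";
print "robust:  $robust\n";
```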

       Within an action, a number of useful parse-time variables are available in the special
       parser namespace (there are other variables also accessible, but meddling with them will
       probably just break your parser. As a general rule, if you avoid referring to unqualified
       variables - especially those starting with an underscore - inside an action, things should
       be okay):

       @item and %item
	   The array slice @item[1..$#item] stores the value associated with each item (that is,
	   each subrule, token, or action) in the current production. The analogy is to $1, $2,
	   etc. in a yacc grammar.  Note that, for obvious reasons, @item only contains the
	   values of items before the current point in the production.

	   The first element ($item[0]) stores the name of the current rule being matched.

	   @item is a standard Perl array, so it can also be indexed with negative numbers,
	   representing the number of items back from the current position in the parse:

	       stuff: /various/ bits 'and' pieces "then" data 'end'
		   { print $item[-2] }	# PRINTS data
			# (EASIER THAN: $item[6])

	   The %item hash complements the @item array, providing named access to the same item
	   values:

	       stuff: /various/ bits 'and' pieces "then" data 'end'
		   { print $item{data} }	# PRINTS data
			# (EVEN EASIER THAN USING @item)

	   The results of named subrules are stored in the hash under each subrule's name
	   (including the repetition specifier, if any), whilst all other items are stored under
	   a "named positional" key that indicates their ordinal position within their item
	   type: __STRINGn__, __PATTERNn__, __DIRECTIVEn__, __ACTIONn__:

	       stuff: /various/ bits 'and' pieces "then" data 'end' { save }
		   { print $item{__PATTERN1__}, # PRINTS 'various'
		   $item{__STRING2__},	# PRINTS 'then'
		   $item{__ACTION1__},	# PRINTS RETURN
			    # VALUE OF save
		   }

	   If you want proper named access to patterns or literals, you need to turn them into
	   separate rules:

	       stuff: various bits 'and' pieces "then" data 'end'
		   { print $item{various}  # PRINTS various
		   }

	       various: /various/

	   The special entry $item{__RULE__} stores the name of the current rule (i.e. the same
	   value as $item[0]).

	   The advantage of using %item instead of @item is that it removes the need to track
	   item positions that may change as a grammar evolves. For example, adding an interim
	   "<skip>" directive or action can silently ruin a trailing action, by moving an @item
	   element "down" the array one place. In contrast, the named entry of %item is
	   unaffected by such an insertion.

	   A limitation of the %item hash is that it only records the last value of a particular
	   subrule. For example:

	       range: '(' number '..' number ')'
		   { $return = $item{number} }

	   will return only the value corresponding to the second match of the "number" subrule.
	   In other words, successive calls to a subrule overwrite the corresponding entry in
	   %item. Once again, the solution is to rename each subrule in its own rule:

	       range: '(' from_num '..' to_num ')'
		   { $return = $item{from_num} }

	       from_num: number
	       to_num:	 number

       @arg and %arg
	   The array @arg and the hash %arg store any arguments passed to the rule from some
	   other rule (see "Subrule argument lists"). Changes to the elements of either variable
	   do not propagate back to the calling rule (data can be passed back from a subrule via
	   the $return variable - see next item).

       $return
	   If a value is assigned to $return within an action, that value is returned if the
	   production containing the action eventually matches successfully. Note that setting
	   $return doesn't cause the current production to succeed. It merely tells it what to
	   return if it does succeed.  Hence $return is analogous to $$ in a yacc grammar.

	   If $return is not assigned within a production, the value of the last component of the
	   production (namely: $item[$#item]) is returned if the production succeeds.

       $commit
	   The current state of commitment to the current production (see "Directives" below).

       $skip
	   The current terminal prefix (see "Directives" below).

       $text
	   The remaining (unparsed) text. Changes to $text do not propagate out of unsuccessful
	   productions, but do survive successful productions. Hence it is possible to
	   dynamically alter the text being parsed - for example, to provide a "#include"-like
	   facility:

	       hash_include: '#include' filename
		   { $text = ::loadfile($item[2]) . $text }

	       filename: '<' /[a-z0-9._-]+/i '>'  { $return = $item[2] }
	       | '"' /[a-z0-9._-]+/i '"'  { $return = $item[2] }

       $thisline and $prevline
	   $thisline stores the current line number within the current parse (starting from 1).
	   $prevline stores the line number for the last character which was already successfully
	   parsed (this will be different from $thisline at the end of each line).

	   For efficiency, $thisline and $prevline are actually tied scalars, and only recompute
	   the required line number when the variable's value is used.

	   Assignment to $thisline adjusts the line number calculator, so that it believes that
	   the current line number is the value being assigned. Note that this adjustment will be
	   reflected in all subsequent line number calculations.

	   Modifying the value of the variable $text (as in the previous "hash_include" example,
	   for instance) will confuse the line counting mechanism. To prevent this, you should
	   call "Parse::RecDescent::LineCounter::resync($thisline)" immediately after any
	   assignment to the variable $text (or, at least, before the next attempt to use
	   $thisline).

	   Note that if a production fails after assigning to or resync'ing $thisline, the
	   parser's line counter mechanism will usually be corrupted.

	   Also see the entry for @itempos.

	   The line number can be set to values other than 1, by calling the start rule with a
	   second argument. For example:

	       $parser = new Parse::RecDescent ($grammar);

	       $parser->input($text, 10);  # START LINE NUMBERS AT 10

       $thiscolumn and $prevcolumn
	   $thiscolumn stores the current column number within the current line being parsed
	   (starting from 1). $prevcolumn stores the column number of the last character which
	   was actually successfully parsed. Usually "$prevcolumn == $thiscolumn-1", but not at
	   the end of lines.

	   For efficiency, $thiscolumn and $prevcolumn are actually tied scalars, and only
	   recompute the required column number when the variable's value is used.

	   Assignment to $thiscolumn or $prevcolumn is a fatal error.

	   Modifying the value of the variable $text (as in the previous "hash_include" example,
	   for instance) may confuse the column counting mechanism.

	   Note that $thiscolumn reports the column number before any whitespace that might be
	   skipped before reading a token. Hence if you wish to know where a token started (and
	   ended) use something like this:

	       rule: token1 token2 startcol token3 endcol token4
		   { print "token3: columns $item[3] to $item[5]"; }

	       startcol: '' { $thiscolumn }    # NEED THE '' TO STEP PAST TOKEN SEP
	       endcol:	{ $prevcolumn }

	   Also see the entry for @itempos.

       $thisoffset and $prevoffset
	   $thisoffset stores the offset of the current parsing position within the complete text
	   being parsed (starting from 0). $prevoffset stores the offset of the last character
	   which was actually successfully parsed. In all cases "$prevoffset == $thisoffset-1".

	   For efficiency, $thisoffset and $prevoffset are actually tied scalars, and only
	   recompute the required offset when the variable's value is used.

	   Assignment to $thisoffset or $prevoffset is a fatal error.

	   Modifying the value of the variable $text will not affect the offset counting
	   mechanism.

	   Also see the entry for @itempos.

       @itempos
	   The array @itempos stores a hash reference corresponding to each element of @item. The
	   elements of the hash provide the following:

	       $itempos[$n]{offset}{from}  # VALUE OF $thisoffset BEFORE $item[$n]
	       $itempos[$n]{offset}{to}    # VALUE OF $prevoffset AFTER $item[$n]
	       $itempos[$n]{line}{from}    # VALUE OF $thisline BEFORE $item[$n]
	       $itempos[$n]{line}{to}  # VALUE OF $prevline AFTER $item[$n]
	       $itempos[$n]{column}{from}  # VALUE OF $thiscolumn BEFORE $item[$n]
	       $itempos[$n]{column}{to}    # VALUE OF $prevcolumn AFTER $item[$n]

	   Note that the various "$itempos[$n]...{from}" values record the appropriate value
	   after any token prefix has been skipped.

	   Hence, instead of the somewhat tedious and error-prone:

	       rule: startcol token1 endcol
		 startcol token2 endcol
		 startcol token3 endcol
		   { print "token1: columns $item[1]
			 to $item[3]
		    token2: columns $item[4]
			 to $item[6]
		    token3: columns $item[7]
			 to $item[9]" }

	       startcol: '' { $thiscolumn }    # NEED THE '' TO STEP PAST TOKEN SEP
	       endcol:	{ $prevcolumn }

	   it is possible to write:

	       rule: token1 token2 token3
		   { print "token1: columns $itempos[1]{column}{from}
			 to $itempos[1]{column}{to}
		    token2: columns $itempos[2]{column}{from}
			 to $itempos[2]{column}{to}
		    token3: columns $itempos[3]{column}{from}
			 to $itempos[3]{column}{to}" }

	   Note however that (in the current implementation) the use of @itempos anywhere in a
	   grammar implies that item positioning information is collected everywhere during the
	   parse. Depending on the grammar and the size of the text to be parsed, this may be
	   prohibitively expensive and the explicit use of $thisline, $thiscolumn, etc. may be a
	   better choice.

       $thisparser
	   A reference to the "Parse::RecDescent" object through which parsing was initiated.

	   The value of $thisparser propagates down the subrules of a parse but not back up.
	   Hence, you can invoke subrules from another parser for the scope of the current rule
	   as follows:

	       rule: subrule1 subrule2
	       | { $thisparser = $::otherparser } <reject>
	       | subrule3 subrule4
	       | subrule5

	   The result is that the first production calls "subrule1" and "subrule2" of the current
	   parser, and the remaining productions call the named subrules from $::otherparser.
	   Note, however, that "Bad Things" will happen if $::otherparser isn't a blessed
	   reference and/or doesn't have methods with the same names as the required subrules!

       $thisrule
	   A reference to the "Parse::RecDescent::Rule" object corresponding to the rule
	   currently being matched.

       $thisprod
	   A reference to the "Parse::RecDescent::Production" object corresponding to the
	   production currently being matched.

       $score and $score_return
	   $score stores the best production score to date, as specified by an earlier
	   "<score:...>" directive. $score_return stores the corresponding return value for the
	   successful production.

	   See "Scored productions".

       Warning: the parser relies on the information in the various "this..."  objects in some
       non-obvious ways. Tinkering with the other members of these objects will probably cause
       Bad Things to happen, unless you really know what you're doing. The only exception to this
       advice is that the use of "$this...->{local}" is always safe.
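
       To tie together the @item, %item and $return variables described above, here is a small
       sketch. The rules "implicit" and "explicit" and the keys of the returned hash are
       assumptions chosen for this illustration:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar; the rule names are chosen only for this illustration.
my $parser = Parse::RecDescent->new(q{
    # No assignment to $return, so the value of the last item is returned
    implicit: '(' from_num '..' to_num ')'

    # Explicit $return built from positional and named item values
    explicit: '(' from_num '..' to_num ')'
        { $return = { rule => $item[0], from => $item{from_num}, to => $item[-2] } }

    from_num: number
    to_num:   number
    number:   /\d+/
}) or die "Bad grammar!\n";

my $implicit = $parser->implicit('(3..7)');   # the closing ')' literal
my $explicit = $parser->explicit('(3..7)');   # hash ref built in the action

print "implicit: $implicit\n";
print "explicit: $explicit->{rule} from $explicit->{from} to $explicit->{to}\n";
```

       Wrapping "number" in the separately named rules "from_num" and "to_num" is what keeps
       both values available through %item.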

   Start-up Actions
       Any actions which appear before the first rule definition in a grammar are treated as
       "start-up" actions. Each such action is stripped of its outermost brackets and then
       evaluated (in the parser's special namespace) just before the rules of the grammar are
       first compiled.

       The main use of start-up actions is to declare local variables within the parser's special
       namespace:

	   { my $lastitem = '???'; }

	   list: item(s)   { $return = $lastitem }

	   item: book  { $lastitem = 'book'; }
	     | bell  { $lastitem = 'bell'; }
	     | candle    { $lastitem = 'candle'; }

       but start-up actions can be used to execute any valid Perl code within a parser's special
       namespace.

       Start-up actions can appear within a grammar extension or replacement (that is, a partial
       grammar installed via "Parse::RecDescent::Extend()" or "Parse::RecDescent::Replace()" -
       see "Incremental Parsing"), and will be executed before the new grammar is installed.
       Note, however, that a particular start-up action is only ever executed once.
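
       A runnable sketch of a start-up action follows. The counter $count and the rule names
       are hypothetical, and the final "1" in the "item" action is there only to keep its
       value defined:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    { my $count = 0; }

    list: item(s) { $return = $count }
    item: /\w+/   { $count++; 1 }
}) or die "Bad grammar!\n";

# The lexical declared in the start-up action is shared by every rule.
my $seen = $parser->list('book bell candle');
print "items seen: $seen\n";
```

       Since the start-up action runs only once, $count is not reset between parses. A second
       call to "list" on the same parser would continue counting from 3.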

   Autoactions
       It is sometimes desirable to be able to specify a default action to be taken at the end of
       every production (for example, in order to easily build a parse tree). If the variable
       $::RD_AUTOACTION is defined when "Parse::RecDescent::new()" is called, the contents of
       that variable are treated as a specification of an action which is to be appended to each
       production in the corresponding grammar.

       Alternatively, you can hard-code the autoaction within a grammar, using the
       "<autoaction:...>" directive.

       So, for example, to construct a simple parse tree you could write:

	   $::RD_AUTOACTION = q { [@item] };

	   $parser = Parse::RecDescent->new(q{
	   expression: and_expr '||' expression | and_expr
	   and_expr:   not_expr '&&' and_expr	| not_expr
	   not_expr:   '!' brack_expr	    | brack_expr
	   brack_expr: '(' expression ')'	| identifier
	   identifier: /[a-z]+/i
	   });

       or:

	   $parser = Parse::RecDescent->new(q{
	   <autoaction: { [@item] } >

	   expression: and_expr '||' expression | and_expr
	   and_expr:   not_expr '&&' and_expr	| not_expr
	   not_expr:   '!' brack_expr	    | brack_expr
	   brack_expr: '(' expression ')'	| identifier
	   identifier: /[a-z]+/i
	   });

       Either of these is equivalent to:

	   $parser = new Parse::RecDescent (q{
	   expression: and_expr '||' expression
	       { [@item] }
	     | and_expr
	       { [@item] }

	   and_expr:   not_expr '&&' and_expr
	       { [@item] }
	   |   not_expr
	       { [@item] }

	   not_expr:   '!' brack_expr
	       { [@item] }
	   |   brack_expr
	       { [@item] }

	   brack_expr: '(' expression ')'
	       { [@item] }
	     | identifier
	       { [@item] }

	   identifier: /[a-z]+/i
	       { [@item] }
	   });

       Alternatively, we could take an object-oriented approach, using a different class for each
       node (and also eliminating redundant intermediate nodes):

	   $::RD_AUTOACTION = q
	     { $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item]) };

	   $parser = Parse::RecDescent->new(q{
	       expression: and_expr '||' expression | and_expr
	       and_expr:   not_expr '&&' and_expr   | not_expr
	       not_expr:   '!' brack_expr	    | brack_expr
	       brack_expr: '(' expression ')'	    | identifier
	       identifier: /[a-z]+/i
	   });

       or:

	   $parser = Parse::RecDescent->new(q{
	       <autoaction:
		 $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item])
	       >

	       expression: and_expr '||' expression | and_expr
	       and_expr:   not_expr '&&' and_expr   | not_expr
	       not_expr:   '!' brack_expr	    | brack_expr
	       brack_expr: '(' expression ')'	    | identifier
	       identifier: /[a-z]+/i
	   });

       which are equivalent to:

	   $parser = Parse::RecDescent->new(q{
	       expression: and_expr '||' expression
		   { "expression_node"->new(@item[1..3]) }
	       | and_expr

	       and_expr:   not_expr '&&' and_expr
		   { "and_expr_node"->new(@item[1..3]) }
	       |   not_expr

	       not_expr:   '!' brack_expr
		   { "not_expr_node"->new(@item[1..2]) }
	       |   brack_expr

	       brack_expr: '(' expression ')'
		   { "brack_expr_node"->new(@item[1..3]) }
	       | identifier

	       identifier: /[a-z]+/i
		   { "identifier_node"->new($item[1]) }
	   });

       Note that, if a production already ends in an action, no autoaction is appended to it. For
       example, in this version:

	   $::RD_AUTOACTION = q
	     { $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item]) };

	   $parser = Parse::RecDescent->new(q{
	       expression: and_expr '||' expression | and_expr
	       and_expr:   not_expr '&&' and_expr   | not_expr
	       not_expr:   '!' brack_expr	    | brack_expr
	       brack_expr: '(' expression ')'	    | identifier
	       identifier: /[a-z]+/i
		   { 'terminal_node'->new($item[1]) }
	   });

       each "identifier" match produces a "terminal_node" object, not an "identifier_node"
       object.

       A level 1 warning is issued each time an "autoaction" is added to some production.
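
       For a self-contained check of what an autoaction returns, the following sketch applies
       the "[@item]" autoaction to a deliberately tiny grammar. The rules "pair" and "word" are
       assumptions made for the example:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    <autoaction: { [@item] } >

    pair: word '=' word
    word: /\w+/
}) or die "Bad grammar!\n";

# Every production now returns [ rule name, item values... ].
my $tree = $parser->pair('a = b');

print "root: $tree->[0]\n";
print "left: $tree->[1][0] => $tree->[1][1]\n";
print "op:   $tree->[2]\n";
print "right: $tree->[3][0] => $tree->[3][1]\n";
```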

   Autotrees
       A commonly needed autoaction is one that builds a parse-tree. It is moderately tricky to
       set up such an action (which must treat terminals differently from non-terminals), so
       Parse::RecDescent simplifies the process by providing the "<autotree>" directive.

       If this directive appears at the start of a grammar, it causes Parse::RecDescent to insert
       autoactions at the end of any production except those which already end in an action. The
       action inserted depends on whether the production is an intermediate rule (two or more
       items), or a terminal of the grammar (i.e. a single pattern or string item).

       So, for example, the following grammar:

	   <autotree>

	   file    : command(s)
	   command : get | set | vet
	   get : 'get' ident ';'
	   set : 'set' ident 'to' value ';'
	   vet : 'check' ident 'is' value ';'
	   ident   : /\w+/
	   value   : /\d+/

       is equivalent to:

	   file    : command(s)        { bless \%item, $item[0] }
	   command : get       { bless \%item, $item[0] }
	   | set	   { bless \%item, $item[0] }
	   | vet	   { bless \%item, $item[0] }
	   get : 'get' ident ';'   { bless \%item, $item[0] }
	   set : 'set' ident 'to' value ';'    { bless \%item, $item[0] }
	   vet : 'check' ident 'is' value ';'  { bless \%item, $item[0] }

	   ident   : /\w+/  { bless {__VALUE__=>$item[1]}, $item[0] }
	   value   : /\d+/  { bless {__VALUE__=>$item[1]}, $item[0] }

       Note that each node in the tree is blessed into a class of the same name as the rule
       itself. This makes it easy to build object-oriented processors for the parse-trees that
       the grammar produces. Note too that the last two rules produce special objects with the
       single attribute '__VALUE__'. This is because they consist solely of a single terminal.

       This autoaction-ed grammar would then produce a parse tree in a data structure like this:

	   {
	     file => {
	       command => {
		[ get => {
		   ident => { __VALUE__ => 'a' },
		     },
		  set => {
		   ident => { __VALUE__ => 'b' },
		   value => { __VALUE__ => '7' },
		     },
		  vet => {
		   ident => { __VALUE__ => 'b' },
		   value => { __VALUE__ => '7' },
		     },
		 ],
		  },
	     }
	   }

       (except, of course, that each nested hash would also be blessed into the appropriate
       class).
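
       The tree can then be walked like any other nested Perl structure. The following is a
       sketch only, assuming Parse::RecDescent is installed; the input string is made up for
       illustration. It relies on the fact that repeated subrules are stored in %item under
       their name together with the repetition specifier (here 'command(s)').

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    <autotree>

    file    : command(s)
    command : get | set | vet
    get     : 'get' ident ';'
    set     : 'set' ident 'to' value ';'
    vet     : 'check' ident 'is' value ';'
    ident   : /\w+/
    value   : /\d+/
}) or die "Bad grammar\n";

my $tree = $parser->file("get a; set b to 7; check b is 7;")
    or die "Parse failed\n";

# The list of matched commands lives under the key 'command(s)'.
for my $command (@{ $tree->{'command(s)'} }) {
    # Each "command" node holds exactly one of get/set/vet.
    my ($kind) = grep { exists $command->{$_} } qw(get set vet);
    my $node   = $command->{$kind};

    # ref() reports the class the node was blessed into (the rule name).
    print ref($node), ": ", $node->{ident}{__VALUE__}, "\n";
}
```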

       You can also specify a base class for the "<autotree>" directive.  The supplied prefix
       will be prepended to the rule names when creating tree nodes.  The following are
       equivalent:

	   <autotree:MyBase::Class>
	   <autotree:MyBase::Class::>

       And will produce a root node blessed into the "MyBase::Class::file" package in the example
       above.

   Autostubbing
       Normally, if a subrule appears in some production, but no rule of that name is ever
       defined in the grammar, the production which refers to the non-existent subrule fails
       immediately. This typically occurs as a result of misspellings, and is a sufficiently
       common occurrence that a warning is generated for such situations.

       However, when prototyping a grammar it is sometimes useful to be able to use subrules
       before a proper specification of them is really possible.  For example, a grammar might
       include a section like:

	   function_call: identifier '(' arg(s?) ')'

	   identifier: /[a-z]\w*/i

       where the possible format of an argument is sufficiently complex that it is not worth
       specifying in full until the general function call syntax has been debugged. In this
       situation it is convenient to leave the real rule "arg" undefined and just slip in a
       placeholder (or "stub"):

	   arg: 'arg'

       so that the function call syntax can be tested with dummy input such as:

	   f0()
	   f1(arg)
	   f2(arg arg)
	   f3(arg arg arg)

       et cetera.

       Early in prototyping, many such "stubs" may be required, so "Parse::RecDescent" provides a
       means of automating their definition.  If the variable $::RD_AUTOSTUB is defined when a
       parser is built, a subrule reference to any non-existent rule (say, "subrule"), will cause
       a "stub" rule to be automatically defined in the generated parser.  If "$::RD_AUTOSTUB eq
       '1'" or is false, a stub rule of the form:

	   subrule: 'subrule'

       will be generated.  The special-case for a value of '1' is to allow the use of perl -s
       with -RD_AUTOSTUB without generating "subrule: '1'" per below. If $::RD_AUTOSTUB is true
       (and is not '1'),
       a stub rule of the form:

	   subrule: $::RD_AUTOSTUB

       will be generated.  $::RD_AUTOSTUB must contain a valid production item; no checking is
       performed.  $::RD_AUTOSTUB is not evaluated lazily: it is evaluated at the time the parser
       is generated.

       Hence, with $::RD_AUTOSTUB defined, it is possible to only partially specify a grammar,
       and then "fake" matches of the unspecified (sub)rules by just typing in their name, or a
       literal value that was assigned to $::RD_AUTOSTUB.
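
       A short sketch of the function-call prototype above with autostubbing enabled, assuming
       Parse::RecDescent is installed. The third input ("f1(bogus)") is a made-up negative case
       added for illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Must be defined before the parser is built; a value of 1 requests
# stubs of the form  arg: 'arg'
local $::RD_AUTOSTUB = 1;

my $parser = Parse::RecDescent->new(q{
    function_call: identifier '(' arg(s?) ')'

    identifier: /[a-z]\w*/i
}) or die "Bad grammar\n";

# "arg" is never defined in the grammar, so it is stubbed to match
# the literal text 'arg'.
my @matched = map { defined $parser->function_call($_) ? 1 : 0 }
              ("f0()", "f2(arg arg)", "f1(bogus)");

print "@matched\n";
```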

   Look-ahead
       If a subrule, token, or action is prefixed by "...", then it is treated as a "look-ahead"
       request. That means that the current production can (as usual) only succeed if the
       specified item is matched, but that the matching does not consume any of the text being
       parsed. This is very similar to the "/(?=...)/" look-ahead construct in Perl patterns.
       Thus, the rule:

	   inner_word: word ...word

       will match whatever the subrule "word" matches, provided that match is followed by some
       more text which subrule "word" would also match (although this second substring is not
       actually consumed by "inner_word").

       Likewise, a "...!" prefix causes the following item to succeed (without consuming any
       text) if and only if it would normally fail. Hence, a rule such as:

	   identifier: ...!keyword ...!'_' /[A-Za-z_]\w*/

       matches a string of characters which satisfies the pattern "/[A-Za-z_]\w*/", but only if
       the same sequence of characters would not match either subrule "keyword" or the literal
       token '_'.
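
       Both kinds of look-ahead can be tried out with a sketch such as the following, which
       assumes Parse::RecDescent is installed. The definitions of "word" and "keyword" are
       hypothetical, the "identifier" rule is simplified to a single negative look-ahead, and
       an explicit action is added to "inner_word" so that it returns the consumed word.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    inner_word: word ...word   { $item[1] }

    identifier: ...!keyword /[A-Za-z_]\w*/

    word:       /\w+/
    keyword:    /(?:if|while)\b/
}) or die "Bad grammar\n";

# Passing a reference lets us inspect what is left after the match.
my $text  = "hello world";
my $first = $parser->inner_word(\$text);

# The second word was only looked at, so it is still in $text.
print "matched '$first', remaining text still contains 'world'\n"
    if $text =~ /world/;

# With no following word, the look-ahead (and hence the rule) fails.
my $alone = $parser->inner_word("hello");

# A keyword is refused; a longer word that merely starts with one is not.
my $refused  = $parser->identifier("while");
my $accepted = $parser->identifier("whilst");
```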

       Sequences of look-ahead prefixes accumulate, multiplying their positive and/or negative
       senses. Hence:

	   inner_word: word ...!......!word

       is exactly equivalent to the original example above (a warning is issued in cases like
       these, since they often indicate something left out, or misunderstood).

       Note that actions can also be treated as look-aheads. In such cases, the state of the
       parser text (in the local variable $text) after the look-ahead action is guaranteed to be
       identical to its state before the action, regardless of how it's changed within the action
       (unless you actually undefine $text, in which case you get the disaster you deserve :-).

   Directives
       Directives are special pre-defined actions which may be used to alter the behaviour of the
       parser. There are currently twenty-two directives: "<commit>", "<uncommit>", "<reject>",
       "<score>", "<autoscore>", "<skip>", "<resync>", "<error>", "<warn>", "<hint>",
       "<trace_build>", "<trace_parse>", "<nocheck>", "<rulevar>", "<matchrule>", "<leftop>",
       "<rightop>", "<defer>", "<perl_quotelike>", "<perl_codeblock>", "<perl_variable>", and
       "<token>".

       Committing and uncommitting
	   The "<commit>" and "<uncommit>" directives permit the recursive descent of the parse
	   tree to be pruned (or "cut") for efficiency.  Within a rule, a "<commit>" directive
	   instructs the rule to ignore subsequent productions if the current production fails.
	   For example:

	       command: 'find' <commit> filename
		  | 'open' <commit> filename
		  | 'move' filename filename

	   Clearly, if the leading token 'find' is matched in the first production but that
	   production fails for some other reason, then the remaining productions cannot possibly
	   match. The presence of the "<commit>" causes the "command" rule to fail immediately if
	   an invalid "find" command is found, and likewise if an invalid "open" command is
	   encountered.

	   It is also possible to revoke a previous commitment. For example:

	       if_statement: 'if' <commit> condition
		   'then' block <uncommit>
		   'else' block
		   | 'if' <commit> condition
		   'then' block

	   In this case, a failure to find an "else" block in the first production shouldn't
	   preclude trying the second production, but a failure to find a "condition" certainly
	   should.

	   As a special case, any production in which the first item is an "<uncommit>"
	   immediately revokes a preceding "<commit>" (even though the production would not
	   otherwise have been tried). For example, in the rule:

	       request: 'explain' expression
		      | 'explain' <commit> keyword
		      | 'save'
		      | 'quit'
		      | <uncommit> term '?'

	   if the text being matched was "explain?", and the first two productions failed, then
	   the "<commit>" in production two would cause productions three and four to be skipped,
	   but the leading "<uncommit>" in the production five would allow that production to
	   attempt a match.

	   Note in the preceding example, that the "<commit>" was only placed in production two.
	   If production one had been:

	       request: 'explain' <commit> expression

	   then production two would be (inappropriately) skipped if a leading "explain..." was
	   encountered.

	   Both "<commit>" and "<uncommit>" directives always succeed, and their value is always
	   1.
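
	   The effect of committing can be seen by comparing two otherwise identical rules. This
	   is a sketch only, assuming Parse::RecDescent is installed; the rules "with_commit" and
	   "no_commit", the catch-all /.+/ production, and the definitions of "filename",
	   "expression", "keyword" and "term" are hypothetical.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    with_commit: 'find' <commit> filename
               | /.+/ { 'fallback' }

    no_commit:   'find' filename
               | /.+/ { 'fallback' }

    request:     'explain' expression
               | 'explain' <commit> keyword
               | 'save'
               | 'quit'
               | <uncommit> term '?'

    filename:    /[a-z]+\.txt/
    expression:  /\d+/
    keyword:     /if|while/
    term:        /\w+/
}) or die "Bad grammar\n";

# 'find' matches and commits, "filename" then fails, so the rule fails
# without trying the catch-all production.
my $committed = $parser->with_commit("find 123");

# Without <commit>, the catch-all production is still tried.
my $uncommitted = $parser->no_commit("find 123");

# The leading <uncommit> lets the last production run even though
# production two committed the rule before failing.
my $question = $parser->request("explain?");

print defined $committed ? "matched\n" : "with_commit failed\n";
print "no_commit returned: $uncommitted\n";
print "request returned: $question\n";
```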

       Rejecting a production
	   The "<reject>" directive immediately causes the current production to fail (it is
	   exactly equivalent to, but more obvious than, the action "{undef}"). A "<reject>" is
	   useful when it is desirable to get the side effects of the actions in one production,
	   without prejudicing a match by some other production later in the rule. For example,
	   to insert tracing code into the parse:

	       complex_rule: { print "In complex rule...\n"; } <reject>

	       complex_rule: simple_rule '+' 'i' '*' simple_rule
		   | 'i' '*' simple_rule
		   | simple_rule

	   It is also possible to specify a conditional rejection, using the form
	   "<reject:condition>", which only rejects if the specified condition is true. This form
	   of rejection is exactly equivalent to the action "{(condition)?undef:1}".  For
	   example:

	       command: save_command
		  | restore_command
		  | <reject: defined $::tolerant> { exit }
		  | <error: Unknown command. Ignored.>

	   A "<reject>" directive never succeeds (and hence has no associated value). A
	   conditional rejection may succeed (if its condition is not satisfied), in which case
	   its value is 1.
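
	   A conditional rejection can be used as a semantic check in the middle of a production.
	   The following sketch assumes Parse::RecDescent is installed; the rule name "nonzero" is
	   hypothetical. Because a successful "<reject:...>" has the value 1, a trailing action
	   is used to return the matched number instead.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    nonzero: /\d+/ <reject: $item[1] == 0> { $item[1] }
}) or die "Bad grammar\n";

# The condition is false for 42, so the directive succeeds (value 1)
# and the final action returns the number.
my $ok = $parser->nonzero("42");

# The condition is true for 0, so the production is rejected.
my $rejected = $parser->nonzero("0");

print "accepted $ok\n" if defined $ok;
print "rejected 0\n"   if !defined $rejected;
```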

	   As an extra optimization, "Parse::RecDescent" ignores any production which begins with
	   an unconditional "<reject>" directive, since any such production can never
	   successfully match or have any useful side-effects. A level 1 warning is issued in all
	   such cases.

	   Note that productions beginning with conditional "<reject:...>" directives are never
	   "optimized away" in this manner, even if they are always guaranteed to fail (for
	   example: "<reject:1>")

	   Due to the way grammars are parsed, there is a minor restriction on the condition of a
	   conditional "<reject:...>": it cannot contain any raw '<' or '>' characters. For
	   example:

	       line: cmd <reject: $thiscolumn > max> data

	   results in an error when a parser is built from this grammar (since the grammar parser
	   has no way of knowing whether the first > is a "greater than" or the end of the
	   "<reject:...>").

	   To overcome this problem, put the condition inside a do{} block:

	       line: cmd <reject: do{$thiscolumn > max}> data

	   Note that the same problem may occur in other directives that take arguments. The same
	   solution will work in all cases.

       Skipping between terminals
	   The "<skip>" directive enables the terminal prefix used in a production to be changed.
	   For example:

	       OneLiner: Command <skip:'[ \t]*'> Arg(s) /;/

	   causes only blanks and tabs to be skipped before terminals in the "Arg" subrule (and
	   any of its subrules), and also before the final "/;/" terminal.  Once the production
	   is complete, the previous terminal prefix is reinstated. Note that this implies that
	   distinct productions of a rule must reset their terminal prefixes individually.
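
	   As a rough illustration, assuming Parse::RecDescent is installed and using the
	   hypothetical rules "any_space", "same_line" and "word", the following compares the
	   default prefix with one restricted to blanks and tabs:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    any_space: word(s)

    same_line: <skip: '[ \t]*'> word(s)

    word:      /\w+/
}) or die "Bad grammar\n";

my $input = "alpha beta\ngamma";

# Default prefix: newlines are skipped like any other whitespace.
my $all = $parser->any_space($input);

# Blanks and tabs only: matching stops at the newline.
my $line = $parser->same_line($input);

print scalar(@$all),  " words with the default prefix\n";
print scalar(@$line), " words when only blanks and tabs are skipped\n";
```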

	   The "<skip>" directive evaluates to the previous terminal prefix, so it's easy to
	   reinstate a prefix later in a production:

	       Command: <skip:","> CSV(s) <skip:$item[1]> Modifier

	   The value specified after the colon is interpolated into a pattern, so all of the
	   following are equivalent (though their efficiency increases down the list):

	       <skip: "$colon|$comma">	 # ASSUMING THE VARS HOLD THE OBVIOUS VALUES

	       <skip: ':|,'>

	       <skip: q{[:,]}>

	       <skip: qr/[:,]/>

	   There is no way of directly setting the prefix for an entire rule, except as follows:

	       Rule: <skip: '[ \t]*'> Prod1
		   | <skip: '[ \t]*'> Prod2a Prod2b
		   | <skip: '[ \t]*'> Prod3

	   or, better:

	       Rule: <skip: '[ \t]*'>
	       (
		   Prod1
		 | Prod2a Prod2b
		 | Prod3
	       )

	   The skip pattern is passed down to subrules, so setting the skip for the top-level
	   rule as described above actually sets the prefix for the entire grammar (provided that
	   you only call the method corresponding to the top-level rule itself). Alternatively,
	   or if you have more than one top-level rule in your grammar, you can provide a global
	   "<skip>" directive prior to defining any rules in the grammar. These are the preferred
	   alternatives to setting $Parse::RecDescent::skip.

	   Additionally, using "<skip>" actually allows you to have a completely dynamic skipping
	   behaviour. For example:

	      Rule_with_dynamic_skip: <skip: $::skip_pattern> Rule

	   Then you can set $::skip_pattern before invoking "Rule_with_dynamic_skip" and have it
	   skip whatever you specified.

	   Note: Up to release 1.51 of Parse::RecDescent, an entirely different mechanism was
	   used for specifying terminal prefixes. The current method is not backwards-compatible
	   with that early approach. The current approach is stable and will not change again.

       Resynchronization
	   The "<resync>" directive provides a visually distinctive means of consuming some of
	   the text being parsed, usually to skip an erroneous input. In its simplest form
	   "<resync>" simply consumes text up to and including the next newline ("\n") character,
	   succeeding only if the newline is found, in which case it causes its surrounding rule
	   to return zero on success.

	   In other words, a "<resync>" is exactly equivalent to the token "/[^\n]*\n/" followed
	   by the action "{ $return = 0 }" (except that productions beginning with a "<resync>"
	   are ignored when generating error messages). A typical use might be:

	       script : command(s)

	       command: save_command
		  | restore_command
		  | <resync> # TRY NEXT LINE, IF POSSIBLE

	   It is also possible to explicitly specify a resynchronization pattern, using the
	   "<resync:pattern>" variant. This version succeeds only if the specified pattern
	   matches (and consumes) the parsed text. In other words, "<resync:pattern>" is exactly
	   equivalent to the token "/pattern/" (followed by a "{ $return = 0 }" action). For
	   example, if commands were terminated by newlines or semi-colons:

	       command: save_command
		  | restore_command
		  | <resync:[^;\n]*[;\n]>

	   The value of a successfully matched "<resync>" directive (of either type) is the text
	   that it consumed. Note, however, that since the directive also sets $return, a
	   production consisting of a lone "<resync>" succeeds but returns the value zero (which
	   a calling rule may find useful to distinguish between "true" matches and "tolerant"
	   matches).  Remember that returning a zero value indicates that the rule succeeded
	   (since only an "undef" denotes failure within "Parse::RecDescent" parsers).

       Error handling
	   The "<error>" directive provides automatic or user-defined generation of error
	   messages during a parse. In its simplest form "<error>" prepares an error message
	   based on the mismatch between the last item expected and the text which caused it to
	   fail. For example, given the rule:

	       McCoy: curse ',' name ", I'm a doctor, not a" a_profession '!'
		| pronoun 'dead,' name '!'
		| <error>

	   the following strings would produce the following messages:

	   "Amen, Jim!"
		      ERROR (line 1): Invalid McCoy: Expected curse or pronoun
			  not found

	   "Dammit, Jim, I'm a doctor!"
		      ERROR (line 1): Invalid McCoy: Expected ", I'm a doctor, not a"
			  but found ", I'm a doctor!" instead

	   "He's dead,\n"
		      ERROR (line 2): Invalid McCoy: Expected name not found

	   "He's alive!"
		      ERROR (line 1): Invalid McCoy: Expected 'dead,' but found
			  "alive!" instead

	   "Dammit, Jim, I'm a doctor, not a pointy-eared Vulcan!"
		      ERROR (line 1): Invalid McCoy: Expected a profession but found
			  "pointy-eared Vulcan!" instead

	   Note that, when autogenerating error messages, all underscores in any rule name used
	   in a message are replaced by single spaces (for example "a_production" becomes "a
	   production"). Judicious choice of rule names can therefore considerably improve the
	   readability of automatic error messages (as well as the maintainability of the
	   original grammar).

	   If the automatically generated error is not sufficient, it is possible to provide an
	   explicit message as part of the error directive. For example:

	       Spock: "Fascinating" ',' (name | 'Captain') '.'
		| "Highly illogical, doctor."
		| <error: He never said that!>

	   which would result in all failures to parse a "Spock" subrule printing the following
	   message:

		  ERROR (line <N>): Invalid Spock:  He never said that!

	   The error message is treated as a "qq{...}" string and interpolated when the error is
	   generated (not when the directive is specified!).  Hence:

	       <error: Mystical error near "$text">

	   would correctly insert the ambient text string which caused the error.

	   There are two other forms of error directive: "<error?>" and "<error?: msg>". These
	   behave just like "<error>" and "<error: msg>" respectively, except that they are only
	   triggered if the rule is "committed" at the time they are encountered. For example:

	       Scotty: "Ya kenna change the Laws o' Phusics," <commit> name
		 | name <commit> ',' "she's goanta blaw!"
		 | <error?>

	   will only generate an error for a string beginning with "Ya kenna change the Laws o'
	   Phusics," or a valid name, but which still fails to match the corresponding
	   production. That is, "$parser->Scotty("Aye, Cap'ain")" will fail silently (since
	   neither production will "commit" the rule on that input), whereas
	   "$parser->Scotty("Mr Spock, ah jest kenna do'ut!")"	will fail with the error message:

		  ERROR (line 1): Invalid Scotty: expected 'she's goanta blaw!'
		      but found 'ah jest kenna do'ut!' instead.

	   since in that case the second production would commit after matching the leading name.

	   Note that to allow this behaviour, all "<error>" directives which are the first item
	   in a production automatically uncommit the rule just long enough to allow their
	   production to be attempted (that is, when their production fails, the commitment is
	   reinstated so that subsequent productions are skipped).

	   In order to permanently uncommit the rule before an error message, it is necessary to
	   put an explicit "<uncommit>" before the "<error>". For example:

	       line: 'Kirk:'  <commit> Kirk
	       | 'Spock:' <commit> Spock
	       | 'McCoy:' <commit> McCoy
	       | <uncommit> <error?> <reject>
	       | <resync>

	   Error messages generated by the various "<error...>" directives are not displayed
	   immediately. Instead, they are "queued" in a buffer and are only displayed once
	   parsing ultimately fails. Moreover, "<error...>" directives that cause one production
	   of a rule to fail are automatically removed from the message queue if another
	   production subsequently causes the entire rule to succeed.  This means that you can
	   put "<error...>" directives wherever useful diagnosis can be done, and only those
	   associated with actual parser failure will ever be displayed. Also see "GOTCHAS".

	   As a general rule, the most useful diagnostics are usually generated either at the
	   very lowest level within the grammar, or at the very highest. A good rule of thumb is
	   to identify those subrules which consist mainly (or entirely) of terminals, and then
	   put an "<error...>" directive at the end of any other rule which calls one or more of
	   those subrules.

	   There is one other situation in which the output of the various types of error
	   directive is suppressed; namely, when the rule containing them is being parsed as part
	   of a "look-ahead" (see "Look-ahead"). In this case, the error directive will still
	   cause the rule to fail, but will do so silently.

	   An unconditional "<error>" directive always fails (and hence has no associated value).
	   This means that encountering such a directive always causes the production containing
	   it to fail. Hence an "<error>" directive will inevitably be the last (useful) item of
	   a rule (a level 3 warning is issued if a production contains items after an
	   unconditional "<error>" directive).

	   An "<error?>" directive will succeed (that is: fail to fail :-), if the current rule
	   is uncommitted when the directive is encountered. In that case the directive's
	   associated value is zero. Hence, this type of error directive can be used before the
	   end of a production. For example:

	       command: 'do' <commit> something
		  | 'report' <commit> something
		  | <error?: Syntax error> <error: Unknown command>

	   Warning: The "<error?>" directive does not mean "always fail (but do so silently
	   unless committed)". It actually means "only fail (and report) if committed, otherwise
	   succeed". To achieve the "fail silently if uncommitted" semantics, it is necessary to
	   use:

	       rule: item <commit> item(s)
	       | <error?> <reject>  # FAIL SILENTLY UNLESS COMMITTED

	   However, because people seem to expect a lone "<error?>" directive to work like this:

	       rule: item <commit> item(s)
	       | <error?: Error message if committed>
	       | <error:  Error message if uncommitted>

	   Parse::RecDescent automatically appends a "<reject>" directive if the "<error?>"
	   directive is the only item in a production. A level 2 warning (see below) is issued
	   when this happens.

	   The level of error reporting during both parser construction and parsing is controlled
	   by the presence or absence of four global variables: $::RD_ERRORS, $::RD_WARN,
	   $::RD_HINT, and $::RD_TRACE. If $::RD_ERRORS is defined (and, by default, it is)
	   then fatal errors are reported.

	   Whenever $::RD_WARN is defined, certain non-fatal problems are also reported.

	   Warnings have an associated "level": 1, 2, or 3. The higher the level, the more
	   serious the warning. The value of the corresponding global variable ($::RD_WARN)
	   determines the lowest level of warning to be displayed. Hence, to see all warnings,
	   set $::RD_WARN to 1.  To see only the most serious warnings set $::RD_WARN to 3.  By
	   default $::RD_WARN is initialized to 3, ensuring that serious but non-fatal errors are
	   automatically reported.

	   There is also a grammar directive to turn on warnings from within the grammar:
	   "<warn>". It takes an optional argument, which specifies the warning level: "<warn:
	   2>".

	   See "DIAGNOSTICS" for a list of the various error and warning messages that
	   Parse::RecDescent generates when these two variables are defined.

	   Defining any of the remaining variables (which are not defined by default) further
	   increases the amount of information reported.  Defining $::RD_HINT causes the parser
	   generator to offer more detailed analyses and hints on both errors and warnings.  Note
	   that setting $::RD_HINT at any point automagically sets $::RD_WARN to 1. There is also
	   a "<hint>" directive, which can be hard-coded into a grammar.

	   Defining $::RD_TRACE causes the parser generator and the parser to report their
	   progress to STDERR in excruciating detail (although, without hints unless $::RD_HINT
	   is separately defined). This detail can be moderated in only one respect: if
	   $::RD_TRACE has an integer value (N) greater than 1, only N characters of the
	   "current parsing context" (that is, where in the input string we are at any point in
	   the parse) are reported at any time.

	   $::RD_TRACE is mainly useful for debugging a grammar that isn't behaving as you
	   expected it to. To this end, if $::RD_TRACE is defined when a parser is built, any
	   actual parser code which is generated is also written to a file named "RD_TRACE" in
	   the local directory.

	   There are two directives associated with the $::RD_TRACE variable.  If a grammar
	   contains a "<trace_build>" directive anywhere in its specification, $::RD_TRACE is
	   turned on during the parser construction phase.  If a grammar contains a
	   "<trace_parse>" directive anywhere in its specification, $::RD_TRACE is turned on
	   during any parse the parser performs.

	   Note that the four variables belong to the "main" package, which makes them easier to
	   refer to in the code controlling the parser, and also makes it easy to turn them into
	   command line flags ("-RD_ERRORS", "-RD_WARN", "-RD_HINT", "-RD_TRACE") under perl -s.

	   The corresponding directives are useful to "hardwire" the various debugging features
	   into a particular grammar (rather than having to set and reset external variables).

       Redirecting diagnostics
	   The diagnostics provided by the tracing mechanism always go to STDERR.  If you need
	   them to go elsewhere, localize and reopen STDERR prior to the parse.

	   For example:

	       use IO::File;

	       {
		   # STDERR is redirected only until the end of this block
		   local *STDERR = IO::File->new(">$filename") or die $!;

		   my $result = $parser->startrule($text);
	       }

       Consistency checks
	   Whenever a parser is built, Parse::RecDescent carries out a number of (potentially
	   expensive) consistency checks. These include: verifying that the grammar is not
	   left-recursive and that no rules have been left undefined.

	   These checks are important safeguards during development, but unnecessary overheads
	   when the grammar is stable and ready to be deployed. So Parse::RecDescent provides a
	   directive to disable them: "<nocheck>".

	   If a grammar contains a "<nocheck>" directive anywhere in its specification, the extra
	   compile-time checks are by-passed.

       Specifying local variables
	   It is occasionally convenient to specify variables which are local to a single rule.
	   This may be achieved by including a "<rulevar:...>" directive anywhere in the rule.
	   For example:

	       markup: <rulevar: $tag>

	       markup: tag {($tag=$item[1]) =~ s/^<|>$//g} body[$tag]

	   The example "<rulevar: $tag>" directive causes a "my" variable named $tag to be
	   declared at the start of the subroutine implementing the "markup" rule (that is,
	   before the first production, regardless of where in the rule it is specified).

	   Specifically, any directive of the form: "<rulevar:text>" causes a line of the form
	   "my text;" to be added at the beginning of the rule subroutine, immediately after the
	   definitions of the following local variables:

	       $thisparser $commit
	       $thisrule   @item
	       $thisline   @arg
	       $text   %arg

	   This means that the following "<rulevar>" directives work as expected:

	       <rulevar: $count = 0 >

	       <rulevar: $firstarg = $arg[0] || '' >

	       <rulevar: $myItems = \@item >

	       <rulevar: @context = ( $thisline, $text, @arg ) >

	       <rulevar: ($name,$age) = @arg{"name","age"} >

	   If a variable that is also visible to subrules is required, it needs to be "local"'d,
	   not "my"'d. "rulevar" defaults to "my", but if "local" is explicitly specified:

	       <rulevar: local $count = 0 >

	   then a "local"-ized variable is declared instead, and will be available within
	   subrules.

	   Note however that, because all such variables are "my" variables, their values do not
	   persist between match attempts on a given rule. To preserve values between match
	   attempts, values can be stored within the "local" member of the $thisrule object:

	       countedrule: { $thisrule->{"local"}{"count"}++ }
		    <reject>
		  | subrule1
		  | subrule2
		  | <reject: $thisrule->{"local"}{"count"} == 1>
		    subrule3

	   When matching a rule, each "<rulevar>" directive is matched as if it were an
	   unconditional "<reject>" directive (that is, it causes any production in which it
	   appears to immediately fail to match).  For this reason (and to improve readability)
	   it is usual to specify any "<rulevar>" directive in a separate production at the start
	   of the rule (this has the added advantage that it enables "Parse::RecDescent" to
	   optimize away such productions, just as it does for the "<reject>" directive).
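
	   The following sketch shows a rule-local variable carrying a value from one item of a
	   production to a later one. It assumes Parse::RecDescent is installed; the rules
	   "doubled" and "word" are hypothetical.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    doubled: <rulevar: $first>

    doubled: word { $first = $item[1] }
             word <reject: $item[3] ne $first>
             { $first }

    word:    /\w+/
}) or die "Bad grammar\n";

# $first is declared with "my" at the top of the "doubled" subroutine,
# so it is visible to every action and directive in the rule.
my $same      = $parser->doubled("ha ha");
my $different = $parser->doubled("ha ho");

print "doubled word: $same\n" if defined $same;
print "no doubled word in 'ha ho'\n" if !defined $different;
```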

       Dynamically matched rules
	   Because regexes and double-quoted strings are interpolated, it is relatively easy to
	   specify productions with "context sensitive" tokens. For example:

	       command:  keyword  body	"end $item[1]"

	   which ensures that a command block is bounded by a "<keyword>...end <same keyword>"
	   pair.

	   Building productions in which subrules are context sensitive is also possible, via the
	   "<matchrule:...>" directive. This directive behaves identically to a subrule item,
	   except that the rule which is invoked to match it is determined by the string
	   specified after the colon. For example, we could rewrite the "command" rule like this:

	       command:  keyword  <matchrule:body>  "end $item[1]"

	   Whatever appears after the colon in the directive is treated as an interpolated string
	   (that is, as if it appeared in a "qq{...}" operator) and the value of that interpolated
	   string is the name of the subrule to be matched.

	   Of course, just putting a constant string like "body" in a "<matchrule:...>" directive
	   is of little interest or benefit.  The power of the directive is seen when we use a string
	   that interpolates to something interesting. For example:

	       command:    keyword <matchrule:$item[1]_body> "end $item[1]"

	       keyword:    'while' | 'if' | 'function'

	       while_body: condition block

	       if_body:    condition block ('else' block)(?)

	       function_body:  arglist block

	   Now the "command" rule selects how to proceed on the basis of the keyword that is
	   found. It is as if "command" were declared:

	       command:    'while'    while_body    "end while"
		  |    'if'	  if_body   "end if"
		  |    'function' function_body "end function"
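
	   A cut-down, runnable variant of this grammar is sketched below, assuming
	   Parse::RecDescent is installed. The bodies are reduced to single hypothetical patterns
	   (digits for "while_body", lower-case letters for "if_body") so that the effect of the
	   keyword on the choice of subrule is easy to see.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    command:    keyword <matchrule:$item[1]_body> "end $item[1]"

    keyword:    'while' | 'if'

    while_body: /\d+/

    if_body:    /[a-z]+/
}) or die "Bad grammar\n";

# 'while' selects while_body, which accepts digits only.
my $while_ok  = $parser->command("while 42 end while");
my $while_bad = $parser->command("while abc end while");

# 'if' selects if_body, which accepts lower-case letters.
my $if_ok = $parser->command("if abc end if");

# The closing keyword must be the same as the opening one.
my $mismatch = $parser->command("if abc end while");

print "while/42: ",  (defined $while_ok  ? "ok" : "failed"), "\n";
print "while/abc: ", (defined $while_bad ? "ok" : "failed"), "\n";
```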

	   When a "<matchrule:...>" directive is used as a repeated subrule, the rule name
	   expression is "late-bound". That is, the name of the rule to be called is re-evaluated
	   each time a match attempt is made. Hence, the following grammar:

	       { $::species = 'dogs' }

	       pair:   'two' <matchrule:$::species>(s)

	       dogs:   /dogs/ { $::species = 'cats' }

	       cats:   /cats/

	   will match the string "two dogs cats cats" completely, whereas it will only match the
	   string "two dogs dogs dogs" up to the eighth letter. If the rule name were "early
	   bound" (that is, evaluated only the first time the directive is encountered in a
	   production), the reverse behaviour would be expected.

	   Note that the "matchrule" directive takes a string that is to be treated as a rule
	   name, not as a rule invocation. That is, it's like a Perl symbolic reference, not an
	   "eval". Just as you can say:

	       $subname = 'foo';

	       # and later...

	       &{$subname}(@args);

	   but not:

	       $subname = 'foo(@args)';

	       # and later...

	       &{$subname};

	   likewise you can say:

	       $rulename = 'foo';

	       # and in the grammar...

	       <matchrule:$rulename>[@args]

	   but not:

	       $rulename = 'foo[@args]';

	       # and in the grammar...

	       <matchrule:$rulename>

       Deferred actions
	   The "<defer:...>" directive is used to specify an action to be performed when (and
	   only if!) the current production ultimately succeeds.

	   Whenever a "<defer:...>" directive appears, the code it specifies is converted to a
	   closure (an anonymous subroutine reference) which is queued within the active parser
	   object. Note that, because the deferred code is converted to a closure, the values of
	   any "local" variables (such as $text, @item, etc.) are preserved until the deferred
	   code is actually executed.

	   If the parse ultimately succeeds and the production in which the "<defer:...>"
	   directive was evaluated formed part of the successful parse, then the deferred code is
	   executed immediately before the parse returns. If however the production which queued
	   a deferred action fails, or one of the higher-level rules which called that production
	   fails, then the deferred action is removed from the queue, and hence is never
	   executed.

	   For example, given the grammar:

	       sentence: noun trans noun
	       | noun intrans

	       noun:	 'the dog'
		   { print "$item[1]\t(noun)\n" }
	       |     'the meat'
		   { print "$item[1]\t(noun)\n" }

	       trans:	 'ate'
		   { print "$item[1]\t(transitive)\n" }

	       intrans:  'ate'
		   { print "$item[1]\t(intransitive)\n" }
		  |  'barked'
		   { print "$item[1]\t(intransitive)\n" }

	   then parsing the sentence "the dog ate" would produce the output:

	       the dog	(noun)
	       ate  (transitive)
	       the dog	(noun)
	       ate  (intransitive)

	   This is because, even though the first production of "sentence" ultimately fails, its
	   initial subrules "noun" and "trans" do match, and hence they execute their associated
	   actions.  Then the second production of "sentence" succeeds, causing the actions of
	   the subrules "noun" and "intrans" to be executed as well.

	   On the other hand, if the actions were replaced by "<defer:...>" directives:

	       sentence: noun trans noun
	       | noun intrans

	       noun:	 'the dog'
		   <defer: print "$item[1]\t(noun)\n" >
	       |     'the meat'
		   <defer: print "$item[1]\t(noun)\n" >

	       trans:	 'ate'
		   <defer: print "$item[1]\t(transitive)\n" >

	       intrans:  'ate'
		   <defer: print "$item[1]\t(intransitive)\n" >
		  |  'barked'
		   <defer: print "$item[1]\t(intransitive)\n" >

	   the output would be:

	       the dog	(noun)
	       ate  (intransitive)

	   since deferred actions are only executed if they were evaluated in a production which
	   ultimately contributes to the successful parse.

	   In this case, even though the first production of "sentence" caused the subrules
	   "noun" and "trans" to match, that production ultimately failed and so the deferred
	   actions queued by those subrules were subsequently discarded. The second production
	   then succeeded, causing the entire parse to succeed, and so the deferred actions
	   queued by the (second) match of the "noun" subrule and the subsequent match of
	   "intrans" are preserved and eventually executed.

	   Deferred actions provide a means of improving the performance of a parser, by only
	   executing those actions which are part of the final parse-tree for the input data.

	   Alternatively, deferred actions can be viewed as a mechanism for building (and
	   executing) a customized subroutine corresponding to the given input data, much in the
	   same way that autoactions (see "Autoactions") can be used to build a customized data
	   structure for specific input.

	   Whether or not the action it specifies is ever executed, a "<defer:...>" directive
	   always succeeds, returning the number of deferred actions currently queued at that
	   point.
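
	   The deferred grammar above can be exercised with a short script. This sketch assumes a
	   package variable @main::log (a hypothetical name) in place of the "print" statements,
	   so that the actions which survive can be inspected after the parse:

```perl
use strict;
use warnings;
use Parse::RecDescent;

our @log;    # hypothetical global that the deferred actions append to

my $parser = Parse::RecDescent->new(q{
    sentence: noun trans noun
            | noun intrans

    noun:     'the dog'   <defer: push @::log, "$item[1] (noun)" >
            | 'the meat'  <defer: push @::log, "$item[1] (noun)" >

    trans:    'ate'       <defer: push @::log, "$item[1] (transitive)" >

    intrans:  'ate'       <defer: push @::log, "$item[1] (intransitive)" >
            | 'barked'    <defer: push @::log, "$item[1] (intransitive)" >
}) or die "Bad grammar\n";

# The first production of "sentence" fails after queueing two actions;
# only the actions queued by the second production should be run.
$parser->sentence("the dog ate");

print "$_\n" for @log;
```

	   Following the behaviour described above, @log should end up holding only the entries
	   for "the dog" (noun) and "ate" (intransitive).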

       Parsing Perl
	   Parse::RecDescent provides limited support for parsing subsets of Perl, namely: quote-
	   like operators, Perl variables, and complete code blocks.

	   The "<perl_quotelike>" directive can be used to parse any Perl quote-like operator: 'a
	   string', "m/a pattern/", "tr{ans}{lation}", etc.  It does this by calling
	   Text::Balanced::extract_quotelike().

	   If a quote-like operator is found, a reference to an array of eight elements is
	   returned. Those elements are identical to the last eight elements returned by
	   Text::Balanced::extract_quotelike() in an array context, namely:

	   [0] the name of the quotelike operator -- 'q', 'qq', 'm', 's', 'tr' -- if the operator
	       was named; otherwise "undef",

	   [1] the left delimiter of the first block of the operation,

	   [2] the text of the first block of the operation (that is, the contents of a quote,
	       the regex of a match or substitution, or the target list of a translation),

	   [3] the right delimiter of the first block of the operation,

	   [4] the left delimiter of the second block of the operation if there is one (that is,
	       if it is a "s", "tr", or "y"); otherwise "undef",

	   [5] the text of the second block of the operation if there is one (that is, the
	       replacement of a substitution or the translation list of a translation); otherwise
	       "undef",

	   [6] the right delimiter of the second block of the operation (if any); otherwise
	       "undef",

	   [7] the trailing modifiers on the operation (if any); otherwise "undef".

	   If a quote-like expression is not found, the directive fails with the usual "undef"
	   value.
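
	   To see what those eight elements look like, Text::Balanced can be called directly.
	   This is only a sketch: the sample input is an assumption, and it relies on
	   extract_quotelike() returning the extracted text, the remainder and the skipped prefix
	   ahead of the eight fields when called in list context:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_quotelike);

# Elements 0..2 are the extracted text, the remainder and the skipped prefix;
# elements 3..10 are the eight fields that <perl_quotelike> hands back.
my @fields = extract_quotelike('s{foo}{bar}gi; next();');

my ($name, $ldel1, $text1, $rdel1, $ldel2, $text2, $rdel2, $mods)
    = @fields[3 .. 10];

print "operator:     $name\n";                     # s
print "first block:  $ldel1 $text1 $rdel1\n";
print "second block: $ldel2 $text2 $rdel2\n";
print "modifiers:    $mods\n";                     # gi
```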

	   The "<perl_variable>" directive can be used to parse any Perl variable: $scalar,
	   @array, %hash, $ref->{field}[$index], etc.  It does this by calling
	   Text::Balanced::extract_variable().

	   If the directive matches text representing a valid Perl variable specification, it
	   returns that text. Otherwise it fails with the usual "undef" value.

	   The "<perl_codeblock>" directive can be used to parse a curly-brace-delimited block of
	   Perl code, such as: { $a = 1; f() =~ m/pat/; }.  It does this by calling
	   Text::Balanced::extract_codeblock().

	   If the directive matches text representing a valid Perl code block, it returns that
	   text. Otherwise it fails with the usual "undef" value.

	   You can also tell it what kind of brackets to use as the outermost delimiters. For
	   example:

	       arglist: <perl_codeblock ()>

	   causes an arglist to match a perl code block whose outermost delimiters are "(...)"
	   (rather than the default "{...}").

       Constructing tokens
	   Eventually, Parse::RecDescent will be able to parse tokenized input, as well as
	   ordinary strings. In preparation for this joyous day, the "<token:...>" directive has
	   been provided.  This directive creates a token which will be suitable for input to a
	   Parse::RecDescent parser (when it eventually supports tokenized input).

	   The text of the token is the value of the immediately preceding item in the
	   production. A "<token:...>" directive always succeeds with a return value which is the
	   hash reference that is the new token. It also sets the return value for the production
	   to that hash ref.

	   The "<token:...>" directive makes it easy to build a Parse::RecDescent-compatible
	   lexer in Parse::RecDescent:

	       my $lexer = new Parse::RecDescent q
	       {
	       lex:    token(s)

	       token:  /a\b/	      <token:INDEF>
		    |  /the\b/	      <token:DEF>
		    |  /fly\b/	      <token:NOUN,VERB>
		    |  /[a-z]+/i { lc $item[1] }  <token:ALPHA>
		    |  <error: Unknown token>

	       };

	   which will eventually be able to be used with a regular Parse::RecDescent grammar:

	       my $parser = new Parse::RecDescent q
	       {
	       startrule: subrule1 subrule2

	       # ETC...
	       };

	   either with a pre-lexing phase:

	       $parser->startrule( $lexer->lex($data) );

	   or with a lex-on-demand approach:

	       $parser->startrule( sub{$lexer->token(\$data)} );

	   But at present, only the "<token:...>" directive is actually implemented. The rest is
	   vapourware.

       Specifying operations
	   One of the commonest requirements when building a parser is to specify binary
	   operators. Unfortunately, in a normal grammar, the rules for such things are awkward:

	       disjunction:    conjunction ('or' conjunction)(s?)
		   { $return = [ $item[1], @{$item[2]} ] }

	       conjunction:    atom ('and' atom)(s?)
		   { $return = [ $item[1], @{$item[2]} ] }

	   or inefficient:

	       disjunction:    conjunction 'or' disjunction
		   { $return = [ $item[1], @{$item[3]} ] }
		  |    conjunction
		   { $return = [ $item[1] ] }

	       conjunction:    atom 'and' conjunction
		   { $return = [ $item[1], @{$item[3]} ] }
		  |    atom
		   { $return = [ $item[1] ] }

	   and either way is ugly and hard to get right.

	   The "<leftop:...>" and "<rightop:...>" directives provide an easier way of specifying
	   such operations. Using "<leftop:...>" the above examples become:

	       disjunction:    <leftop: conjunction 'or' conjunction>
	       conjunction:    <leftop: atom 'and' atom>

	   The "<leftop:...>" directive specifies a left-associative binary operator.  It is
	   specified around three other grammar elements (typically subrules or terminals), which
	   match the left operand, the operator itself, and the right operand respectively.

	   A "<leftop:...>" directive such as:

	       disjunction:    <leftop: conjunction 'or' conjunction>

	   is converted to the following:

	       disjunction:    ( conjunction ('or' conjunction)(s?)
		   { $return = [ $item[1], @{$item[2]} ] } )

	   In other words, a "<leftop:...>" directive matches the left operand followed by zero
	   or more repetitions of both the operator and the right operand. It then flattens the
	   matched items into an anonymous array which becomes the (single) value of the entire
	   "<leftop:...>" directive.

	   For example, a "<leftop:...>" directive such as:

	       output:	<leftop: ident '<<' expr >

	   when given a string such as:

	       cout << var << "str" << 3

	   would match, and $item[1] would be set to:

	       [ 'cout', 'var', '"str"', '3' ]

	   In other words:

	       output:	<leftop: ident '<<' expr >

	   is equivalent to a left-associative operator:

	       output:	ident	       { $return = [$item[1]]	}
		     |	ident '<<' expr        { $return = [@item[1,3]]     }
		     |	ident '<<' expr '<<' expr      { $return = [@item[1,3,5]]   }
		     |	ident '<<' expr '<<' expr '<<' expr    { $return = [@item[1,3,5,7]] }
		     #	...etc...

	   Similarly, the "<rightop:...>" directive takes a left operand, an operator, and a
	   right operand:

	       assign:	<rightop: var '=' expr >

	   and converts them to:

	       assign:	( (var '=' {$return=$item[1]})(s?) expr
		   { $return = [ @{$item[1]}, $item[2] ] } )

	   which is equivalent to a right-associative operator:

	       assign:	expr	   { $return = [$item[1]]	}
		     |	var '=' expr	   { $return = [@item[1,3]]	}
		     |	var '=' var '=' expr   { $return = [@item[1,3,5]]   }
		     |	var '=' var '=' var '=' expr   { $return = [@item[1,3,5,7]] }
		     #	...etc...

	   Note that for both the "<leftop:...>" and "<rightop:...>" directives, the directive
	   does not normally return the operator itself, just a list of the operands involved.
	   This is particularly handy for specifying lists:

	       list: '(' <leftop: list_item ',' list_item> ')'
		   { $return = $item[2] }

	   There is, however, a problem: sometimes the operator is itself significant.	For
	   example, in a Perl list a comma and a "=>" are both valid separators, but the "=>" has
	   additional stringification semantics.  Hence it's important to know which was used in
	   each case.

	   To solve this problem the "<leftop:...>" and "<rightop:...>" directives do return the
	   operator(s) as well, under two circumstances.  The first case is where the operator is
	   specified as a subrule. In that instance, whatever the operator matches is returned
	   (on the assumption that if the operator is important enough to have its own subrule,
	   then it's important enough to return).

	   The second case is where the operator is specified as a regular expression. In that
	   case, if the first bracketed subpattern of the regular expression matches, that
	   matching value is returned (this is analogous to the behaviour of the Perl "split"
	   function, except that only the first subpattern is returned).

	   In other words, given the input:

	       ( a=>1, b=>2 )

	   the specifications:

	       list:	  '('  <leftop: list_item separator list_item>	')'

	       separator: ',' | '=>'

	   or:

	       list:	  '('  <leftop: list_item /(,|=>)/ list_item>  ')'

	   cause the list separators to be interleaved with the operands in the anonymous array
	   in $item[2]:

	       [ 'a', '=>', '1', ',', 'b', '=>', '2' ]

	   But the following version:

	       list:	  '('  <leftop: list_item /,|=>/ list_item>  ')'

	   returns only the operands:

	       [ 'a', '1', 'b', '2' ]

	   Of course, none of the above specifications handle the case of an empty list, since
	   the "<leftop:...>" and "<rightop:...>" directives require at least a single right or
	   left operand to match. To specify that the operator can match "trivially", it's
	   necessary to add a "(s?)" qualifier to the directive:

	       list:	  '('  <leftop: list_item /(,|=>)/ list_item>(s?)  ')'

	   Note that in almost all the above examples, the first and third arguments of the
	   "<leftop:...>" directive were the same subrule. That is because "<leftop:...>"'s are
	   frequently used to specify "separated" lists of the same type of item. To make such
	   lists easier to specify, the following syntax:

	       list:   element(s /,/)

	   is exactly equivalent to:

	       list:   <leftop: element /,/ element>

	   Note that the separator must be specified as a raw pattern (i.e.  not a string or
	   subrule).
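
	   The difference between a literal operator and a capturing-regex operator can be checked
	   with a small script. This is a sketch under assumptions: the rule names "sum",
	   "number" and "list_item" and the sample inputs are invented for illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    # Literal operator: only the operands are collected.
    sum:       <leftop: number '+' number>
               { my $total = 0; $total += $_ for @{ $item[1] }; $return = $total }

    # Regex operator with a capturing group: separators are kept as well.
    list:      '(' <leftop: list_item /(,|=>)/ list_item> ')'
               { $return = $item[2] }

    number:    /\d+/
    list_item: /\w+/
}) or die "Bad grammar\n";

my $sum   = $parser->sum("1 + 2 + 39");
my $items = $parser->list("( a=>1, b=>2 )");

print "sum:   $sum\n";
print "items: ", join(' ', @$items), "\n";
```

	   In the "sum" rule the action only ever sees numbers, which is what makes the running
	   total straightforward; in the "list" rule the action receives the separators too.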

       Scored productions
	   By default, Parse::RecDescent grammar rules always accept the first production that
	   matches the input. But if two or more productions may potentially match the same
	   input, choosing the first that does so may not be optimal.

	   For example, if you were parsing the sentence "time flies like an arrow", you might
	   use a rule like this:

	       sentence: verb noun preposition article noun { [@item] }
	       | adjective noun verb article noun   { [@item] }
	       | noun verb preposition article noun { [@item] }

	   Each of these productions matches the sentence, but the third one is the most likely
	   interpretation. However, if the sentence had been "fruit flies like a banana", then
	   the second production is probably the right match.

	   To cater for such situations, the "<score:...>" directive can be used.  The directive is
	   equivalent to an unconditional "<reject>", except that it allows you to specify a
	   "score" for the current production. If that score is numerically greater than the best
	   score of any preceding production, the current production is cached for later
	   consideration. If no later production matches, then the cached production is treated
	   as having matched, and the value of the item immediately before its "<score:...>"
	   directive is returned as the result.

	   In other words, by putting a "<score:...>" directive at the end of each production,
	   you can select which production matches using criteria other than specification order.
	   For example:

	       sentence: verb noun preposition article noun { [@item] } <score: sensible(@item)>
	       | adjective noun verb article noun   { [@item] } <score: sensible(@item)>
	       | noun verb preposition article noun { [@item] } <score: sensible(@item)>

	   Now, when each production reaches its respective "<score:...>" directive, the
	   subroutine "sensible" will be called to evaluate the matched items (somehow). Once all
	   productions have been tried, the one which "sensible" scored most highly will be the
	   one that is accepted as a match for the rule.

	   The variable $score always holds the current best score of any production, and the
	   variable $score_return holds the corresponding return value.

	   As another example, the following grammar matches lines that may be separated by
	   commas, colons, or spaces. This can be tricky if a colon-separated line also
	   contains commas, or vice versa. The grammar resolves the ambiguity by selecting the
	   rule that results in the fewest fields:

	       line: seplist[sep=>',']	<score: -@{$item[1]}>
	       | seplist[sep=>':']  <score: -@{$item[1]}>
	       | seplist[sep=>" "]  <score: -@{$item[1]}>

	       seplist: <skip:""> <leftop: /[^$arg{sep}]*/ "$arg{sep}" /[^$arg{sep}]*/>

	   Note the use of negation within the "<score:...>" directive to ensure that the seplist
	   with the most items gets the lowest score.

	   As the above examples indicate, it is often the case that all productions in a rule
	   use exactly the same "<score:...>" directive. It is tedious to have to repeat this
	   identical directive in every production, so Parse::RecDescent also provides the
	   "<autoscore:...>" directive.

	   If an "<autoscore:...>" directive appears in any production of a rule, the code it
	   specifies is used as the scoring code for every production of that rule, except
	   productions that already end with an explicit "<score:...>" directive. Thus the rules
	   above could be rewritten:

	       line: <autoscore: -@{$item[1]}>
	       line: seplist[sep=>',']
	       | seplist[sep=>':']
	       | seplist[sep=>" "]

	       sentence: <autoscore: sensible(@item)>
	       | verb noun preposition article noun { [@item] }
	       | adjective noun verb article noun   { [@item] }
	       | noun verb preposition article noun { [@item] }

	   Note that the "<autoscore:...>" directive itself acts as an unconditional "<reject>",
	   and (like the "<rulevar:...>" directive) is pruned at compile-time wherever possible.

       Dispensing with grammar checks
	   During the compilation phase of parser construction, Parse::RecDescent performs a
	   small number of checks on the grammar it's given. Specifically it checks that the
	   grammar is not left-recursive, that there are no "insatiable" constructs of the form:

	       rule: subrule(s) subrule

	   and that there are no rules missing (i.e. referred to, but never defined).

	   These checks are important during development, but can slow down parser construction
	   in stable code. So Parse::RecDescent provides the <nocheck> directive to turn them
	   off. The directive can only appear before the first rule definition, and switches off
	   checking throughout the rest of the current grammar.

	   Typically, this directive would be added when a parser has been thoroughly tested and
	   is ready for release.

   Subrule argument lists
       It is occasionally useful to pass data to a subrule which is being invoked. For example,
       consider the following grammar fragment:

	   classdecl: keyword decl

	   keyword:   'struct' | 'class'

	   decl:      # WHATEVER

       The "decl" rule might wish to know which of the two keywords was used (since it may affect
       some aspect of the way the subsequent declaration is interpreted). "Parse::RecDescent"
       allows the grammar designer to pass data into a rule, by placing that data in an argument
       list (that is, in square brackets) immediately after any subrule item in a production.
       Hence, we could pass the keyword to "decl" as follows:

	   classdecl: keyword decl[ $item[1] ]

	   keyword:   'struct' | 'class'

	   decl:      # WHATEVER

       The argument list can consist of any number (including zero!) of comma-separated Perl
       expressions. In other words, it looks exactly like a Perl anonymous array reference. For
       example, we could pass the keyword, the name of the surrounding rule, and the literal
       'keyword' to "decl" like so:

	   classdecl: keyword decl[$item[1],$item[0],'keyword']

	   keyword:   'struct' | 'class'

	   decl:      # WHATEVER

       Within the rule to which the data is passed ("decl" in the above examples) that data is
       available as the elements of a local variable @arg. Hence "decl" might report its
       intentions as follows:

	   classdecl: keyword decl[$item[1],$item[0],'keyword']

	   keyword:   'struct' | 'class'

	   decl:      { print "Declaring $arg[0] (a $arg[2])\n";
		print "(this rule called by $arg[1])" }

       Subrule argument lists can also be interpreted as hashes, simply by using the local
       variable %arg instead of @arg. Hence we could rewrite the previous example:

	   classdecl: keyword decl[keyword => $item[1],
	       caller  => $item[0],
	       type    => 'keyword']

	   keyword:   'struct' | 'class'

	   decl:      { print "Declaring $arg{keyword} (a $arg{type})\n";
		print "(this rule called by $arg{caller})" }

       Both @arg and %arg are always available, so the grammar designer may choose whichever
       convention (or combination of conventions) suits best.

       Subrule argument lists are also useful for creating "rule templates" (especially when used
       in conjunction with the "<matchrule:...>" directive). For example, the subrule:

	   list:     <matchrule:$arg{rule}> /$arg{sep}/ list[%arg]
	       { $return = [ $item[1], @{$item[3]} ] }
	   |	 <matchrule:$arg{rule}>
	       { $return = [ $item[1]] }

       is a handy template for the common problem of matching a separated list.  For example:

	   function: 'func' name '(' list[rule=>'param',sep=>';'] ')'

	   param:    list[rule=>'name',sep=>','] ':' typename

	   name:     /\w+/

	   typename: name

       When a subrule argument list is used with a repeated subrule, the argument list goes
       before the repetition specifier:

	   list:   /some|many/ thing[ $item[1] ](s)

       The argument list is "late bound". That is, it is re-evaluated for every repetition of the
       repeated subrule.  This means that each repeated attempt to match the subrule may be
       passed a completely different set of arguments if the value of the expression in the
       argument list changes between attempts. So, for example, the grammar:

	   { $::species = 'dogs' }

	   pair:   'two' animal[$::species](s)

	   animal: /$arg[0]/ { $::species = 'cats' }

       will match the string "two dogs cats cats" completely, whereas it will only match the
       string "two dogs dogs dogs" up to the eighth letter. If the value of the argument list
       were "early bound" (that is, evaluated only the first time a repeated subrule match is
       attempted), one would expect the matching behaviours to be reversed.

       Of course, it is possible to effectively "early bind" such argument lists by passing them
       a value which does not change on each repetition. For example:

	   { $::species = 'dogs' }

	   pair:   'two' { $::species } animal[$item[2]](s)

	   animal: /$arg[0]/ { $::species = 'cats' }

       Arguments can also be passed to the start rule, simply by appending them to the argument
       list with which the start rule is called (after the "line number" parameter). For example,
       given:

	   $parser = new Parse::RecDescent ( $grammar );

	   $parser->data($text, 1, "str", 2, \@arr);

	   #             ^^^^^  ^  ^^^^^^^^^^^^^^^
	   #               |    |         |
	   # TEXT TO BE PARSED  |         |
	   # STARTING LINE NUMBER         |
	   # ELEMENTS OF @arg WHICH IS PASSED TO RULE data

       then within the productions of the rule "data", the array @arg will contain "("str", 2,
       \@arr)".
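
       Both ways of supplying arguments can be tried in one script. This is a sketch only: the
       class names and the hash returned by "decl" are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    classdecl: keyword decl[ keyword => $item[1] ]

    keyword:   'struct' | 'class'

    decl:      /\w+/
               { $return = { kind => $arg{keyword}, name => $item[1] } }
}) or die "Bad grammar\n";

# Arguments supplied through a subrule argument list:
my $via_subrule = $parser->classdecl("class Widget");

# Arguments supplied directly to a start rule, after the text to be parsed
# and the starting line number:
my $direct = $parser->decl("Gadget", 1, keyword => 'struct');

print "$via_subrule->{kind} $via_subrule->{name}\n";
print "$direct->{kind} $direct->{name}\n";
```

       Using %arg keeps the "decl" rule independent of argument order, so the same rule serves
       both callers.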

   Alternations
       Alternations are implicit (unnamed) rules defined as part of a production. An alternation
       is defined as a series of '|'-separated productions inside a pair of round brackets. For
       example:

	   character: 'the' ( good | bad | ugly ) /dude/

       Every alternation implicitly defines a new subrule, whose automatically-generated name
       indicates its origin: "_alternation_<I>_of_production_<P>_of_rule_<R>" for the appropriate
       values of <I>, <P>, and <R>. A call to this implicit subrule is then inserted in place of
       the brackets. Hence the above example is merely a convenient short-hand for:

	   character: 'the'
	      _alternation_1_of_production_1_of_rule_character
	      /dude/

	   _alternation_1_of_production_1_of_rule_character:
	      good | bad | ugly

       Since alternations are parsed by recursively calling the parser generator, any type(s) of
       item can appear in an alternation. For example:

	   character: 'the' ( 'high' "plains"    # Silent, with poncho
	                    | /no[- ]name/       # Silent, no poncho
	                    | vengeance_seeking  # Poncho-optional
	                    | <error>
	                    ) drifter

       In this case, if an error occurred, the automatically generated message would be:

	   ERROR (line <N>): Invalid implicit subrule: Expected
		 'high' or /no[- ]name/ or vengeance_seeking,
		 but found "pacifist" instead

       Since every alternation actually has a name, it's even possible to extend or replace them:

	   $parser->Replace(
	   "_alternation_1_of_production_1_of_rule_character:
	       'generic Eastwood'"
	       );

       More importantly, since alternations are a form of subrule, they can be given repetition
       specifiers:

	   character: 'the' ( good | bad | ugly )(?) /dude/

   Incremental Parsing
       "Parse::RecDescent" provides two methods - "Extend" and "Replace" - which can be used to
       alter the grammar matched by a parser. Both methods take the same argument as
       "Parse::RecDescent::new", namely a grammar specification string.

       "Parse::RecDescent::Extend" interprets the grammar specification and adds any productions
       it finds to the end of the rules for which they are specified. For example:

	   $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
	   $parser->Extend($add);

       adds two productions to the rule "name" (creating it if necessary) and one production to
       the rule "desc".

       "Parse::RecDescent::Replace" is identical, except that it first resets any rule specified
       in the additional grammar, removing any existing productions.  Hence after:

	   $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
	   $parser->Replace($add);

       there are only two valid "name"s and the one possible description.
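
       The difference between the two methods can be observed directly. This is a minimal
       sketch: the starting grammar and the extra name 'Billy-Joe' are assumptions made for
       illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{ name: 'Jimmy-Bob' })
    or die "Bad grammar\n";

my $before = $parser->name("Bobby-Jim");        # no production matches yet

$parser->Extend(q{ name: 'Bobby-Jim' });        # append a second production
my $extended_new = $parser->name("Bobby-Jim");
my $extended_old = $parser->name("Jimmy-Bob");  # original production is kept

$parser->Replace(q{ name: 'Billy-Joe' });       # drop both, keep only this one
my $replaced_old = $parser->name("Jimmy-Bob");
my $replaced_new = $parser->name("Billy-Joe");

print "before Extend:  ", defined $before       ? "match" : "no match", "\n";
print "after Extend:   ", defined $extended_new ? "match" : "no match", "\n";
print "after Replace:  ", defined $replaced_old ? "match" : "no match", "\n";
```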

       A more interesting use of the "Extend" and "Replace" methods is to call them inside the
       action of an executing parser. For example:

	   typedef: 'typedef' type_name identifier ';'
		  { $thisparser->Extend("type_name: '$item[3]'") }
	      | <error>

	   identifier: ...!type_name /[A-Za-z_]\w*/

       which automatically prevents type names from being typedef'd, or:

	   command: 'map' key_name 'to' abort_key
		  { $thisparser->Replace("abort_key: '$item[2]'") }
	      | 'map' key_name 'to' key_name
		  { map_key($item[2],$item[4]) }
	      | abort_key
		  { exit if confirm("abort?") }

	   abort_key: 'q'

	   key_name: ...!abort_key /[A-Za-z]/

       which allows the user to change the abort key binding, but not to unbind it.

       The careful use of such constructs makes it possible to reconfigure a running parser,
       eliminating the need for semantic feedback by providing syntactic feedback instead.
       However, as currently implemented, "Replace()" and "Extend()" have to regenerate and
       re-"eval" the entire parser whenever they are called. This makes them quite slow for large
       grammars.

       In such cases, the judicious use of an interpolated regex is likely to be far more
       efficient:

	   typedef: 'typedef' type_name identifier ';'
		  { $thisparser->{local}{type_name} .= "|$item[3]" }
	      | <error>

	   identifier: ...!type_name /[A-Za-z_]\w*/

	   type_name: /$thisparser->{local}{type_name}/

   Precompiling parsers
       Normally Parse::RecDescent builds a parser from a grammar at run-time.  That approach
       simplifies the design and implementation of parsing code, but has the disadvantage that it
       slows the parsing process down - you have to wait for Parse::RecDescent to build the
       parser every time the program runs. Long or complex grammars can be particularly slow to
       build, leading to unacceptable delays at start-up.

       To overcome this, the module provides a way of "pre-building" a parser object and saving
       it in a separate module. That module can then be used to create clones of the original
       parser.

       A grammar may be precompiled using the "Precompile" class method.  For example, to
       precompile a grammar stored in the scalar $grammar, and produce a class named PreGrammar
       in a module file named PreGrammar.pm, you could use:

	   use Parse::RecDescent;

	   Parse::RecDescent->Precompile($grammar, "PreGrammar");

       The first required argument is the grammar string, the second is the name of the class to
       be built. The name of the module file is generated automatically by appending ".pm" to the
       last element of the class name. Thus

	   Parse::RecDescent->Precompile($grammar, "My::New::Parser");

       would produce a module file named Parser.pm.

       An optional hash reference may be supplied as the first argument to "Precompile".  This
       argument is currently EXPERIMENTAL, and may change in a future release of
       Parse::RecDescent.  The only supported option is currently "-standalone", see "Standalone
       Precompiled Parsers".

       It is somewhat tedious to have to write a small Perl program just to generate a
       precompiled grammar class, so Parse::RecDescent has some special magic that allows you to
       do the job directly from the command-line.

       If your grammar is specified in a file named grammar, you can generate a class named
       Yet::Another::Grammar like so:

	   > perl -MParse::RecDescent - grammar Yet::Another::Grammar

       This would produce a file named Grammar.pm containing the full definition of a class
       called Yet::Another::Grammar. Of course, to use that class, you would need to put the
       Grammar.pm file in a directory named Yet/Another, somewhere in your Perl include path.

       Having created the new class, it's very easy to use it to build a parser. You simply "use"
       the new module, and then call its "new" method to create a parser object. For example:

	   use Yet::Another::Grammar;
	   my $parser = Yet::Another::Grammar->new();

       The effect of these two lines is exactly the same as:

	   use Parse::RecDescent;

	   open GRAMMAR_FILE, "grammar" or die;
	   local $/;
	   my $grammar = <GRAMMAR_FILE>;

	   my $parser = Parse::RecDescent->new($grammar);

       only considerably faster.

       Note however that the parsers produced by either approach are exactly the same, so whilst
       precompilation has an effect on set-up speed, it has no effect on parsing speed.
       RecDescent 2.0 will address that problem.

       Standalone Precompiled Parsers

       Until version 1.967003 of Parse::RecDescent, parser modules built with "Precompile" were
       dependent on Parse::RecDescent.	Future Parse::RecDescent releases with different internal
       implementations would break pre-existing precompiled parsers.

       Version 1.967_005 added the ability for Parse::RecDescent to include itself in the
       resulting .pm file if you pass the boolean option "-standalone" to "Precompile":

	   Parse::RecDescent->Precompile({ -standalone => 1, },
	       $grammar, "My::New::Parser");

       Parse::RecDescent is included as Parse::RecDescent::_Runtime in order to avoid conflicts
       between an installed version of Parse::RecDescent and a precompiled, standalone parser
       made with another version of Parse::RecDescent.  This renaming is experimental, and is
       subject to change in future versions.

       Precompiled parsers remain dependent on Parse::RecDescent by default, as this feature is
       still considered experimental.  In the future, standalone parsers will become the default.

GOTCHAS
       This section describes common mistakes that grammar writers seem to make on a regular
       basis.

   1. Expecting an error to always invalidate a parse
       A common mistake when using error messages is to write the grammar like this:

	   file: line(s)

	   line: line_type_1
	   | line_type_2
	   | line_type_3
	   | <error>

       The expectation seems to be that any line that is not of type 1, 2 or 3 will invoke the
       "<error>" directive and thereby cause the parse to fail.

       Unfortunately, that only happens if the error occurs in the very first line.  The first
       rule states that a "file" is matched by one or more lines, so if even a single line
       succeeds, the first rule is completely satisfied and the parse as a whole succeeds. That
       means that any error messages generated by subsequent failures in the "line" rule are
       quietly ignored.

       Typically what's really needed is this:

	   file: line(s) eofile    { $return = $item[1] }

	   line: line_type_1
	   | line_type_2
	   | line_type_3
	   | <error>

	   eofile: /^\Z/

       The addition of the "eofile" subrule to the first production means that a file only
       matches a series of successful "line" matches that consume the complete input text. If any
       input text remains after the lines are matched, there must have been an error in the last
       "line". In that case the "eofile" rule will fail, causing the entire "file" rule to fail
       too.

       Note too that "eofile" must match "/^\Z/" (end-of-text), not "/^\cZ/" or "/^\cD/" (end-of-
       file).

       And don't forget the action at the end of the production. If you just write:

	   file: line(s) eofile

       then the value returned by the "file" rule will be the value of its last item: "eofile".
       Since "eofile" always returns an empty string on success, that will cause the "file" rule
       to return that empty string. Apart from returning the wrong value, returning an empty
       string will trip up code such as:

	   $parser->file($filetext) || die;

       (since "" is false).

       Remember that Parse::RecDescent returns undef on failure, so the only safe test for
       failure is:

	   defined($parser->file($filetext)) || die;
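
       The pieces above can be combined into a minimal, self-contained sketch. The "line"
       pattern (simple "name=number" assignments) and the sample inputs are assumptions made
       purely for illustration; they are not part of the module:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: every line must look like "name=number".
my $grammar = q{
    file:   line(s) eofile    { $return = $item[1] }

    line:   /\w+=\d+/
        |   <error>

    eofile: /^\Z/
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

# Every line is valid, so "file" returns the array ref built by line(s).
my $good = $parser->file("a=1\nb=2\n");

# The second line is invalid, so "eofile" fails and "file" returns undef
# (the queued <error> message is then reported on STDERR).
my $bad = $parser->file("a=1\n???\n");

print defined $good ? "good input: @$good\n"  : "good input: rejected\n";
print defined $bad  ? "bad input: accepted\n" : "bad input: rejected\n";
```

       Both results are tested with "defined", so a legitimately false return value would not
       be mistaken for a failed parse.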

   2. Using a "return" in an action
       An action is like a "do" block inside the subroutine implementing the surrounding rule. So
       if you put a "return" statement in an action:

	   range: '(' start '..' end ')'
	       { return $item{end} }
	      /\s+/

       that subroutine will immediately return, without checking the rest of the items in the
       current production (e.g. the "/\s+/") and without setting up the necessary data structures
       to tell the parser that the rule has succeeded.

       The correct way to set a return value in an action is to set the $return variable:

	   range: '(' start '..' end ')'
		       { $return = $item{end} }
		  /\s+/
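
       A minimal runnable sketch of the $return idiom follows. The definitions of "start" and
       "end" as integers, and the trailing ';' item placed after the action, are assumptions
       added for illustration:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: "start" and "end" are plain integers, and a
# trailing ';' follows the action to show that later items are still checked.
my $grammar = q{
    range: '(' start '..' end ')'
               { $return = $item{end} }
           ';'

    start: /\d+/
    end:   /\d+/
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

# The rule's value is $return, not the value of its last item (';').
my $end = $parser->range("(1 .. 10);");
print "end of range: $end\n";

# Without the trailing ';' the production fails, so the rule returns undef.
my $missing = $parser->range("(1 .. 10)");
print "incomplete input rejected\n" unless defined $missing;
```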

   3. Setting $Parse::RecDescent::skip at parse time
       If you want to change the default skipping behaviour (see "Terminal Separators" and the
       "<skip:...>" directive) by setting $Parse::RecDescent::skip you have to remember to set
       this variable before creating the grammar object.

       For example, you might want to skip all Perl-like comments with this regular expression:

	  my $skip_spaces_and_comments = qr/
		(?mxs:
		   \s+	       # either spaces
		   | \# .*?$   # or a hash and whatever up to the end of line
		)*	       # repeated at will (in whatever order)
	     /x;

       And then:

	  my $parser1 = Parse::RecDescent->new($grammar);

	  $Parse::RecDescent::skip = $skip_spaces_and_comments;

	  my $parser2 = Parse::RecDescent->new($grammar);

	  $parser1->parse($text); # this does not cope with comments
	  $parser2->parse($text); # this skips comments correctly

       The two parsers behave differently, because any skipping behaviour specified via
       $Parse::RecDescent::skip is hard-coded when the grammar object is built, not at parse
       time.
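
       The same point as a self-contained sketch, using a hypothetical word-list grammar and
       sample text (both are assumptions for illustration). The skip pattern is assigned before
       the second parser is built. According to the explanation above, the first parser keeps
       the default prefix pattern and is expected to reject the commented text:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: a list of words followed by end-of-text.
my $grammar = q{
    list: word(s) /\z/    { $return = $item[1] }
    word: /\w+/
};

my $text = "alpha   # first word\nbeta\n";

# Built while the default prefix pattern is still in effect.
my $parser1 = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# From here on, skip whitespace and '#' comments before each token.
$Parse::RecDescent::skip = qr/(?: \s+ | \# [^\n]* )*/x;

# Built after the assignment, so it picks up the new prefix pattern.
my $parser2 = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

for my $pair ([ parser1 => $parser1 ], [ parser2 => $parser2 ]) {
    my ($name, $parser) = @$pair;
    my $words = $parser->list($text);
    print "$name: ", (defined $words ? "@$words" : "parse failed"), "\n";
}
```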

DIAGNOSTICS
       Diagnostics are intended to be self-explanatory (particularly if you use -RD_HINT (under
       perl -s) or define $::RD_HINT inside the program).

       "Parse::RecDescent" currently diagnoses the following:

       o   Invalid regular expressions used as pattern terminals (fatal error).

       o   Invalid Perl code in code blocks (fatal error).

       o   Lookahead used in the wrong place or in a nonsensical way (fatal error).

       o   "Obvious" cases of left-recursion (fatal error).

       o   Missing or extra components in a "<leftop>" or "<rightop>" directive.

       o   Unrecognisable components in the grammar specification (fatal error).

       o   "Orphaned" rule components specified before the first rule (fatal error) or after an
	   "<error>" directive (level 3 warning).

       o   Missing rule definitions (this only generates a level 3 warning, since you may be
	   providing them later via "Parse::RecDescent::Extend()").

       o   Instances where greedy repetition behaviour will almost certainly cause the failure of
	   a production (a level 3 warning - see "ON-GOING ISSUES AND FUTURE DIRECTIONS" below).

       o   Attempts to define rules named 'Replace' or 'Extend', which cannot be called directly
	   through the parser object because of the predefined meaning of
	   "Parse::RecDescent::Replace" and "Parse::RecDescent::Extend". (Only a level 2 warning
	   is generated, since such rules can still be used as subrules).

       o   Productions which consist of a single "<error?>" directive, and which therefore may
	   succeed unexpectedly (a level 2 warning, since this might conceivably be the desired
	   effect).

       o   Multiple consecutive lookahead specifiers (a level 1 warning only, since their effects
	   simply accumulate).

       o   Productions which start with a "<reject>" or "<rulevar:...>" directive. Such
	   productions are optimized away (a level 1 warning).

       o   Rules which are autogenerated under $::RD_AUTOSTUB (a level 1 warning).

AUTHOR
       Damian Conway (damian@conway.org)
       Jeremy T. Braun (JTBRAUN@CPAN.org) [current maintainer]

BUGS AND IRRITATIONS
       There are undoubtedly serious bugs lurking somewhere in this much code :-) Bug reports,
       test cases and other feedback are most welcome.

       Ongoing annoyances include:

       o   There's no support for parsing directly from an input stream.  If and when the Perl
	   Gods give us regular expressions on streams, this should be trivial (ahem!) to
	   implement.

       o   The parser generator can get confused if actions aren't properly closed or if they
	   contain particularly nasty Perl syntax errors (especially unmatched curly brackets).

       o   The generator only detects the most obvious form of left recursion (potential
	   recursion on the first subrule in a rule). More subtle forms of left recursion (for
	   example, through the second item in a rule after a "zero" match of a preceding "zero-
	   or-more" repetition, or after a match of a subrule with an empty production) are not
	   found.

       o   Instead of complaining about left-recursion, the generator should silently transform
	   the grammar to remove it. Don't expect this feature any time soon as it would require
	   a more sophisticated approach to parser generation than is currently used.

       o   The generated parsers don't always run as fast as might be wished.

       o   The meta-parser should be bootstrapped using "Parse::RecDescent" :-)

ON-GOING ISSUES AND FUTURE DIRECTIONS
       1.  Repetitions are "incorrigibly greedy" in that they will eat everything they can and
	   won't backtrack if that behaviour causes a production to fail needlessly.  So, for
	   example:

	       rule: subrule(s) subrule

	   will never succeed, because the repetition will eat all the subrules it finds, leaving
	   none to match the second item. Such constructions are relatively rare (and
	   "Parse::RecDescent::new" generates a warning whenever they occur) so this may not be a
	   problem, especially since the insatiable behaviour can be overcome "manually" by
	   writing:

	       rule: penultimate_subrule(s) subrule

	       penultimate_subrule: subrule ...subrule

	   The issue is that this construction is exactly twice as expensive as the original,
	   whereas backtracking would add only 1/N to the cost (for matching N repetitions of
	   "subrule"). I would welcome feedback on the need for backtracking; particularly on
	   cases where the lack of it makes parsing performance problematical.

       2.  Having opened that can of worms, it's also necessary to consider whether there is a
	   need for non-greedy repetition specifiers. Again, it's possible (at some cost) to
	   manually provide the required functionality:

	       rule: nongreedy_subrule(s) othersubrule

	       nongreedy_subrule: subrule ...!othersubrule

	   Overall, the issue is whether the benefit of this extra functionality outweighs the
	   drawbacks of further complicating the (currently minimalist) grammar specification
	   syntax, and (worse) introducing more overhead into the generated parsers.

       3.  An "<autocommit>" directive would be nice. That is, it would be useful to be able to
	   say:

	       command: <autocommit>
	       command: 'find' name
		  | 'find' address
		  | 'do' command 'at' time 'if' condition
		  | 'do' command 'at' time
		  | 'do' command
		  | unusual_command

	   and have the generator work out that this should be "pruned" thus:

	       command: 'find' name
		  | 'find' <commit> address
		  | 'do' <commit> command <uncommit>
		   'at' time
		   'if' <commit> condition
		  | 'do' <commit> command <uncommit>
		   'at' <commit> time
		  | 'do' <commit> command
		  | unusual_command

	   There are several issues here. Firstly, should the "<autocommit>" automatically
	   install an "<uncommit>" at the start of the last production (on the grounds that the
	   "command" rule doesn't know whether an "unusual_command" might start with "find" or
	   "do") or should the "unusual_command" subgraph be analysed (to see if it might be
	   viable after a "find" or "do")?

	   The second issue is how regular expressions should be treated. The simplest approach
	   would be simply to uncommit before them (on the grounds that they might match). Better
	   efficiency would be obtained by analyzing all preceding literal tokens to determine
	   whether the pattern would match them.

	   Overall, the issues are: can such automated "pruning" approach a hand-tuned version
	   sufficiently closely to warrant the extra set-up expense, and (more importantly) is
	   the problem important enough to even warrant the non-trivial effort of building an
	   automated solution?
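
       The manual workaround for greedy repetitions described in item 1 can be tried out with a
       short script. This is only a sketch: the rule names "greedy" and "patched", the
       definition of "subrule" as a single word, the end-of-text check, and the explicit
       actions are all assumptions added for illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = q{
    # Greedy form: the repetition consumes every subrule, leaving none
    # for the second item (the generator may warn about this rule).
    greedy:  subrule(s) subrule /\z/
                 { $return = [ @{$item[1]}, $item[2] ] }

    # Workaround: each repeated match must be followed by another subrule.
    patched: penultimate_subrule(s) subrule /\z/
                 { $return = [ @{$item[1]}, $item[2] ] }

    # The explicit action returns the matched subrule, not the lookahead.
    penultimate_subrule: subrule ...subrule
                 { $return = $item[1] }

    subrule: /\w+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $greedy  = $parser->greedy("a b c");
my $patched = $parser->patched("a b c");

print "greedy:  ", (defined $greedy  ? "@$greedy"  : "parse failed"), "\n";
print "patched: ", (defined $patched ? "@$patched" : "parse failed"), "\n";
```

       The lookahead in "penultimate_subrule" is what makes this construction more expensive
       than the original: every subrule except the last is matched twice.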

SUPPORT
   Source Code Repository
       <http://github.com/jtbraun/Parse-RecDescent>

   Mailing List
       Visit <http://www.perlfoundation.org/perl5/index.cgi?parse_recdescent> to sign up for the
       mailing list.

       <http://www.PerlMonks.org> is also a good place to ask questions. Previous posts about
       Parse::RecDescent can typically be found with this search:
       <http://perlmonks.org/index.pl?node=recdescent>.

   FAQ
       Visit Parse::RecDescent::FAQ for answers to frequently (and not so frequently) asked
       questions about Parse::RecDescent.

   View/Report Bugs
       To view the current bug list or report a new issue visit
       <https://rt.cpan.org/Public/Dist/Display.html?Name=Parse-RecDescent>.

SEE ALSO
       Regexp::Grammars provides Parse::RecDescent style parsing using native Perl 5.10 regular
       expressions.

LICENCE AND COPYRIGHT
       Copyright (c) 1997-2007, Damian Conway "<DCONWAY@CPAN.org>". All rights reserved.

       This module is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself. See perlartistic.

DISCLAIMER OF WARRANTY
       BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE,
       TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE
       COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF
       ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
       WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO
       THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE
       DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

       IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT
       HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY
       THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
       INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
       SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR
       LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY
       OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
       SUCH DAMAGES.

perl v5.16.3				    2014-06-09			     Parse::RecDescent(3)