
Parse::Eyapp::eyappintro(3)				User Contributed Perl Documentation			       Parse::Eyapp::eyappintro(3)

NAME
Parse::Eyapp::eyappintro - An introduction to Parse::Eyapp

SYNOPSIS
    # File 'calc.eyp': translates infix expressions to postfix
    # Compile it with:  eyapp -o calc.pl -C Postfix.eyp
    # Execution:        ./calc.pl -c 'a = 2*3+b'

    %token NUM = /([0-9]+(?:\.[0-9]+)?)/
    %token VAR = /([A-Za-z][A-Za-z0-9_]*)/

    %right  '='
    %left   '-' '+'
    %left   '*' '/'
    %left   NEG

    %defaultaction { "$left $right $op"; }

    %%
    line: $exp  { print "$exp\n" }
    ;

    exp:
          $NUM                  { $NUM }
        | $VAR                  { $VAR }
        | VAR.left '='.op exp.right
        | exp.left '+'.op exp.right
        | exp.left '-'.op exp.right
        | exp.left '*'.op exp.right
        | exp.left '/'.op exp.right
        | '-' $exp %prec NEG    { "$exp NEG" }
        | '(' $exp ')'          { $exp }
    ;
    %%

INTRODUCTION TO PARSING WITH Parse::Eyapp

Introduction

Parsing is the activity of producing a syntax tree from an input stream. The program in the SYNOPSIS section shows an example of a translation scheme. It translates infix arithmetic expressions like

    a = 2 * 3 + 4 * b

into postfix expressions like

    a 2 3 * 4 b * + =

The program contains a context free eyapp grammar defining the language of arithmetic infix expressions. A context free grammar is a mathematical device to define languages. To see the bare grammar of the example above we have to eliminate the semantic actions (we call semantic actions the sections delimited by curly brackets that contain Perl code). This can be done by calling "eyapp" with options "-vc":

    $ eyapp -vc Postfix.eyp
    %token NUM = /([0-9]+(?:\.[0-9]+)?)/
    %token VAR = /([A-Za-z][A-Za-z0-9_]*)/

    %right  '='
    %left   '-' '+'
    %left   '*' '/'
    %left   NEG

    %%
    line: exp
    ;

    exp:
          NUM
        | VAR
        | VAR '=' exp
        | exp '+' exp
        | exp '-' exp
        | exp '*' exp
        | exp '/' exp
        | '-' exp %prec NEG
        | '(' exp ')'
    ;
    %%

A grammar generates a language. A grammar is defined by a set of production rules. A production rule has two components: a left hand side, which is a syntactic variable or non terminal, and a right hand side, which is a phrase made of syntactic variables and terminals. The left hand side (lhs) and the right hand side (rhs) are usually separated by an arrow, as in:

    exp -> VAR = exp

A note: when you see a production rule like

    line: exp <+ ';'>

it is not really a production rule but an abbreviation for two productions. It stands for:

    line:
          exp
        | line ';' exp
    ;

A terminal or token never appears on the left hand side of a production rule.

Ambiguity

The phrases of the language are those obtained by successively applying the production rules of the grammar until no more rules can be applied. The successive substitutions must start from the "start" symbol of the grammar ("line" in the example). Such a legal sequence of substitutions is known as a derivation. The following is an example of a legal derivation (the big arrow "=>" is read derives):

    line => exp => VAR = exp => VAR = exp + exp
         => VAR = exp + NUM => VAR = VAR + NUM

Thus the phrase "VAR = VAR + NUM" belongs to the language generated by the former grammar. A derivation like this can be seen as a tree. For instance, the former derivation is equivalent to (carries the same information as) the following tree:

    line
    `- exp
       |- VAR
       |- '='
       `- exp
          |- exp
          |  `- VAR
          |- '+'
          `- exp
             `- NUM

which can be written more succinctly:

    line(exp(VAR, '=', exp(exp(VAR), '+', exp(NUM))))

or even more briefly:

    VAR = (VAR + NUM)

Such a tree is called a syntax tree for the input "VAR = VAR + NUM". A grammar is said to be ambiguous if there are phrases in the generated language that have more than one syntax tree.

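Parse::Eyapp can build and return such syntax trees for you as Parse::Eyapp::Node objects (see the SEE ALSO section). Purely as an illustration of the idea, and unrelated to that API, the following plain Perl sketch models the tree above as nested arrays and prints its succinct form; the helper names are made up for the example:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Build a node as [ name, child, child, ... ]
    sub tree { my $name = shift; [ $name, @_ ] }

    # The syntax tree shown above for "VAR = VAR + NUM"
    my $t = tree('line',
              tree('exp',
                tree('VAR'),
                tree(q{'='}),
                tree('exp',
                  tree('exp', tree('VAR')),
                  tree(q{'+'}),
                  tree('exp', tree('NUM')))));

    # Print the succinct form: line(exp(VAR, '=', exp(exp(VAR), '+', exp(NUM))))
    sub as_string {
      my ($name, @children) = @{ $_[0] };
      return $name unless @children;
      return "$name(" . join(', ', map { as_string($_) } @children) . ')';
    }

    print as_string($t), "\n";
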
The grammar in the synopsis example is ambiguous. Here is an alternative tree for the same phrase "VAR = VAR + NUM":

    line
    `- exp
       |- exp
       |  |- VAR
       |  |- '='
       |  `- exp
       |     `- VAR
       |- '+'
       `- exp
          `- NUM

or

    line(exp(exp(VAR, '=', exp(VAR)), '+', exp(NUM)))

or

    (VAR = VAR) + NUM

Semantic Actions and Attributes

"Parse::Eyapp" analyzes your grammar and produces a LALR parser. Actually the SYNOPSIS example is more than a context free grammar: it is a translation scheme. A translation scheme is a context free grammar in which the right hand sides of the productions have been augmented with semantic actions (i.e. with chunks of Perl code):

    A -> alpha { action(@_) } beta

The analyzer generated by Eyapp executes "{ action(@_) }" after all the semantic actions associated with "alpha" have been executed and before the execution of any of the semantic actions associated with "beta".

In a translation scheme each symbol occurrence has an associated attribute. The embedded actions modify the attributes associated with the symbols of the grammar:

    A -> alpha { action(@_) } beta

Each symbol on the right hand side of a production rule has an associated scalar attribute. In "eyapp" the attributes of the symbols to the left of "action" are passed as arguments to "action" (in the example, those of "alpha"). These arguments are preceded by a reference to the syntax analyzer object. Therefore, you can access the attributes associated with the first, second, etc. symbols of the right hand side using the notation:

    $_[1], $_[2], ...

However, it is better to refer to the attributes by name. This is the purpose of the dot and dollar notations, as in:

    exp:
          $NUM                  { $NUM }
        | $VAR                  { $VAR }
        | VAR.left '='.op exp.right
        | exp.left '+'.op exp.right
        | exp.left '-'.op exp.right
        | exp.left '*'.op exp.right
        | exp.left '/'.op exp.right
        | '-' $exp %prec NEG    { "$exp NEG" }
        | '(' $exp ')'          { $exp }
    ;

By prefixing the symbol "NUM" with a "$" we can refer to the associated attribute inside the semantic action as $NUM:

    exp: $NUM { $NUM }

By postfixing the two appearances of "exp" with ".left" and ".right" and the appearance of '+' with ".op" we can refer to the associated attributes as $left, $right and $op instead of $_[1], $_[3] and $_[2]:

    %defaultaction { "$left $right $op"; }

There is no way inside an ordinary "eyapp" program for an intermediate "action" to access the attributes of the symbols on its right, i.e. those associated with the symbols of "beta". This restriction is lifted if you use the %metatree directive to build a full translation scheme. See Parse::Eyapp::translationschemestut to learn more about full translation schemes.

Actions on the right hand side count as symbols, so they can be referenced by their position in later actions of the same production rule. For intermediate actions, the value returned by the "action" is the attribute associated with that action. For an action at the end of the rule:

    A -> alpha { lastaction(@_) }

the returned value constitutes the attribute of the left hand side of the rule (the attribute of "A" in this case). The action at the end of the right hand side is called the action associated with the production rule.

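For instance, in the following sketch (not part of the original example, and shown only to illustrate the point) the actions evaluate the expression numerically instead of emitting postfix; whatever each action returns becomes the attribute of the "exp" on the left hand side. The fragment assumes the same %token and precedence declarations as in the synopsis:

    exp:
          $NUM                          { $NUM }            # the token value itself
        | exp.left '+'.op exp.right     { $left + $right }  # returned value = attribute of exp
        | exp.left '*'.op exp.right     { $left * $right }
        | '(' $exp ')'                  { $exp }
    ;
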
In "Parse::Eyapp" the programmer can define what is the default action through the %defaultaction directive: %defaultaction { "$left $right $op"; } Actually, intermediate actions are implemented via a trick. When "eyapp" sees an intermediate action like: A -> alpha { action(@_) } beta it creates a new auxiliary syntactic variable "Temp": Temp -> /* empty */ { action(@_) } Solving Ambiguities via Precedence and Associativity Declarations Notice that ambiguous grammars produce ambiguous translation schemes: since a phrase may have two syntactic trees it will be more than one tree-traversing and consequently more than one way to execute the embedded semantic actions. Certainly different execution orders will usually produce different results. Thus, syntactic ambiguities translate onto semantic ambiguities. That is why it is so important to resolve all the ambiguities and conflicts that may arise in our grammars. This is the function of the %left and %right declarations on the header section: %right '=' # Lowest precedence %left '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c %left '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c %left NEG # Disambiguate -a-b as (-a)-b and not as -(a-b) Priority can be assigned to tokens by using the %left and %right declarations. Tokens in lines below have more precedence than tokens in line above. The idea behind this notation is this: Any ambiguity can be seen as a parenthesizing problem. You can parenthesize left (in the jargon this is called reduce) or parenthesize right (in the jargon, shift). Recall the main points of yacc-like parsers related to priorities: o The directives %left %right %nonassoc can be used in the head section to declare the priority of a token o The later the declaration line the higher the priority o Tokens in the same line have the same priority. Ties will be solved using the token associativity (whether they were declared %left or %right) o The precedence of a production rule (right hand side) is the precedence of the last token in the right hand side o If the parser emits a warning announcing a shift-reduce conflict or a reduce-reduce conflict in your grammar, it likely means that your grammar is ambiguous or not LALR. In such case, recompile the grammar with "eyapp -v" and carefully study the ".output" file generated. Detect which token and which rules are involved in the conflict. o In a shift-reduce conflict the default action is to shift (i.e. associate right). This action can be changed if the production and the token involved have explicit priorities o Most of the time the presence of a reduce-reduce conflict means that your grammar is ambiguous. Rewrite your grammar. Alternatively, use the %conflict and %PREC directives (see example "debuggintut/pascalenumeratedvsrangesolvedviadyn.eyp"). The default action is to reduce by the first production. o If the precedence of the production rule is higher the shift-reduce conflict is solved in favor of the reduction (i.e. associate left) o If the precedence of the token is higher the shift-reduce conflict is solved in favor of the shift (i.e. associate right). o If the precedence of the token is the same than the precedence of the rule, and is left the shift-reduce conflict is solved in favor of the reduction (i.e. 
  o  If the precedence of the token is the same as the precedence of the rule, and the token is right associative, the shift-reduce conflict is solved in favor of the shift.

  o  If the precedence of the token is the same as the precedence of the rule, and the token is nonassoc, the presence of a shift-reduce conflict means an error. This is used to describe operators, like the operator ".LT." in FORTRAN, that may not associate with themselves. That is, because "A .LT. B .LT. C" is invalid in FORTRAN, ".LT." would be described with the keyword %nonassoc in eyapp.

  o  The default precedence of a production can be changed using the "%prec TOKEN" directive. The rule then has the precedence and associativity of the specified "TOKEN".

Examples

  o  By giving token '+' more precedence than token '=' we solve the ambiguity in "VAR = VAR + NUM" in favor of "VAR = (VAR + NUM)". The conflict occurs between the productions

         exp -> exp . '+' exp
         exp -> VAR '=' exp .

     The dot marks the parser position: having seen "VAR '=' exp" and being in the presence of a token '+', the parser can either associate left, i.e. reduce "VAR '=' exp" to "exp", or associate right, that is, shift so as to reduce "exp '+' exp" to "exp" later.

  o  How does it work when two tokens are declared in the same line? Consider the phrase "NUM - NUM - NUM". It will be interpreted as "(NUM - NUM) - NUM" if the token '-' is declared "%left '-'" and as "NUM - (NUM - NUM)" if the token '-' is declared "%right '-'". By declaring '-' as left we are saying that, between the two trees in dispute, we prefer the one that deepens to the left:

         exp
         |- exp
         |  |- exp
         |  |  `- NUM
         |  |- '-'
         |  `- exp
         |     `- NUM
         |- '-'
         `- exp
            `- NUM

     By declaring '-' as right we are saying that, between the two trees in dispute, we prefer the one that deepens to the right:

         exp
         |- exp
         |  `- NUM
         |- '-'
         `- exp
            |- exp
            |  `- NUM
            |- '-'
            `- exp
               `- NUM

     Since priority means earlier evaluation, and the evaluation of semantic actions by eyapp is bottom up, the deeper the associated subtree the higher the priority.

  o  Consider now the phrase "-NUM-NUM". There are two interpretations: one as "-(NUM-NUM)" and the other as "(-NUM)-NUM". The conflict occurs between the productions

         exp -> exp . '-' exp
         exp -> '-' exp .

     Both productions have the precedence of the token '-', but we want the interpretation "(-NUM)-NUM" to win. We achieve that by explicitly changing the precedence associated with the unary minus production via the %prec directive.

Lexical Analysis

Parsers created by "eyapp" do not deal directly with the input. Instead they expect the input to be processed by a lexical analyzer. The lexical analyzer scans the input and produces the next token. A token is a pair: the first component is the name of the token (like "NUM" or "VAR") and the second is its attribute (i.e. the information associated with the token, like the value 4 for a "NUM" or the identifier "temperature" for a "VAR"). Tokens are usually defined using regular expressions. Thus the token "NUM" is characterized by /[0-9]+(?:\.[0-9]+)?/ and the token "VAR" by /[A-Za-z][A-Za-z0-9_]*/. The eyapp compiler automatically generates a lexical analyzer from your token definitions.

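Purely as an illustration (plain Perl, not the lexer that eyapp generates for you), a scanner producing such (name, attribute) pairs from a string could look like the following sketch; the subroutine name "scan" and the output format are made up for the example:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Returns the list of (name, attribute) pairs for an input string.
    # The regexps mirror the NUM and VAR definitions above; any other
    # non-blank character is returned as a token whose name is the
    # character itself, as happens with '+', '=' or '(' in the grammar.
    sub scan {
        my ($input) = @_;
        my @tokens;
        while ((pos($input) // 0) < length $input) {
            if    ($input =~ m{\G\s+}gc)                      { next }
            elsif ($input =~ m{\G([0-9]+(?:\.[0-9]+)?)}gc)    { push @tokens, ['NUM', $1] }
            elsif ($input =~ m{\G([A-Za-z_][A-Za-z0-9_]*)}gc) { push @tokens, ['VAR', $1] }
            elsif ($input =~ m{\G(.)}gcs)                     { push @tokens, [$1, $1] }
        }
        push @tokens, ['', undef];   # the end-of-input pair expected by the parser
        return @tokens;
    }

    printf "(%s, %s)\n", $_->[0], defined $_->[1] ? $_->[1] : 'undef'
        for scan('a = 2*3+b');

Run on 'a = 2*3+b' it prints pairs such as (VAR, a), (=, =) and (NUM, 2), finishing with the end-of-input pair.
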
The tokens "NUM" and "VAR" were defined using the %token directives:

    %token NUM = /([0-9]+(?:\.[0-9]+)?)/
    %token VAR = /([A-Za-z][A-Za-z0-9_]*)/

The order in which the tokens are defined is important: the input will be matched against the regular expression for "NUM" before the regular expression for "VAR" is tried, and all the literal tokens that appear between quotes inside the body of the grammar, like '+' or '-', are tried before any explicitly defined token.

You can, alternatively, define the lexical analyzer explicitly. There are many ways to do it. Here is an example of a definition of a lexical analyzer using the %lexer directive:

    %lexer {
        m{\G[ \t]*}gc;
        m{\G(\n)+}gc                      and $self->tokenline($1 =~ tr/\n//);

        m{\G([0-9]+(?:\.[0-9]+)?)}gc      and return ('NUM', $1);
        m{\G([A-Za-z_][A-Za-z0-9_]*)}gc   and return ('VAR', $1);
        m{\G(.)}gc                        and return ($1, $1);
    }

The %lexer directive is followed by the code defining the lexical analyzer. When the lexer is called, the variable $_ is an alias of the input. The input can also be set and accessed via the "input" method of the $parser object. To catch the next pattern we use the anchor \G. The \G anchor matches at the point where the previous "/g" match left off. Normally, when a scalar "m{}g" match fails, the match position is reset and \G will start matching at the beginning of the string. The "c" option causes the match position to be retained after an unsuccessful match.

The pair ('', undef), which signals the end of the input, is automatically inserted by "eyapp". By default, the lexers generated by "eyapp" emit the end-of-input token ('', undef) when the end of the current string is reached. An incremental lexer behaves differently: when the end is reached it reads more input from the current file, which was set by $parser->YYInputFile. See the following variant of the synopsis example:

    ~/LEyapp/examples/eyappintro$ cat InputFromStream.eyp
    %whites /([ \t]+)/

    %token NUM = /([0-9]+(?:\.[0-9]+)?)/
    %token VAR = /([A-Za-z][A-Za-z0-9_]*)/

    %right '='
    %left '-' '+'
    %left '*' '/'
    %left NEG

    %defaultaction { "$_[1] $_[3] $_[2]" }

    # example of incremental lexer
    %incremental lexer 'Write an arithmetic expression: '

    %%
    input:
          {}
        | input line {}
    ;

    line:
          '\n'          {}
        | exp '\n'      { print "$_[1]\n" }
        | error '\n'    {}
    ;

    exp:
          NUM           { $_[1] }
        | VAR           { $_[1] }
        | VAR '=' exp
        | exp '+' exp
        | exp '-' exp
        | exp '*' exp
        | exp '/' exp
        | '-' exp %prec NEG   { "$_[2] NEG" }
        | '(' exp ')'         { $_[2] }
    ;
    %%

Now the grammar is compiled:

    ~/LEyapp/examples/eyappintro$ eyapp -C InputFromStream.eyp

When the generated modulino is executed, each time the end of the input string is reached it asks for more input, until we press the end-of-file key ("^D" in Unix):

    ~/LEyapp/examples/eyappintro$ ./InputFromStream.pm -noslurp
    Write an arithmetic expression: a=2+3*b
    a 2 3 b * + =
    Write an arithmetic expression: a=-b*2
    a b NEG 2 * =
    Write an arithmetic expression: ^D
    ~/LEyapp/examples/eyappintro$

The "main" and "error" subroutines

If you compile your grammar with option "-C", "eyapp" will insert a line like this as the first line of the generated ".pm" file:

    #!/usr/bin/perl

It will also append a line like this as the last line of the ".pm" file:

    unless (caller) { exit !__PACKAGE__->main(''); }

This allows the alternative use of the module as a script. Unless a "main" subroutine was defined, the one provided by Parse::Eyapp::Driver will be called. Parse::Eyapp::Driver also provides a default subroutine for the handling of error messages.

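This is the usual Perl "modulino" idiom. A stripped-down sketch of it (not the actual code that eyapp emits; the package name and the body of main are made up for the example) is:

    #!/usr/bin/perl
    # Behaves as a module when loaded, as a script when executed directly.
    package MyCalc;
    use strict;
    use warnings;

    sub main {
        my ($class, $prompt) = @_;
        # ... parse the input here ...
        return 1;    # non-zero on success
    }

    # caller() is false only when the file is run directly as a script,
    # so a 'use MyCalc;' from another program skips this block.
    unless (caller) { exit !__PACKAGE__->main(''); }

    1;

Saved as MyCalc.pm, the file runs as a script when executed directly and acts as an ordinary module when loaded with "use MyCalc;".
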
The default main accepts a few arguments from the command line. Here are some:

  o  "-f filename"   input is taken from "filename"

  o  "-c 'string'"   input is taken from 'string'

  o  "-noslurp"      when input is from STDIN, don't wait for the end of file

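For instance, assuming the synopsis grammar was compiled to calc.pl, and that exprs.txt is a text file containing an expression (both names are only illustrative), the default main could be driven like this; the second line is the postfix translation the scheme should print:

    $ ./calc.pl -c 'a = 2*3+b'
    a 2 3 * b + =
    $ ./calc.pl -f exprs.txt
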
SEE ALSO

  o  The project home is at <http://code.google.com/p/parse-eyapp/>. Use a subversion client to anonymously check out the latest project source code:

         svn checkout http://parse-eyapp.googlecode.com/svn/trunk/ parse-eyapp-read-only

  o  The tutorial Parsing Strings and Trees with "Parse::Eyapp" (An Introduction to Compiler Construction in Seven Pages) at <http://nereida.deioc.ull.es/~pl/eyapsimple/>

  o  Parse::Eyapp, Parse::Eyapp::eyapplanguageref, Parse::Eyapp::debuggingtut, Parse::Eyapp::defaultactionsintro, Parse::Eyapp::translationschemestut, Parse::Eyapp::Driver, Parse::Eyapp::Node, Parse::Eyapp::YATW, Parse::Eyapp::Treeregexp, Parse::Eyapp::Scope, Parse::Eyapp::Base, Parse::Eyapp::datagenerationtut

  o  The pdf file at <http://nereida.deioc.ull.es/~pl/perlexamples/languageintro.pdf>

  o  The pdf file at <http://nereida.deioc.ull.es/~pl/perlexamples/debuggingtut.pdf>

  o  The pdf file at <http://nereida.deioc.ull.es/~pl/perlexamples/eyapplanguageref.pdf>

  o  The pdf file at <http://nereida.deioc.ull.es/~pl/perlexamples/Treeregexp.pdf>

  o  The pdf file at <http://nereida.deioc.ull.es/~pl/perlexamples/Node.pdf>

  o  The pdf file at <http://nereida.deioc.ull.es/~pl/perlexamples/YATW.pdf>

  o  The pdf file at <http://nereida.deioc.ull.es/~pl/perlexamples/Eyapp.pdf>

  o  The pdf file at <http://nereida.deioc.ull.es/~pl/perlexamples/Base.pdf>

  o  The pdf file at <http://nereida.deioc.ull.es/~pl/perlexamples/translationschemestut.pdf>

  o  The pdf file at <http://nereida.deioc.ull.es/~pl/perlexamples/treematchingtut.pdf>

  o  perldoc eyapp

  o  perldoc treereg

  o  perldoc vgg

  o  The syntax highlight file for vim at <http://www.vim.org/scripts/script.php?script_id=2453> and <http://nereida.deioc.ull.es/~vim/>

  o  Analisis Lexico y Sintactico (notes for a course in compiler construction) by Casiano Rodriguez-Leon, available at <http://nereida.deioc.ull.es/~pl/perlexamples/>. It is the most complete and reliable source for Parse::Eyapp; however, it is in Spanish.

  o  Parse::Yapp

  o  The man pages of yacc(1) and bison(1), and <http://www.delorie.com/gnu/docs/bison/bison.html>

  o  Language::AttributeGrammar

  o  Parse::RecDescent

  o  HOP::Parser

  o  HOP::Lexer

  o  The ocamlyacc tutorial at <http://plus.kaist.ac.kr/~shoh/ocaml/ocamllex-ocamlyacc/ocamlyacc-tutorial/ocamlyacc-tutorial.html>

REFERENCES
  o  The classic Dragon book: Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman (Addison-Wesley 1986)

  o  CS2121: The Implementation and Power of Programming Languages by Pete Jinks (see <http://www.cs.man.ac.uk/~pjj>, <http://www.cs.man.ac.uk/~pjj/complang/g2lr.html> and <http://www.cs.man.ac.uk/~pjj/cs2121/ho/ho.html>)

CONTRIBUTORS
  o  Hal Finkel <http://www.halssoftware.com/>

  o  G. Williams <http://kasei.us/>

  o  Thomas L. Shinnick <http://search.cpan.org/~tshinnic/>

  o  Frank Leray

AUTHOR
Casiano Rodriguez-Leon (casiano@ull.es)

ACKNOWLEDGMENTS
This work has been supported by CEE (FEDER) and the Spanish Ministry of Educacion y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project, <http://www.oplink.ull.es/>). Support from Gobierno de Canarias came through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years.

A large percentage of the code is taken verbatim from Parse::Yapp 1.05. The author of Parse::Yapp is Francois Desarmenien.

I wish to thank Francois Desarmenien for his Parse::Yapp module, my students at La Laguna, and the Perl Community. Thanks to the people who have contributed to improving the module (see "CONTRIBUTORS" in Parse::Eyapp). Thanks to Larry Wall for giving us Perl. Special thanks to Juana.

LICENCE AND COPYRIGHT
Copyright (c) 2006-2008 Casiano Rodriguez-Leon (casiano@ull.es). All rights reserved.

Parse::Yapp is copyright (c) 1998-2001 Francois Desarmenien, all rights reserved.

These modules are free software; you can redistribute them and/or modify them under the same terms as Perl itself. See perlartistic.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

perl v5.18.2                              2012-03-23                          Parse::Eyapp::eyappintro(3)