Linux and UNIX Man Pages

Linux & Unix Commands - Search Man Pages

parse::eyapp::eyappintro(3) [osx man page]

Parse::Eyapp::eyappintro(3)				User Contributed Perl Documentation			       Parse::Eyapp::eyappintro(3)

Parse::Eyapp::eyappintro - An introduction to Parse::Eyapp SYNOPSIS
# File 'calc.eyp': translates infix expressions to postfix # Compile it with: eyapp -o -C Postfix.eyp # Execution: ./ -c 'a = 2*3+b' %token NUM = /([0-9]+(?:.[0-9]+)?)/ %token VAR = /([A-Za-z][A-Za-z0-9_]*)/ %right '=' %left '-' '+' %left '*' '/' %left NEG %defaultaction { "$left $right $op"; } %% line: $exp { print "$exp " } ; exp: $NUM { $NUM } | $VAR { $VAR } | VAR.left '='.op exp.right | exp.left '+'.op exp.right | exp.left '-'.op exp.right | exp.left '*'.op exp.right | exp.left '/'.op exp.right | '-' $exp %prec NEG { "$exp NEG" } | '(' $exp ')' { $exp } ; %% INTRODUCTION TO PARSING WITH Parse::Eyapp Introduction Parsing is the activity of producing a syntax tree from an input stream. The program example in the synopsis section shows an example of a translation scheme. It translates infix arithmetic expressions like a = 2 * 3 + 4 * b into postfix expressions like a 2 3 * 4 b * + = The program contains a context free eyapp grammar defining the language of arithmetic infix expressions. A context free grammar is a mathematical device to define languages. To better see the grammar for the example above we have to eliminate the semantic actions (We call semantic actions to the sections delimited by curly brackets containing Perl code). This can be done calling "eyapp" with options "-vc": $ eyapp -vc Postfix.eyp %token NUM =/([0-9]+(?:.[0-9]+)?)/ %token VAR =/([A-Za-z][A-Za-z0-9_]*)/ %right '=' %left '-' '+' %left '*' '/' %left NEG %% line: exp ; exp: NUM | VAR | VAR '=' exp | exp '+' exp | exp '-' exp | exp '*' exp | exp '/' exp | '-' exp %prec NEG | '(' exp ')' ; %% A grammar generates a language. A grammar is defined by a set of production rules. A production rule has two components: a left hand side which is a syntactic variable or non terminal and a right hand side which is a phrase made of syntactic variables and terminals. The left hand side (lhs) and the right hand side (rhs) are usually separated by an arrow like in: exp -> VAR = exp A note: when you see a production rule like: line: exp <+ ';'> is not really a production rule but an abbreviation for two productions. It stands for: line : exp | line ';' exp ; A terminal or token never appears on the left hand side of a production rule. Ambiguity The phrases of the language are those obtained successively applying the production rules of the grammar until no more rules can be applied. The successive substitutions must start from the "start" symbol of the grammar ("line" in the example). Such legal sequence of substitutions is known as a derivation. The following is an example of a legal derivation (the big arrow "=>" is read derives): line => exp => VAR = exp => VAR = exp + exp => VAR = exp + NUM => VAR = VAR + NUM thus the phrase "VAR = VAR + NUM" belongs to the language generated by the former grammar. A derivation like this can be seen as a tree. For instance, the former derivation is equivalent (has the same information) than the following tree: +----+ |line| +----+ | +---+ |exp| +---+ .-----.-----^-----. +---+ +---+ +---+ |VAR| | = | |exp| +---+ +---+ +---+ .-----+-----. +---+ +---+ +---+ |exp| | + | |exp| +---+ +---+ +---+ | | +---+ +---+ |VAR| |NUM| +---+ +---+ which can be written more succinctly: line(exp(VAR, '=', exp(exp(VAR), '+', exp(NUM)))) or even more briefly: VAR = (VAR + NUM) Such a tree is called a syntax tree for the input "VAR = VAR + NUM". A grammar is said to be ambiguous if there are phrases in the generated language that have more than one syntax tree. The grammar in the synopsis example is ambiguous. Here is an alternative tree for the same phrase "VAR = VAR + NUM": +----+ |line| +----+ | +---+ |exp| +---+ .----------^-----.-----. +---+ +---+ +---+ |exp| | + | |exp| +---+ +---+ +---+ .-----+-----. | +---+ +---+ +---+ +---+ |VAR| | = | |exp| |NUM| +---+ +---+ +---+ +---+ | +---+ |VAR| +---+ or line(exp(exp(VAR, '=', exp(VAR)), '+', exp(NUM))) or (VAR = VAR) + NUM Semantic Actions and Attributes "Parse::Eyapp" analyzes your grammar and produce a LALR parser. Actually the SYNOPSIS example is more than a context free grammar: is a translation scheme. A translation scheme scheme is a context free grammar where the right hand sides of the productions have been augmented with semantic actions (i.e. with chunks of Perl code): A -> alpha { action(@_) } beta The analyzer generated by Eyapp executes "{ action(@_) }" after all the semantic actions associated with "alpha" have been executed and before the execution of any of the semantic actions associated with "beta". In a translation scheme each symbol occurrence has an associated attribute. The embedded actions modify the attributes associated with the symbols of the grammar: A -> alpha { action(@_) } beta Each symbol on the right hand side of a production rule has an associated scalar attribute. In "eyapp" the attributes of the symbols to the left of "action" are passed as arguments to "action" (in the example, those of "alpha"). These arguments are preceded by a reference to the syntax analyzer object. Therefore, you can access to the attributes associated with the first, second, etc. symbols in the right hand side using the notation: $_[1], $_[2], ... However it is better to refer to the attributes by names. This is the purpose of the dot and dollar notations as in: exp: $NUM { $NUM } | $VAR { $VAR } | VAR.left '='.op exp.right | exp.left '+'.op exp.right | exp.left '-'.op exp.right | exp.left '*'.op exp.right | exp.left '/'.op exp.right | '-' $exp %prec NEG { "$exp NEG" } | '(' $exp ')' { $exp } ; By prefixing the symbol "NUM" by a "$" we can refer to the associated attribute inside the semantic action as $NUM: exp: $NUM { $NUM } By postfixing the two appearances of "expr" with ".left" and ".right" and the appearance of '+' with ".op" we can refer to the associates attributes as $left, $right and $op instead of $_[1], $_[3] and $_[2]: %defaultaction { "$left $right $op"; } There is no way inside an ordinary "eyapp" program for an intermediate "action" to access the attributes of the symbols on its right, i.e. those associated with the symbols of "beta". This restriction is lifted if you use the %metatree directive to build a full translation scheme. See Parse::Eyapp::translationschemestut to know more about full translation schemes. Actions on the right hand side counts as symbols and so they can be referenced by its positional argument in later actions in the same production rule. For intermediate actions, the value returned by the "action" is the attribute associated with such action. For an action at the end of the rule: A -> alpha { lastaction(@_) } the returned value constitutes the attribute of the left hand side of the rule (the attribute of "A" in this case). The action at the end of the right hand side is called the action associated with the production rule. When no explicit action has been associated with a production rule the default action applies. In "Parse::Eyapp" the programmer can define what is the default action through the %defaultaction directive: %defaultaction { "$left $right $op"; } Actually, intermediate actions are implemented via a trick. When "eyapp" sees an intermediate action like: A -> alpha { action(@_) } beta it creates a new auxiliary syntactic variable "Temp": Temp -> /* empty */ { action(@_) } Solving Ambiguities via Precedence and Associativity Declarations Notice that ambiguous grammars produce ambiguous translation schemes: since a phrase may have two syntactic trees it will be more than one tree-traversing and consequently more than one way to execute the embedded semantic actions. Certainly different execution orders will usually produce different results. Thus, syntactic ambiguities translate onto semantic ambiguities. That is why it is so important to resolve all the ambiguities and conflicts that may arise in our grammars. This is the function of the %left and %right declarations on the header section: %right '=' # Lowest precedence %left '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c %left '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c %left NEG # Disambiguate -a-b as (-a)-b and not as -(a-b) Priority can be assigned to tokens by using the %left and %right declarations. Tokens in lines below have more precedence than tokens in line above. The idea behind this notation is this: Any ambiguity can be seen as a parenthesizing problem. You can parenthesize left (in the jargon this is called reduce) or parenthesize right (in the jargon, shift). Recall the main points of yacc-like parsers related to priorities: o The directives %left %right %nonassoc can be used in the head section to declare the priority of a token o The later the declaration line the higher the priority o Tokens in the same line have the same priority. Ties will be solved using the token associativity (whether they were declared %left or %right) o The precedence of a production rule (right hand side) is the precedence of the last token in the right hand side o If the parser emits a warning announcing a shift-reduce conflict or a reduce-reduce conflict in your grammar, it likely means that your grammar is ambiguous or not LALR. In such case, recompile the grammar with "eyapp -v" and carefully study the ".output" file generated. Detect which token and which rules are involved in the conflict. o In a shift-reduce conflict the default action is to shift (i.e. associate right). This action can be changed if the production and the token involved have explicit priorities o Most of the time the presence of a reduce-reduce conflict means that your grammar is ambiguous. Rewrite your grammar. Alternatively, use the %conflict and %PREC directives (see example "debuggintut/pascalenumeratedvsrangesolvedviadyn.eyp"). The default action is to reduce by the first production. o If the precedence of the production rule is higher the shift-reduce conflict is solved in favor of the reduction (i.e. associate left) o If the precedence of the token is higher the shift-reduce conflict is solved in favor of the shift (i.e. associate right). o If the precedence of the token is the same than the precedence of the rule, and is left the shift-reduce conflict is solved in favor of the reduction (i.e. associate left) o If the precedence of the token is the same than the precedence of the rule, and is right the shift-reduce conflict is solved in favor of the shift o If the precedence of the token is the same than the precedence of the rule, and is nonassoc the presence of a shift-reduce conflict means an error. This is used to describe operators, like the operator ".LT." in FORTRAN, that may not associate with themselves. That is, because A .LT. B .LT. C is invalid in FORTRAN, ".LT." would be described with the keyword %nonassoc in eyapp. o The default precedence of a production can be changed using the "%prec TOKEN" directive. Now the rule has the precedence and associativity of the specified "TOKEN". Examples o By giving token '+' more precedence than token '=' we solve the ambiguity in "VAR = VAR + NUM" in favor of " VAR = (VAR + NUM)". The conflict occurs between the productions exp -> exp . '+' exp exp -> VAR '=' exp . Where the dot means: If I have seen "VAR '=' exp" and I am in the presence of a token '+' I can associate left, i.e. reduce "VAR '=' exp" to "exp" or to associate right, that is, to shift to the right to reduce "exp '+' exp" to "exp" later. o How it works when two tokens are declared in the same line? Consider the phrase "NUM - NUM - NUM". It will be interpreted as "(NUM - NUM) - NUM" if the token '-' is declared "%left '-'" and will be interpreted as "NUM - (NUM - NUM)" if the token '-' is declared "%right '-'". By saying '-' is left we are saying we prefer between the two trees in dispute the one that deepens to the left: +---+ |exp| +---+ .--------^--.-----. +---+ +---+ +---+ |exp| | - | |exp| +---+ +---+ +---+ .-----+-----. | +---+ +---+ +---+ +---+ |exp| | - | |exp| |NUM| +---+ +---+ +---+ +---+ | | +---+ +---+ |NUM| |NUM| +---+ +---+ By saying '-' is right we are saying we prefer between the two trees in dispute the one that deepens to the right: +---+ |exp| +---+ .-----.-----^-----. +---+ +---+ +---+ |exp| |MIN| |exp| +---+ +---+ +---+ | .-----+-----. +---+ +---+ +---+ +---+ |NUM| |exp| |MIN| |exp| +---+ +---+ +---+ +---+ | | +---+ +---+ |NUM| |NUM| +---+ +---+ Since priority means earlier evaluation and the evaluation by eyapp of semantic actions is bottom up, the deeper the associated subtree the higher the priority. o Consider now the phrase "-NUM-NUM". There are two interpretations: one as "-(NUM-NUM)" and the other as "(-NUM)-NUM". The conflict occurs between the productions exp -> exp . '-' exp exp -> '-' exp. Both productions have the precedence of the token '-'. But we prefer the interpretation "(-NUM)-NUM" to win. We do that by explicitly changing the precedence associated with the unary minus production via the %prec directive. Lexical Analysis Parsers created by "eyapp" do not deal directly with the input. Instead they expect the input to be processed by a lexical analyzer. The lexical analyzer parses the input and produces the next token. A token is a pair. The first component is the name of the token (like "NUM" or "VAR") and the second is its attribute (i.e. the information associated with the token, like that the value is 4 for a "NUM" or the identifier is "temperature" for a "VAR"). Tokens are usually defined using regular expressions. Thus the token "NUM" is characterized by "/[0-9]+(?:.[0-9]+)?/" and the token "VAR" by "/[A-Za-z][A-Za-z0-9_]*/". The eyapp compiler automatically generates a lexical analyzer from your token definitions. The tokens "NUM" and "VAR" were defined using the %token directives: %token NUM = /([0-9]+(?:.[0-9]+)?)/ %token VAR = /([A-Za-z][A-Za-z0-9_]*)/ The order in which the tokens are defined is important. The input will be matched against the regular expression for "NUM" before the regular expression for "VAR" is tried, and all the literal tokens that appear between quotes inside the body of the grammar, like '+' or "'-", are tried before any explicitly defined token. You can, alternatively, define the lexical analyzer explicitly. There are many ways to do it. Here is an example of a definition of a lexical analyzer using the %lexer directive: %lexer { m{G[ ]*}gc; m{G( )+}gc and $self->tokenline($1 =~ tr/ //); m{G([0-9]+(?:.[0-9]+)?)}gc and return ('NUM', $1); m{G([A-Za-z_][A-Za-z0-9_]*)}gc and return ('VAR', $1); m{G(.)}gc and return ($1, $1); } The %lexer directive is followed by the code defining the lexical analyzer. When called, the variable $_ is an alias of the input. The input can also be set and accessed via the "input" method of the $parser object. To catch the next pattern we use the anchor "G". The "G" anchor matches at the point where the previous "/g" match left off. Normally, when a scalar "m{}g" match fails, the match position is reset and "G" will start matching at the beginning of the string. The "c" option causes the match position to be retained following an unsuccessful match. The couple "('',undef)" which signals the end of the input is automatically inserted by "eyapp". By default, the lexers generated by "eyapp" emit the end-of-input token "('', undef)" when the end of the current string is reached. A incremental lexer differs from these behavior: when the end is reached it reads more input from the current file, which was set by $parser->YYInputFile See the following variant of the synopsis example: ~/LEyapp/examples/eyappintro$ cat InputFromStream.eyp %whites /([ ]+)/ %token NUM = /([0-9]+(?:.[0-9]+)?)/ %token VAR = /([A-Za-z][A-Za-z0-9_]*)/ %right '=' %left '-' '+' %left '*' '/' %left NEG %defaultaction { "$_[1] $_[3] $_[2]" } # example of incremental lexer %incremental lexer 'Write an arithmetic expression: ' %% input: {} | input line {} ; line: ' ' {} | exp ' ' { print "$_[1] " } | error ' ' {} ; exp: NUM { $_[1] } | VAR { $_[1] } | VAR '=' exp | exp '+' exp | exp '-' exp | exp '*' exp | exp '/' exp | '-' exp %prec NEG { "$_[2] NEG" } | '(' exp ')' { $_[2] } ; %% Now, after the grammar is compiled ~/LEyapp/examples/eyappintro$ eyapp -C InputFromStream.eyp When the generated modulino is executed, each time the end of the input string is reached, it asks for more input until we press the end- of-file ("^D" in Unix) key: ~/LEyapp/examples/eyappintro$ ./ -noslurp Write an arithmetic expression: a=2+3*b a 2 3 b * + = Write an arithmetic expression: a=-b*2 a b NEG 2 * = Write an arithmetic expression: ^D ~/LEyapp/examples/eyappintro$ The "main" and "error" subroutines If you compile your grammar with option "-C", "eyapp" will insert a line like this as the first line of the generated ".pm" file: #!/usr/bin/perl It will also append a line like this as the last line of the ".pm" file: unless (caller) { exit !__PACKAGE__->main(''); } This allows the alternative use of the module as a script. Unless a "main" subroutine was defined, the one provided by Parse::Eyapp::Driver will be called. It also provides a default subroutine for the handling of error messages. The default main accepts a few arguments from the command line. Here are some: o "-f filename" input from "filename" o "-c 'string'" input from 'string' o "-noslurp" when input is from "STDIN" don't wait for end of file SEE ALSO
o The project home is at <>. Use a subversion client to anonymously check out the latest project source code: svn checkout parse-eyapp-read-only o The tutorial Parsing Strings and Trees with "Parse::Eyapp" (An Introduction to Compiler Construction in seven pages) in <> o Parse::Eyapp, Parse::Eyapp::eyapplanguageref, Parse::Eyapp::debuggingtut, Parse::Eyapp::defaultactionsintro, Parse::Eyapp::translationschemestut, Parse::Eyapp::Driver, Parse::Eyapp::Node, Parse::Eyapp::YATW, Parse::Eyapp::Treeregexp, Parse::Eyapp::Scope, Parse::Eyapp::Base, Parse::Eyapp::datagenerationtut o The pdf file in <> o The pdf file in <> o The pdf file in <> o The pdf file in <> o The pdf file in <> o The pdf file in <> o The pdf file in <> o The pdf file in <> o The pdf file in <> o The pdf file in <> o perldoc eyapp, o perldoc treereg, o perldoc vgg, o The Syntax Highlight file for vim at <> and <> o Analisis Lexico y Sintactico, (Notes for a course in compiler construction) by Casiano Rodriguez-Leon. Available at <> Is the more complete and reliable source for Parse::Eyapp. However is in Spanish. o Parse::Yapp, o Man pages of yacc(1) and bison(1), <> o Language::AttributeGrammar o Parse::RecDescent. o HOP::Parser o HOP::Lexer o ocamlyacc tutorial at <> REFERENCES
o The classic Dragon's book Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman (Addison- Wesley 1986) o CS2121: The Implementation and Power of Programming Languages (See <>, <> and <>) by Pete Jinks CONTRIBUTORS
o Hal Finkel <> o G. Williams <> o Thomas L. Shinnick <> o Frank Leray AUTHOR
Casiano Rodriguez-Leon ( ACKNOWLEDGMENTS
This work has been supported by CEE (FEDER) and the Spanish Ministry of Educacion y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project <>). Support from Gobierno de Canarias was through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years. A large percentage of code is verbatim taken from Parse::Yapp 1.05. The author of Parse::Yapp is Francois Desarmenien. I wish to thank Francois Desarmenien for his Parse::Yapp module, to my students at La Laguna and to the Perl Community. Thanks to the people who have contributed to improve the module (see "CONTRIBUTORS" in Parse::Eyapp). Thanks to Larry Wall for giving us Perl. Special thanks to Juana. LICENCE AND COPYRIGHT
Copyright (c) 2006-2008 Casiano Rodriguez-Leon ( All rights reserved. Parse::Yapp copyright is of Francois Desarmenien, all rights reserved. 1998-2001 These modules are free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. perl v5.16.2 2012-03-23 Parse::Eyapp::eyappintro(3)
Man Page