
yacc(1) 						      General Commands Manual							   yacc(1)

NAME
       yacc - Generates an LR(1) parsing program from input consisting of a context-free grammar specification

SYNOPSIS
       yacc [-vltds] [-b prefix] [-N number] [-p symbol_prefix] [-P pathname] grammar

STANDARDS
       Interfaces documented on this reference page conform to industry standards as follows:

       yacc:  XPG4, XPG4-UNIX

       Refer to the standards(5) reference page for more information about industry standards and associated tags.

OPTIONS
       -b prefix
              Uses prefix instead of y as the prefix for all output filenames (prefix.tab.c, prefix.tab.h, and prefix.output).

       -d     Produces the <y.tab.h> file, which contains the #define statements that associate the yacc-assigned token codes with your
              token names. This allows source files other than y.tab.c to access the token codes by including this header file.

       -l     Includes no #line constructs in y.tab.c. Use this only after the grammar and associated actions are fully debugged.

       -N number
              [Tru64 UNIX]  Provides yacc with extra storage for building its LALR tables, which may be necessary when compiling very
              large grammars. The number should be larger than 40,000 when you use this option.

       -p symbol_prefix
              Allows multiple yacc parsers to be linked together. Use symbol_prefix instead of yy to prefix global symbols.

       -P pathname
              [Tru64 UNIX]  Specifies an alternative parser (instead of /usr/ccs/lib/yaccpar). The pathname specifies the filename of
              the skeleton to be used in place of yaccpar.

       -s     [Tru64 UNIX]  Breaks the yyparse() function into several smaller functions. Because its size is somewhat proportional to
              that of the grammar, it is possible for yyparse() to become too large to compile, optimize, or execute efficiently.

       -t     Compiles run-time debugging code. By default, this code is not included when y.tab.c is compiled. If YYDEBUG has a
              nonzero value, the C compiler (cc) includes the debugging code, whether or not the -t option was used. Without
              compiling this code, yyparse() will run more quickly.

       -v     Produces the y.output file, which contains a readable description of the parsing tables and a report on conflicts
              generated by grammar ambiguities.

OPERANDS
       grammar
              The pathname of a file containing input instructions. The format of this file is described in the DESCRIPTION section.

DESCRIPTION
       The yacc command converts a context-free grammar specification into a set of tables for a simple automaton that executes an  LR(1)  parsing
       algorithm. The yacc grammar can be ambiguous; specified precedence rules are used to break ambiguities.

       You must compile the y.tab.c output file with a C language compiler to produce the yyparse() function.  This function must be loaded with a
       yylex lexical analyzer function, as well as two routines that you must provide, main() and an error-handling routine,  yyerror().  The  lex
       command is useful for creating lexical analyzers usable by yacc.

       The  yacc  program  reads  its  skeleton parser from the file /usr/ccs/lib/yaccpar. Use the environment variable YACCPAR to specify another
       location for the yacc program to read from. If you use this environment variable, the -P option is ignored, if specified.

       The general format of the yacc input file is as follows:

       [definitions] %% rules [%% [user subroutines]]

       where:

       definitions
              Is the section where you define the variables to be used later in the grammar, such as in the rules section. It is also
              where files are included (#include) and processing conditions are defined. This section is optional.

       rules  Is the section that contains grammar rules for the parser. A yacc input file must have a rules section.

       user subroutines
              Is the section that contains user-supplied subroutines that can be used by the actions in the rules section. This
              section is optional.

       Comments,  in  C syntax, can appear anywhere in the user subroutines section or the definitions section. In the rules section, comments can
       appear wherever a symbol is allowed. Blank lines or lines consisting of white space can be inserted anywhere in the file, and are  ignored.
       The null character (NUL) must not be used in grammar rules or literals.

   Definitions Section of Input File
       The definitions section of a yacc input file contains entries that perform the following functions:

              Includes standard I/O header file.
              Defines global variables.
              Defines the list rule as the place to start processing.
              Defines the tokens used by the parser.
              Defines the operators and their precedence.

       Each line in the definitions section can be:

       %{
       %}     When placed on lines by themselves, these enclose C code to be passed into the global definitions of the output file.
              Such lines commonly include preprocessor directives and declarations of external variables and functions.

       %token [<type>] token [number] ...
              Lists tokens or terminal symbols to be used in the rest of the input file. This line is needed for tokens that do not
              appear in other % definitions. If type is present, the C type for all tokens on this line is declared to be the type
              referenced by type. If a positive integer number follows a token, that value is assigned to the token.

       %left [<type>] token [number] ...
              Indicates that each token is an operator, that all tokens in this definition have equal precedence, and that a
              succession of the operators listed in this definition is evaluated left to right.

       %right [<type>] token [number] ...
              Indicates that each token is an operator, that all tokens in this definition have equal precedence, and that a
              succession of the operators listed in this definition is evaluated right to left.

       %nonassoc [<type>] token [number] ...
              Indicates that each token is an operator, and that the operators listed in this definition cannot appear in succession.
              Indicates that the token cannot be used associatively.

       %start symbol
              Indicates the highest-level production rule to be reduced; in other words, the rule where the parser can consider its
              work done and can terminate processing. If this definition is not included, the parser uses the first production rule.
              The symbol must be non-terminal (not a token).

       %type <type> symbol ...
              Defines each symbol as data type type, to resolve ambiguities. If this construct is present, yacc performs type
              checking; otherwise, it assumes all symbols to be of type integer.

       %union union-def
              Defines the yylval global variable as a union, where union-def is a standard C definition in the format:

              { type member ; [type member ; ...] }

              At least one member should be an int. Any valid C data type can be defined, including structures. When you run yacc
              with the -d option, the definition of yylval is placed in the <y.tab.h> file and can be referred to in a lex input file.
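
       As a rough illustration of the %union definition described above, the following C sketch shows the kind of declaration a
       traditional yacc emits for a grammar containing %union { int ival; double dval; }. The member names ival and dval and the two
       helper functions are hypothetical, and the exact text written to <y.tab.h> may differ between implementations.

```c
/* Roughly what a traditional yacc places in y.tab.h for:
 *     %union { int ival; double dval; }
 * The member names ival and dval are hypothetical. */
typedef union {
    int    ival;
    double dval;
} YYSTYPE;

/* Normally defined by the generated parser; defined here so the sketch
 * stands alone. */
YYSTYPE yylval;

/* Hypothetical helpers showing how a lexical analyzer would store a
 * token's value in the matching union member before returning the token. */
void set_int_value(int n)
{
    yylval.ival = n;
}

void set_real_value(double x)
{
    yylval.dval = x;
}
```

       A lexical analyzer would call the equivalent of set_int_value() just before returning an integer-valued token, so that the
       parser action can read the value through $n.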

       Every token (terminal symbol) must be listed in one of the preceding % definitions. Multiple tokens can be separated by white space or
       commas. All the tokens in %left, %right, and %nonassoc definitions are assigned a precedence, with tokens in later definitions having
       precedence over those in earlier definitions.

       In addition to symbols, a token can be a literal character enclosed in single quotes. (Multibyte characters are recognized by the
       lexical analyzer and returned as tokens.) The following special characters can be used, just as in C programs:

       \a     Alert
       \n     Newline
       \t     Tab
       \v     Vertical tab
       \r     Carriage Return
       \b     Backspace
       \f     Form Feed
       \\     Backslash
       \'     Single Quote
       \?     Question mark
       \ooo   One or more octal digits specifying the integer value of the character

   Rules Section of Input File
       The  rules section of a yacc input file defines the rules that parse the input stream. It consists of a series of production rules that the
       parser tries to reduce. The format of each production rule is:

       symbol : symbol-sequence [action] [| symbol-sequence [action] ...] ;

       A symbol-sequence consists of zero or more symbols separated by white space. The first symbol must be the first character of the line,  but
       newlines and other white space can appear anywhere else in the rule. All terminal symbols must be declared in %token definitions.

       Each symbol-sequence represents an alternative way of reducing the rule. A symbol can appear recursively in its own rule.  Always use left-
       recursion (where the recursive symbol appears before the terminating case in symbol-sequence).

       The following sequence indicates that the current sequence of symbols is to be preferred over others, at the level of  precedence  assigned
       to token in the definitions section of the input file:

       %prec token

       The  specially defined token error matches any unrecognized sequence of input. This token causes the parser to invoke the yyerror function.
       By default, the parser tries to synchronize with the input and continue processing it by reading and discarding all input up to the  symbol
       following  error. (You can override this behavior through the yyerrok action.) If no error token appears in the yacc input file, the parser
       exits with an error message upon encountering unrecognized input.

       The parser always executes action after encountering the symbol that precedes it. Thus, an action can appear in the  middle  of	a  symbol-
       sequence,  after each symbol-sequence, or after multiple instances of symbol-sequence. In the last case, action is executed when the parser
       matches any of the sequences.

       The action consists of standard C code within braces and can also take the following values, variables, and keywords.

       yylval If the token returned by the yylex function is associated with a significant value, yylex should place the value in this
              global variable. By default, yylval is of type long. The definitions section can include a %union definition to
              associate with other data types, including structures. If you run yacc with the -d option, the full yylval definition
              is passed into the <y.tab.h> file for access by lex.

       yyerrok
              Causes the parser to start parsing tokens immediately after an erroneous sequence, instead of performing the default
              action of reading and discarding tokens up to a synchronization token. The yyerrok action should appear immediately
              after the error token.

       $[<type>]n
              Refers to symbol n, a token index in the production, counting from the beginning of the production rule, where the
              first symbol after the colon is $1. The type variable is the name of one of the union lines listed in the %union
              directive in the declaration section. The <type> syntax (non-standard) allows the value to be cast to a specific data
              type. Note that you will rarely need to use the type syntax.

       $[<type>]$
              Refers to the value returned by the matched symbol-sequence and used for the matched symbol when reducing other rules.
              The symbol-sequence generally assigns a value to $$. The type variable is the name of one of the union lines listed in
              the %union directive in the declaration section. The <type> syntax (non-standard) allows the value to be cast to a
              specific data type. Note that you will rarely need to use the type syntax.

   User Subroutines Section of Input File
       The  user  subroutines  section of the yacc input file contains user-supplied functions. Because these functions are included in this file,
       you do not need to use the yacc library when processing this file. If you supply a lexical analyzer (yylex) to the parser, it must be  con-
       tained in the user subroutines section.

       The following functions, which are contained in the user subroutines section, are invoked within the yyparse function generated by
       yacc.

       yylex()
              The lexical analyzer called by yyparse to recognize each token of input. Usually this function is created by lex. yylex
              reads input, recognizes expressions within the input, and returns a token number representing the kind of token read.
              The function returns an int value. A return value of 0 (zero) means the end of input.

              If the parser and yylex do not agree on these token numbers, reliable communication between them cannot occur. For
              one-character literals, the token is simply the numeric value of the character in the current character set. The
              numbers for other tokens can be chosen by either yacc or the user. In either case, the #define construct of C is used
              to allow yylex() to return these numbers symbolically. The #define statements are put into the code file, and into the
              header file if that file is requested. The set of characters permitted by yacc in an identifier is larger than that
              permitted by C. Token names found to contain such characters will not be included in the #define declarations.

              If the token numbers are chosen by yacc, those tokens other than literals are assigned numbers greater than 256,
              although no order is implied. A token can be explicitly assigned a number by following its first appearance in the
              declaration section with a number. Names and literals not defined in this way retain their default definition. All
              assigned token numbers are unique and distinct from the token numbers used for literals. If duplicate token numbers
              cause conflicts in parser generation, yacc reports an error; otherwise, it is unspecified whether the token assignment
              is accepted or an error is reported.

              The end of the input is marked by a special token called the endmarker that has a token number that is zero or
              negative. All lexical analyzers return zero or negative as a token number upon reaching the end of their input. If the
              tokens up to, but not including, the endmarker form a structure that matches the start symbol, the parser accepts the
              input. If the endmarker is seen in any other context, it is considered an error.

       yyerror(string)
              The function that the parser calls upon encountering an input error. The default function, defined in liby.a, simply
              prints string to the standard error. The user can redefine the function. The function's type is void.

       yywrap()
              The wrap-up routine that returns a value of 1 when the end of input occurs.
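
       The token-number conventions described for yylex can be sketched with a small hand-written lexical analyzer in C. This is only
       an illustration, not the code that lex generates: the token codes 257 and 258 are assumed values (in a real build they come
       from the header file produced by yacc -d), and the set_input() helper and its string buffer are hypothetical stand-ins for
       reading standard input.

```c
/* Assumed token codes; a real program takes these from y.tab.h. */
#define DIGIT  257
#define LETTER 258

long yylval;                      /* value of the token just returned */

static const char *input = "";    /* hypothetical input buffer */

/* Hypothetical helper: points the analyzer at a new string to scan. */
void set_input(const char *text)
{
    input = text;
}

/* Returns the next token: LETTER or DIGIT with its value in yylval,
 * any other character as its own character code, and 0 (the
 * endmarker) once the input is exhausted. */
int yylex(void)
{
    int c;

    while (*input == ' ')         /* skip blanks */
        input++;
    if (*input == '\0')
        return 0;                 /* end of input */

    c = (unsigned char)*input++;
    if (c >= 'a' && c <= 'z') {
        yylval = c - 'a';
        return LETTER;
    }
    if (c >= '0' && c <= '9') {
        yylval = c - '0';
        return DIGIT;
    }
    return c;                     /* one-character literal token */
}
```

       Named tokens receive codes above 256, so they cannot collide with the codes of one-character literals such as '=' or '+',
       which are returned unchanged.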

       The liby.a library contains default main() and yyerror() functions. (main() is the required main program that calls yyparse() to start  the
       program.) These routines look like the following, respectively:

       main()
       {
            setlocale(LC_ALL, "");
            (void) yyparse();
            return(0);
       }

       int yyerror(s)
            char *s;
       {
            fprintf(stderr,"%s\n",s);
            return (0);
       }

NOTES
       The  LANG  and  LC_* variables affect the execution of the yacc command as stated. The main() function defined by yacc issues the following
       call:

       setlocale(LC_ALL, "")

       As a result, the program generated by yacc will also be affected by the contents of these variables at run time.

       The lex program can be compiled as a C program with -std0, -std, or -std1 mode. It can also be compiled as a C++ program. If YY_NOPROTO	is
       defined on the compilation command line, function prototypes are not generated.

EXIT STATUS
       The following exit values are returned:

       0      Successful completion.

       >0     An error occurred.

EXAMPLES
       This section describes the example programs for the lex and yacc commands, which together create a simple desk calculator program
       that performs addition, subtraction, multiplication, and division operations. The calculator program also allows you to assign
       values to variables (each designated by a single lowercase ASCII letter), and then use the variables in calculations. The files
       that contain the program are as follows:

       calc.l The lex specification file that defines the lexical analysis rules.

       calc.y The yacc grammar file that defines the parsing rules and calls the yylex() function created by lex to provide input.

       The remaining text expects that the current directory is the directory that contains the lex and yacc example program files.

   Compiling the Example Program
       Perform the following steps to create the example program using lex and yacc:

       1.  Process the yacc grammar file using the -d option. The -d option tells yacc to create a file that defines the tokens it
           uses in addition to creating the C language source code file.

                  yacc -d calc.y

           The following files are created:

           y.tab.c   The C language source file that yacc created for the parser.
           y.tab.h   A header file containing #define statements for the tokens used by the parser.

       2.  Process the lex specification file:

                  lex calc.l

           The following file is created:

           lex.yy.c  The C language source file that lex created for the lexical analyzer.

       3.  Compile and link the two C language source files:

                  cc -o calc y.tab.c lex.yy.c

           The following files are created:

           y.tab.o   The object file for y.tab.c.
           lex.yy.o  The object file for lex.yy.c.
           calc      The executable program file.

           (The *.o files are created temporarily and then removed.)

       You can then run the program directly by entering: calc

       Then, enter numbers and operators in calculator fashion. After you press <Return>, the program displays the result of  the  operation.	If
       you assign a value to a variable as follows, the cursor moves to the next line:

       m=4 <Return> _

       You can then use the variable in calculations and it will have the value assigned to it:

       m+5 <Return> 9

   The Parser Source Code
       The file calc.y has entries in all three of the sections of a yacc grammar file--declarations, rules, and user subroutines. It contains the
       following source code:

       %{
       #include <stdio.h>

       int regs[26];
       int base;

       %}

       %start list

       %token DIGIT LETTER

       %left '|'
       %left '&'
       %left '+' '-'
       %left '*' '/' '%'
       %left UMINUS   /* supplies precedence for unary minus */

       %%     /* beginning of rules section */

       list   :      /* empty */
              |      list stat '\n'
              |      list error '\n'
                     {      yyerrok;      }
              ;

       stat   :      expr
                     {      printf("%d\n",$1);      }
              |      LETTER '=' expr
                     {      regs[$1] = $3;      }
              ;

       expr   :      '(' expr ')'
                     {      $$ = $2;      }
              |      expr '*' expr
                     {      $$ = $1 * $3;      }
              |      expr '/' expr
                     {      $$ = $1 / $3;      }
              |      expr '%' expr
                     {      $$ = $1 % $3;      }
              |      expr '+' expr
                     {      $$ = $1 + $3;      }
              |      expr '-' expr
                     {      $$ = $1 - $3;      }
              |      expr '&' expr
                     {      $$ = $1 & $3;      }
              |      expr '|' expr
                     {      $$ = $1 | $3;      }
              |      '-' expr %prec UMINUS
                     {      $$ = -$2;      }
              |      LETTER
                     {      $$ = regs[$1];      }
              |      number
              ;

       number :      DIGIT
                     {      $$ = $1; base = ($1==0) ? 8:10;      }
              |      number DIGIT
                     {      $$ = base * $1 + $2;      }
              ;

       %%     /* beginning of user subroutines section */

       main()
       {
              return(yyparse());
       }

       yyerror(s)
       char *s;
       {
              fprintf(stderr,"%s\n",s);
       }

       yywrap()
       {
              return(1);
       }
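
       The number rule in the grammar above builds a value one digit at a time and selects base 8 when the first digit is 0. The
       following C sketch mirrors that arithmetic outside the parser so the effect can be checked by hand. The function name
       parse_number and its -1 return for a string that does not start with a digit are hypothetical and are not part of the example
       program.

```c
/* Mirrors the two actions of the number rule:
 *     number : DIGIT          { $$ = $1; base = ($1==0) ? 8:10; }
 *            | number DIGIT   { $$ = base * $1 + $2; }
 * parse_number is a hypothetical name used only for this sketch. */
long parse_number(const char *digits)
{
    long value;
    int base;

    if (*digits < '0' || *digits > '9')
        return -1;                      /* not a number (sketch only) */

    value = *digits - '0';              /* first DIGIT */
    base = (value == 0) ? 8 : 10;       /* leading 0 selects octal */

    /* Each further DIGIT extends the number to the right. */
    for (digits++; *digits >= '0' && *digits <= '9'; digits++)
        value = base * value + (*digits - '0');

    return value;
}
```

       With this arithmetic, 17 evaluates to 17 and 017 evaluates to 15. As in the grammar, the digits 8 and 9 are not rejected in
       an octal number, so 019 evaluates to 8 * 1 + 9 = 17.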

   The Lexical Analyzer Source Code
       The file calc.l contains the lexical analyzer source code. It contains the rules used to generate the tokens from  the  input  stream.	It
       also contains include statements for standard input and output, as well as for the <y.tab.h> file. The yacc program generates the <y.tab.h>
       file from the yacc grammar file information, if you use the -d option with the yacc command. The file <y.tab.h>	contains  definitions  for
       the tokens that the parser program uses.

       Contents of calc.l:

       %{

       #include <stdio.h>
       #include "y.tab.h"
       int c;
       #if !defined (YYSTYPE)
       #define YYSTYPE long
       #endif
       extern YYSTYPE yylval;
       %}
       %%
       " "     ;
       [a-z]   {
                      c = yytext[0];
                      yylval = c - 'a';
                      return(LETTER);
               }
       [0-9]   {
                      c = yytext[0];
                      yylval = c - '0';
                      return(DIGIT);
               }
       [^a-z 0-9]     {
                      c = yytext[0];
                      return(c);
               }

ENVIRONMENT VARIABLES
       The following environment variables affect the execution of yacc:

       LANG   Provides a default value for the internationalization variables that are unset or null. If LANG is unset or null, the
              corresponding value from the default locale is used. If any of the internationalization variables contains an invalid
              setting, the utility behaves as if none of the variables had been defined.

       LC_ALL If set to a non-empty string value, overrides the values of all the other internationalization variables.

       LC_CTYPE
              Determines the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte
              as opposed to multi-byte characters in arguments and input files).

       LC_MESSAGES
              Determines the locale for the format and contents of diagnostic messages written to standard error.

       NLSPATH
              Determines the location of message catalogs for the processing of LC_MESSAGES.

FILES
       y.output
              A readable description of parsing tables and a report on conflicts generated by grammar ambiguities

       y.tab.c
              Output file

       y.tab.h
              Definitions for token names

       Temporary file

       Temporary file

       Temporary file

       /usr/ccs/lib/yaccpar
              Default skeleton parser for C programs

       liby.a The yacc library

SEE ALSO
       Commands:  lex(1)

       Standards:  standards(5)

       Programming Support Tools

																	   yacc(1)