ANTLR(1) PCCTS Manual Pages ANTLR(1)
antlr - ANother Tool for Language Recognition
antlr [options] grammar_files
Antlr converts an extended form of context-free grammar into a set of C functions which
directly implement an efficient form of deterministic recursive-descent LL(k) parser.
Context-free grammars may be augmented with predicates to allow semantics to influence
parsing; this allows a form of context-sensitive parsing. Selective backtracking is also
available to handle non-LL(k) and even non-LALR(k) constructs. Antlr also produces a
definition of a lexer which can be automatically converted into C code for a DFA-based
lexer by dlg. Hence, antlr serves a function much like that of yacc; however, it is
notably more flexible and is more integrated with a lexer generator (antlr directly
generates dlg code, whereas yacc and lex are given independent descriptions). Unlike yacc,
which accepts LALR(1) grammars, antlr accepts LL(k) grammars in an extended BNF notation,
which eliminates the need for precedence rules.
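To illustrate why an extended BNF grammar needs no precedence rules, the following is a
hand-written sketch of a recursive-descent parser, not actual antlr output. The grammar,
the function names (expr, term, atom, evaluate) and the single-character tokens are all
assumptions made for the example. Multiplication binds tighter than addition only because
the rule term sits below the rule expr.

```c
#include <ctype.h>

/* Hypothetical grammar, written in EBNF:
 *   expr : term ( '+' term )* ;
 *   term : atom ( '*' atom )* ;
 *   atom : DIGIT ;
 * No precedence table is needed: the nesting of rules encodes it. */

static const char *cursor;      /* next unread input character */

static int atom(void)
{
    /* One symbol of lookahead (k = 1) is enough to decide here. */
    if (isdigit((unsigned char)*cursor))
        return *cursor++ - '0';
    return 0;                   /* error handling omitted in this sketch */
}

static int term(void)
{
    int value = atom();
    while (*cursor == '*') {    /* the ( ... )* loop of the EBNF rule */
        cursor++;
        value *= atom();
    }
    return value;
}

static int expr(void)
{
    int value = term();
    while (*cursor == '+') {
        cursor++;
        value += term();
    }
    return value;
}

/* Parse and evaluate a string such as "2+3*4". */
static int evaluate(const char *text)
{
    cursor = text;
    return expr();
}
```

Calling evaluate("2+3*4") parses 3*4 inside term before expr adds 2, so the usual
precedence falls out of the rule structure alone.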
Like yacc grammars, antlr grammars can use automatically-maintained symbol attribute val-
ues referenced as dollar variables. Further, because antlr generates top-down parsers,
arbitrary values may be inherited from parent rules (passed like function parameters).
Antlr also has a mechanism for creating and manipulating abstract-syntax-trees.
There are various other niceties in antlr, including the ability to spread one grammar
over multiple files or even multiple grammars in a single file, the ability to generate a
version of the grammar with actions stripped out (for documentation purposes), and lots
-ck n Use up to n symbols of lookahead when using compressed (linear approximation)
lookahead. This type of lookahead is very cheap to compute and is attempted before
full LL(k) lookahead, which is of exponential complexity in the worst case. In
general, the compressed lookahead can be much deeper (e.g., -ck 10) than the full
lookahead (which usually must be less than 4).
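The difference between the two kinds of lookahead can be sketched in C. This is an
illustration under stated assumptions, not antlr's internal representation: the token
numbers, the table layout and the function names are hypothetical, and k is fixed at 2.
Full lookahead stores whole token sequences, while the compressed form keeps one token
set per depth, which is cheaper but accepts some sequences the full form rejects.

```c
#include <stddef.h>

enum { A = 1, B, C, D };        /* hypothetical token numbers */

/* Full LL(2) lookahead for one alternative: explicit token sequences. */
static const int full_set[][2] = { { A, B }, { C, D } };

/* Compressed form: one token set per lookahead depth, kept as bit masks. */
static const unsigned depth_set[2] = {
    (1u << A) | (1u << C),      /* tokens that may appear at depth 1 */
    (1u << B) | (1u << D)       /* tokens that may appear at depth 2 */
};

/* Exact test: the pair (la1, la2) must be one of the stored sequences. */
static int full_match(int la1, int la2)
{
    size_t i;
    for (i = 0; i < sizeof full_set / sizeof full_set[0]; i++)
        if (full_set[i][0] == la1 && full_set[i][1] == la2)
            return 1;
    return 0;
}

/* Approximate test: each depth is checked independently of the others. */
static int compressed_match(int la1, int la2)
{
    return ((depth_set[0] >> la1) & 1u) && ((depth_set[1] >> la2) & 1u);
}
```

Here the sequence A D passes compressed_match but fails full_match. The compressed test
stays cheap as the depth grows, whereas the table of full sequences can grow
exponentially.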
-CC Generate C++ output from both ANTLR and DLG.
-cr Generate a cross-reference for all rules. For each rule, print a list of all other
rules that reference it.
-e1 Ambiguities/errors shown in low detail (default).
-e2 Ambiguities/errors shown in more detail.
-e3 Ambiguities/errors shown in excruciating detail.
Rename err.c to file.
Rename stdpccts.h header (turns on -gh) to file.
Rename lexical output, parser.dlg, to file.
Rename file with lexical mode definitions, mode.h, to file.
Rename file which remaps globally visible symbols, remap.h, to file.
Rename tokens.h to file.
-ga Generate ANSI-compatible code (default case). This has not been rigorously tested
to be ANSI X3J11 C compliant, but it is close. The normal output of antlr is currently
compilable under K&R C, ANSI C, and C++; this option does nothing because
antlr generates a bunch of #ifdef's to do the right thing depending on the lan-
-gc Indicates that antlr should generate no C code, i.e., only perform analysis on the
-gd C code is inserted in each of the antlr generated parsing functions to provide for
user-defined handling of a detailed parse trace. The inserted code consists of
calls to the user-supplied macros or functions called zzTRACEIN and zzTRACEOUT.
The only argument is a char * pointing to a C-style string which is the grammar
rule recognized by the current parsing function. If no definition is given for the
trace functions, upon rule entry and exit, a message will be printed indicating
that a particular rule has been entered or exited.
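A minimal sketch of user-supplied trace macros follows. Only the names zzTRACEIN and
zzTRACEOUT and their single string argument come from the description above; the helper
functions trace_in and trace_out, the trace_depth counter and the indented output format
are assumptions made for the example. The definitions would need to be visible to the
generated parser, for example through the grammar's header action.

```c
#include <stdio.h>

static int trace_depth = 0;     /* current rule nesting level (assumption) */

/* Print an indented "enter" line, then go one level deeper. */
static void trace_in(const char *rule)
{
    fprintf(stderr, "%*senter %s\n", 2 * trace_depth, "", rule);
    trace_depth++;
}

/* Come back up one level, then print the matching "exit" line. */
static void trace_out(const char *rule)
{
    trace_depth--;
    fprintf(stderr, "%*sexit  %s\n", 2 * trace_depth, "", rule);
}

/* The names that the code inserted by -gd calls. */
#define zzTRACEIN(r)  trace_in(r)
#define zzTRACEOUT(r) trace_out(r)
```

Indenting by nesting depth makes the trace read like an outline of the parse.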
-ge Generate an error class for each non-terminal.
-gh Generate stdpccts.h for non-ANTLR-generated files to include. This file contains
all defines needed to describe the type of parser generated by antlr (e.g. how much
lookahead is used and whether or not trees are constructed) and contains the header
action specified by the user.
-gk Generate parsers that delay lookahead fetches until needed. Without this option,
antlr generates parsers which always have k tokens of lookahead available.
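The contrast between eager and delayed fetching can be sketched as follows. This is not
the code antlr generates: the buffer layout, the zero-terminated token array and every
name in the block (next_token, prime_eager, la_delayed, fetched) are hypothetical. The
delayed form pulls a token from the input only when a lookahead depth is first asked for,
which can matter when input arrives interactively.

```c
#define K 2                     /* lookahead depth (assumption) */

static const int *input;        /* hypothetical token stream, 0-terminated */
static int fetched;             /* tokens pulled from the stream so far */
static int buffer[K];           /* the k-token lookahead window */
static int filled;              /* number of valid slots in buffer */

static int next_token(void)
{
    int t = *input;
    if (t != 0)
        input++;                /* stay on the terminator at end of input */
    fetched++;
    return t;
}

static void start(const int *tokens)
{
    input = tokens;
    fetched = 0;
    filled = 0;
}

/* Eager: fill the whole window before any parsing decision is made. */
static void prime_eager(void)
{
    while (filled < K)
        buffer[filled++] = next_token();
}

/* Delayed: fetch only as far as depth i (1-based) when it is requested. */
static int la_delayed(int i)
{
    while (filled < i)
        buffer[filled++] = next_token();
    return buffer[i - 1];
}
```

After start(), a call to la_delayed(1) consumes one token from the stream, whereas
prime_eager() consumes K tokens immediately.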
-gl Generate line info about grammar actions in the C parser, of the form # line "file",
so that error messages from the C/C++ compiler point into the grammar file rather than
the resulting C file. Debugging is easier as well, because you step through the
grammar rather than the C file.
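The effect of a line directive can be seen in a few lines of standard C. The file name
expr.g, the line number 12 and the two function names are hypothetical; the block only
shows how the compiler's idea of the current file and line changes after the directive,
not what antlr emits.

```c
/* The directive below tells the compiler that the text following it
 * came from line 12 of the hypothetical grammar file "expr.g". */
#line 12 "expr.g"
static int action_line(void) { return __LINE__; }
static const char *action_file(void) { return __FILE__; }
```

Any diagnostic or debugger position for these two functions is reported against expr.g,
which is why errors in embedded actions point back into the grammar.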
-gs Do not generate sets for token expression lists; instead generate a ||-separated
sequence of LA(1)==token_number. The default is to generate sets.
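The two forms of token test can be compared in a short sketch. The token names and
numbers are hypothetical, and the bit-mask set is only one plausible representation, not
necessarily the one antlr uses. The parameter la1 stands in for the value of LA(1).

```c
enum { PLUS = 3, MINUS = 4, STAR = 7 };     /* hypothetical token numbers */

/* Set form: one bit per token number, tested with a shift and a mask. */
static const unsigned op_set = (1u << PLUS) | (1u << MINUS) | (1u << STAR);

static int in_set(int la1)
{
    /* Token numbers are kept below 16 so the mask fits any unsigned. */
    return la1 >= 0 && la1 < 16 && ((op_set >> la1) & 1u);
}

/* Sequence form, as requested by -gs: one comparison per token. */
static int in_sequence(int la1)
{
    return la1 == PLUS || la1 == MINUS || la1 == STAR;
}
```

Both functions accept the same tokens. The set test costs the same however many tokens
are listed, while the sequence grows by one comparison per token.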
-gt Generate code for Abstract-Syntax Trees.
-gx Do not create the lexical analyzer files (dlg-related). This option should be
given when the user wishes to provide a customized lexical analyzer. It may also
be used in make scripts to cause only the parser to be rebuilt when a change not
affecting the lexical structure is made to the input grammars.
-k n Set k of LL(k) to n; i.e., set the number of tokens of lookahead (default is 1).
-o dir Directory where output files should go (default="."). This is very nice for keep-
ing the source directory clear of ANTLR and DLG spawn.
-p The complete grammar, collected from all input grammar files and stripped of all
comments and embedded actions, is listed to stdout. This is intended to aid in
viewing the entire grammar as a whole and to eliminate the need to keep actions
concisely stated so that the grammar is easier to read. Hence, it is preferable to
embed even complex actions directly in the grammar, rather than to call them as
subroutines, since the subroutine call overhead will be saved.
-pa This option is the same as -p except that the output is annotated with the first
sets determined from grammar analysis.
Turn on the computation and hoisting of predicate context.
Turn off the computation and hoisting of predicate context. This option makes 1.10
behave like the 1.06 release with option -pr on. Context computation is off by
-rl n Limit the maximum number of tree nodes used by grammar analysis to n. Occasion-
ally, antlr is unable to analyze a grammar submitted by the user. This rare situa-
tion can only occur when the grammar is large and the amount of lookahead is
greater than one. A nonlinear analysis algorithm is used by PCCTS to handle the
general case of LL(k) parsing. The average complexity of analysis, however, is
near linear due to some fancy footwork in the implementation which reduces the num-
ber of calls to the full LL(k) algorithm. An error message will be displayed, if
this limit is reached, which indicates the grammar construct being analyzed when
antlr hit a non-linearity. Use this option if antlr seems to go out to lunch and
your disk starts thrashing; try n=10000 to start. Once the offending construct has
been identified, try to remove the ambiguity that antlr was trying to overcome with
large lookahead analysis. The introduction of (...)? backtracking blocks elimi-
nates some of these problems -- antlr does not analyze alternatives that begin with
(...)? (it simply backtracks, if necessary, at run time).
-w1 Set low warning level. Do not warn if semantic predicates and/or (...)? blocks are
assumed to cover ambiguous alternatives.
-w2 Ambiguous parsing decisions yield warnings even if semantic predicates or (...)?
blocks are used. Warn if predicate context is computed and semantic predicates
incompletely disambiguate alternative productions.
- Read grammar from standard input and generate stdin.c as the parser file.
Antlr works... we think. There is no implicit guarantee of anything. We reserve no
legal rights to the software known as the Purdue Compiler Construction Tool Set (PCCTS) --
PCCTS is in the public domain. An individual or company may do whatever they wish with
source code distributed with PCCTS or the code generated by PCCTS, including the incorpo-
ration of PCCTS, or its output, into commercial software. We encourage users to develop
software with PCCTS. However, we do ask that credit is given to us for developing PCCTS.
By "credit", we mean that if you incorporate our source code into one of your programs
(commercial product, research project, or otherwise) that you acknowledge this fact some-
where in the documentation, research report, etc... If you like PCCTS and have developed
a nice tool with the output, please mention that you developed it using PCCTS. As long as
these guidelines are followed, we expect to continue enhancing this system and expect to
make other tools available as they are completed.
*.c output C parser.
*.cpp output C++ parser when C++ mode is used.
output dlg lexical analyzer.
err.c token string array, error sets and error support routines. Not used in C++ mode.
file that redefines all globally visible parser symbols. The use of the #parser
directive creates this file. Not used in C++ mode.
list of definitions needed by C files, not generated by PCCTS, that reference PCCTS
objects. This is not generated by default. Not used in C++ mode.
output #defines for tokens used and function prototypes for functions generated for
ANTLR September 1995 ANTLR(1)