Query: lex
OS: opensolaris
Section: 1
lex(1)                          User Commands                          lex(1)

NAME
     lex - generate programs for lexical tasks

SYNOPSIS
     lex [-cntv] [-e | -w] [-V -Q [y | n]] [file]...

DESCRIPTION
     The lex utility generates C programs to be used in lexical
     processing of character input, and that can be used as an
     interface to yacc. The C programs are generated from lex source
     code and conform to the ISO C standard. Usually, the lex utility
     writes the program it generates to the file lex.yy.c. The state
     of this file is unspecified if lex exits with a non-zero exit
     status. See EXTENDED DESCRIPTION for a complete description of
     the lex input language.

OPTIONS
     The following options are supported:

     -c         Indicates C-language action (default option).

     -e         Generates a program that can handle EUC characters
                (cannot be used with the -w option). yytext[] is of
                type unsigned char[].

     -n         Suppresses the summary of statistics usually written
                with the -v option. If no table sizes are specified in
                the lex source code and the -v option is not
                specified, then -n is implied.

     -t         Writes the resulting program to standard output
                instead of lex.yy.c.

     -v         Writes a summary of lex statistics to the standard
                error. (See the discussion of lex table sizes under
                the heading Definitions in lex.) If table sizes are
                specified in the lex source code, and if the -n option
                is not specified, the -v option may be enabled.

     -w         Generates a program that can handle EUC characters
                (cannot be used with the -e option). Unlike the -e
                option, yytext[] is of type wchar_t[].

     -V         Prints out version information on standard error.

     -Q[y|n]    Prints out version information to output file
                lex.yy.c by using -Qy. The -Qn option does not print
                out version information and is the default.

OPERANDS
     The following operand is supported:

     file       A pathname of an input file. If more than one such
                file is specified, all files will be concatenated to
                produce a single lex program.
                If no file operands are specified, or if a file
                operand is -, the standard input will be used.

OUTPUT
     The lex output files are described below.

   Stdout
     If the -t option is specified, the text file of C source code
     output of lex will be written to standard output.

   Stderr
     If the -t option is specified, informational, error, and warning
     messages concerning the contents of lex source code input will
     be written to the standard error.

     If the -t option is not specified:

     1.  Informational, error, and warning messages concerning the
         contents of lex source code input will be written to either
         the standard output or standard error.

     2.  If the -v option is specified and the -n option is not
         specified, lex statistics will also be written to standard
         error. These statistics may also be generated if table
         sizes are specified with a % operator in the Definitions in
         lex section (see EXTENDED DESCRIPTION), as long as the -n
         option is not specified.

   Output Files
     A text file containing C source code will be written to
     lex.yy.c, or to the standard output if the -t option is present.

EXTENDED DESCRIPTION
     Each input file contains lex source code, which is a table of
     regular expressions with corresponding actions in the form of C
     program fragments.

     When lex.yy.c is compiled and linked with the lex library (using
     the -l l operand with c89 or cc), the resulting program reads
     character input from the standard input and partitions it into
     strings that match the given expressions.

     When an expression is matched, these actions will occur:

     o  The input string that was matched is left in yytext as a
        null-terminated string; yytext is either an external
        character array or a pointer to a character string. As
        explained in Definitions in lex, the type can be explicitly
        selected using the %array or %pointer declarations, but the
        default is %array.

     o  The external int yyleng is set to the length of the matching
        string.

     o  The expression's corresponding program fragment, or action,
        is executed.
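
     The three effects listed above can be seen in a very small
     specification. The following is only a sketch: the file name
     words.l is hypothetical, the default %array form of yytext is
     assumed, and main() and yywrap() are assumed to come from the lex
     library rather than from the specification itself.

```lex
%{
/* Hypothetical file words.l: an illustrative sketch only.
 * Assumes the default yytext declaration (%array) and that main()
 * and yywrap() are supplied by the lex library. */
#include <stdio.h>
%}
%%
[A-Za-z]+   { /* yytext holds the matched word, yyleng its length */
              printf("%s (%d)\n", yytext, yyleng); }
.|\n        { /* discard every other character */ }
```

     Assuming the file is saved as words.l, running lex on it and
     compiling the resulting lex.yy.c with the -l l operand should
     produce a filter that prints each alphabetic word read from the
     standard input, followed by its length in parentheses.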
     During pattern matching, lex searches the set of patterns for
     the single longest possible match. Among rules that match the
     same number of characters, the rule given first will be chosen.

     The general format of lex source is:

          Definitions
          %%
          Rules
          %%
          User Subroutines

     The first %% is required to mark the beginning of the rules
     (regular expressions and actions); the second %% is required
     only if user subroutines follow.

     Any line in the Definitions in lex section beginning with a
     blank character will be assumed to be a C program fragment and
     will be copied to the external definition area of the lex.yy.c
     file. Similarly, anything in the Definitions in lex section
     included between delimiter lines containing only %{ and %} will
     also be copied unchanged to the external definition area of the
     lex.yy.c file.

     Any such input (beginning with a blank character or within %{
     and %} delimiter lines) appearing at the beginning of the Rules
     section before any rules are specified will be written to
     lex.yy.c after the declarations of variables for the yylex
     function and before the first line of code in yylex. Thus, user
     variables local to yylex can be declared here, as well as
     application code to execute upon entry to yylex.

     The action taken by lex when encountering any input beginning
     with a blank character or within %{ and %} delimiter lines
     appearing in the Rules section but coming after one or more
     rules is undefined. The presence of such input may result in an
     erroneous definition of the yylex function.

   Definitions in lex
     Definitions in lex appear before the first %% delimiter. Any
     line in this section not contained between %{ and %} lines and
     not beginning with a blank character is assumed to define a lex
     substitution string. The format of these lines is:

          name substitute

     If a name does not meet the requirements for identifiers in the
     ISO C standard, the result is undefined. The string substitute
     will replace the string {name} when it is used in a rule.
     The name string is recognized in this context only when the
     braces are provided and when it does not appear within a bracket
     expression or within double-quotes.

     In the Definitions in lex section, any line beginning with a %
     (percent sign) character and followed by an alphanumeric word
     beginning with either s or S defines a set of start conditions.
     Any line beginning with a % followed by a word beginning with
     either x or X defines a set of exclusive start conditions. When
     the generated scanner is in a %s state, patterns with no state
     specified will also be active; in a %x state, such patterns will
     not be active. The rest of the line, after the first word, is
     considered to be one or more blank-character-separated names of
     start conditions. Start condition names are constructed in the
     same way as definition names. Start conditions can be used to
     restrict the matching of regular expressions to one or more
     states as described in Regular Expressions in lex.

     Implementations accept either of the following two mutually
     exclusive declarations in the Definitions in lex section:

     %array     Declare the type of yytext to be a null-terminated
                character array.

     %pointer   Declare the type of yytext to be a pointer to a
                null-terminated character string.

     Note: When using the %pointer option, you may not also use the
     yyless function to alter yytext.

     %array is the default. If %array is specified (or neither %array
     nor %pointer is specified), then the correct way to make an
     external reference to yytext is with a declaration of the form:

          extern char yytext[];

     If %pointer is specified, then the correct external reference is
     of the form:

          extern char *yytext;

     lex will accept declarations in the Definitions in lex section
     for setting certain internal table sizes. The declarations are
     shown in the following table.

                    Table Size Declaration in lex
     +--------------------------------------------------------------+
     | Declaration   Description                          Default   |
     +--------------------------------------------------------------+
     | %p n          Number of positions                  2500      |
     | %n n          Number of states                     500       |
     | %a n          Number of transitions                2000      |
     | %e n          Number of parse tree nodes           1000      |
     | %k n          Number of packed character classes   10000     |
     | %o n          Size of the output array             3000      |
     +--------------------------------------------------------------+

     Programs generated by lex need either the -e or -w option to
     handle input that contains EUC characters from supplementary
     codesets. If neither of these options is specified, yytext is of
     the type char[], and the generated program can handle only ASCII
     characters.

     When the -e option is used, yytext is of the type unsigned
     char[] and yyleng gives the total number of bytes in the matched
     string. With this option, the macros input(), unput(c), and
     output(c) should perform byte-based I/O in the same way as with
     the regular ASCII lex. Two more variables are available with the
     -e option, yywtext and yywleng, which behave the same as yytext
     and yyleng would under the -w option.

     When the -w option is used, yytext is of the type wchar_t[] and
     yyleng gives the total number of characters in the matched
     string. If you supply your own input(), unput(c), or output(c)
     macros with this option, they must return or accept EUC
     characters in the form of wide characters (wchar_t). This allows
     a different interface between your program and the lex
     internals, to expedite some programs.

   Rules in lex
     The Rules in lex source files are a table in which the left
     column contains regular expressions and the right column
     contains actions (C program fragments) to be executed when the
     expressions are recognized.

          ERE action
          ERE action
          ...

     The extended regular expression (ERE) portion of a row will be
     separated from action by one or more blank characters.
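
     As a sketch of the two-column layout described above, and of the
     longest-match and first-rule selection stated earlier in EXTENDED
     DESCRIPTION, a Rules section might look like the following. The
     token labels KEYWORD and IDENT are illustrative names chosen for
     this example; they are not defined by lex.

```lex
%{
/* Hypothetical sketch; the labels printed below are illustrative only. */
#include <stdio.h>
%}
%%
if          { printf("KEYWORD\n"); }
[a-z]+      { printf("IDENT %s\n", yytext); }
[ \t\n]+    { /* skip blanks; the space is inside square brackets */ }
```

     For the input "if iffy", the word if matches the first and second
     patterns with the same length, so the rule given first is chosen.
     The word iffy is matched in full by the second pattern, because
     that match is longer than the two characters the first pattern
     could match.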

     A regular expression containing blank characters is recognized
     under one of the following conditions:

     o  The entire expression appears within double-quotes.

     o  The blank characters appear within double-quotes or square
        brackets.

     o  Each blank character is preceded by a backslash character.

   User Subroutines in lex
     Anything in the user subroutines section will be copied to
     lex.yy.c following yylex.

   Regular Expressions in lex
     The lex utility supports the set of Extended Regular Expressions
     (EREs) described on regex(5) with the following additions and
     exceptions to the syntax:

     ...        Any string enclosed in double-quotes will represent
                the characters within the double-quotes as
                themselves, except that backslash escapes (which
                appear in the following table) are recognized. Any
                backslash-escape sequence is terminated by the
                closing quote. For example, "