flex(1) [centos man page]

FLEX(1) 							   User Commands							   FLEX(1)

NAME

       flex - the fast lexical analyser generator

SYNOPSIS

       flex [OPTIONS] [FILE]...

DESCRIPTION

       Generates programs that perform pattern-matching on text.

   Table Compression:
       -Ca, --align
	      trade off larger tables for better memory alignment

       -Ce, --ecs
	      construct equivalence classes

       -Cf    do not compress tables; use -f representation

       -CF    do not compress tables; use -F representation

       -Cm, --meta-ecs
	      construct meta-equivalence classes

       -Cr, --read
	      use read() instead of stdio for scanner input

       -f, --full
	      generate fast, large scanner. Same as -Cfr

       -F, --fast
	      use alternate table representation. Same as -CFr

       -Cem   default compression (same as --ecs --meta-ecs)

   Debugging:
       -d, --debug
	      enable debug mode in scanner

       -b, --backup
	      write backing-up information to lex.backup

       -p, --perf-report
	      write performance report to stderr

       -s, --nodefault
	      suppress default rule to ECHO unmatched text

       -T, --trace
	      flex should run in trace mode

       -w, --nowarn
	      do not generate warnings

       -v, --verbose
	      write summary of scanner statistics to stdout

   Files:
       -o, --outfile=FILE
	      specify output filename

       -S, --skel=FILE
	      specify skeleton file

       -t, --stdout
	      write scanner on stdout instead of lex.yy.c

       --yyclass=NAME
	      name of C++ class

       --header-file=FILE
	      create a C header file in addition to the scanner

       --tables-file[=FILE]
	      write tables to FILE

   Scanner behavior:
       -7, --7bit
	      generate 7-bit scanner

       -8, --8bit
	      generate 8-bit scanner

       -B, --batch
	      generate batch scanner (opposite of -I)

       -i, --case-insensitive
	      ignore case in patterns

       -l, --lex-compat
	      maximal compatibility with original lex

       -X, --posix-compat
	      maximal compatibility with POSIX lex

       -I, --interactive
	      generate interactive scanner (opposite of -B)

       --yylineno
	      track line count in yylineno

   Generated code:
       -+,  --c++
	      generate C++ scanner class

       -Dmacro[=defn]
	      #define macro defn  (default defn is '1')

       -L,  --noline
	      suppress #line directives in scanner

       -P,  --prefix=STRING
	      use STRING as prefix instead of "yy"

       -R,  --reentrant
	      generate a reentrant C scanner

       --bison-bridge
	      scanner for bison pure parser.

       --bison-locations
	      include yylloc support.

       --stdinit
	      initialize yyin/yyout to stdin/stdout

       --noansi-definitions
	      old-style function definitions

       --noansi-prototypes
	      empty parameter list in prototypes

       --nounistd
	      do not include <unistd.h>

       --noFUNCTION
	      do not generate a particular FUNCTION

   Miscellaneous:
       -c     do-nothing POSIX option

       -n     do-nothing POSIX option

       -?

       -h, --help
	      produce this help message

       -V, --version
	      report flex version

SEE ALSO

       The  full documentation for flex is maintained as a Texinfo manual.  If the info and flex programs are properly installed at your site, the
       command

	      info flex

       should give you access to the complete manual.

flex 2.5.36							     July 2012								   FLEX(1)

Check Out this Related Man Page

flex(1) 						      General Commands Manual							   flex(1)

NAME

       flex - Generates a C Language lexical analyzer

SYNOPSIS

       flex [-bcdfinpstvFILT8] -C[efmF] [-Sskeleton] [file...]

OPTIONS

       -b     Generates backtracking information to lex.backtrack. This is a list of scanner states that require backtracking and the
              input characters on which they do so. By adding rules you can remove backtracking states. If all backtracking states
              are eliminated and -f or -F is used, the generated scanner will run faster.

       -d     Makes the generated scanner run in debug mode. Whenever a pattern is recognized and the global yy_flex_debug is nonzero
              (which is the default), the scanner writes to stderr a line of the form:

              --accepting rule at line 53 ("the matched text")

              The line number refers to the location of the rule in the file defining the scanner (the input to flex). Messages are
              also generated when the scanner backtracks, accepts the default rule, reaches the end of its input buffer (or
              encounters a NUL), or reaches an End-of-File.

       -f     Specifies full table (no table compression is done). The result is large but fast. This option is equivalent to -Cf.

       -i     Instructs flex to generate a case-insensitive scanner. The case of letters given in the flex input patterns will be
              ignored, and tokens in the input will be matched regardless of case. The matched text given in yytext will have the
              original case (as read by the scanner).

       -p     Generates a performance report to stderr. This identifies features of the flex input file that will cause a loss of
              performance in the resulting scanner.

       -s     Causes the default rule (that unmatched scanner input is echoed to stdout) to be suppressed. If the scanner encounters
              input that does not match any of its rules, it aborts with an error.

       -t     Instructs flex to write the scanner it generates to standard output instead of lex.yy.c.

       -v     Specifies that flex should write to stderr a summary of statistics regarding the scanner it generates.

       -F     Specifies that the fast scanner table representation should be used. This representation is about as fast as the full
              table representation (-f), and for some sets of patterns will be considerably smaller (and for others, larger). This
              option is equivalent to -CF.

       -I     Instructs flex to generate an interactive scanner; that is, a scanner that stops immediately rather than looking ahead
              if it knows that the currently scanned text cannot be part of a longer rule's match. Note that -I cannot be used in
              conjunction with full or fast tables; that is, the -f, -F, -Cf, or -CF options.

       -L     Instructs flex not to generate #line directives in lex.yy.c. The default is to generate such directives so error
              messages in the actions will be correctly located with respect to the original flex input file.

       -T     Makes flex run in trace mode. It will generate a lot of messages to stdout concerning the form of the input and the
              resultant nondeterministic and deterministic finite automata. This option is mostly for use in maintaining flex.

       -8     Instructs flex to generate an 8-bit scanner (which is the default).

       -C[efmF]
              Controls the degree of table compression. The default setting is -Cem, which provides the highest degree of table
              compression. Faster-executing scanners can be obtained at the cost of larger tables, with the following generally
              being true:

              Slowest and smallest

              -Cem -Cm -Ce -C -C{f,F}e -C{f,F}

              Fastest and largest

              The -C options are not cumulative; whenever the option is encountered, the previous -C settings are forgotten. The -f
              or -F and -Cm options do not make sense together; there is no opportunity for meta-equivalence classes if the table is
              not being compressed. Otherwise, the options may be freely mixed. A lone -C specifies that the scanner tables should
              be compressed and neither equivalence classes nor meta-equivalence classes should be used.

       -Ce    Directs flex to construct equivalence classes; that is, sets of characters that have identical lexical properties.
              Equivalence classes usually give dramatic reductions in the final table/object file sizes (typically a factor of 2 to
              5) and are inexpensive performance-wise (one array look-up per character scanned).

       -Cm    Directs flex to construct meta-equivalence classes, which are sets of equivalence classes (or characters, if
              equivalence classes are not being used) that are commonly used together. Meta-equivalence classes are often a big win
              when using compressed tables, but they have a moderate performance impact (one or two "if" tests and one array look-up
              per character scanned).

       -Cf    Specifies that the full scanner tables should be generated; flex should not compress the tables by taking advantage of
              similar transition functions for different states.

       -CF    Specifies that the alternative fast scanner representation should be used.

       -Sskeleton
              Overrides the default skeleton file from which flex constructs its scanners. This is useful for flex maintenance or
              development.

       -c     Specifies table-compression options. (Obsolescent)

       -n     Suppresses the statistics summaries that the -v option typically generates. (Obsolete)

DESCRIPTION

       The flex command is a tool for generating scanners: programs which recognize lexical patterns in text. The flex command reads the
       given input files, or its standard input if no file names are given or if a file operand is - (dash), for a description of a
       scanner to generate. The description is in the form of pairs of regular expressions and C code, called rules. The flex command
       generates as output a C source file, lex.yy.c, which defines a routine yylex(). This file is compiled and linked with the -ll
       library to produce an executable. When the executable is run, it scans its input for text matching the regular expressions in its
       rules, looking for the best (longest) match. When it has selected a rule, it executes the associated C code, which has access to
       the matched input sequence (commonly referred to as a token). This process then repeats until input is exhausted.

       The flex command treats multiple input files as one.

   Syntax for Input
       This  section  contains a description of the flex input file, which is normally named with a suffix.  The section provides a listing of the
       special values, macros, and functions recognized by flex.

       The flex input file consists of three sections, separated by a line with just %% in it:

       [ definitions ]
       %%
       [ rules ]
       [ %%
       [ user functions ]]

       definitions
              Contains declarations to simplify the scanner specification, and declarations of start states, which are explained
              below.

       rules  Describes what the scanner is to do.

       user functions
              Contains user-supplied functions that are copied straight through to lex.yy.c.

       With the exception of the first %% sequence, all sections are optional. The minimal scanner, %%, copies its input to standard
       output.
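
       As a rough illustration of the three-section layout, the sketch below counts lines and characters. It assumes flex 2.5.x (for
       %option noyywrap, which removes the need to link with -ll); the counter names lines and chars are hypothetical and not part of
       flex.

```lex
%{
/* Definitions section: code between %{ and %} is copied verbatim. */
#include <stdio.h>
static long lines = 0, chars = 0;   /* hypothetical counters */
%}
%option noyywrap
%%
\n      { lines++; chars++; }
.       { chars++; }
%%
/* User functions section: copied straight through to lex.yy.c. */
int main(void)
{
    yylex();                        /* scans stdin until End-of-File */
    printf("%ld lines, %ld characters\n", lines, chars);
    return 0;
}
```

       Under those assumptions it could be built with flex count.l followed by cc -o count lex.yy.c, where count.l is a hypothetical
       file name.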

       Each line in the definitions section can be:

       name regexp
              Defines name to expand to regexp. name is a word beginning with a letter or an underscore (_) followed by zero or more
              letters, digits, underscores or dashes (-). In the regular-expression parts of the rules section, flex substitutes
              regexp wherever you refer to {name} (name within braces).

       %s state [state...]
       %x state [state...]
              Defines names for states used in the rules section. A rule may be made conditionally active based on the current
              scanner state. Multiple lines defining states can appear, and each can contain multiple state names, separated by
              white space. The name of a state follows the same syntax as that of regexp names except that dashes (-) are not
              permitted. Unlike regexp names, state names share the C #define namespace. In the rules section states are recognized
              as <state> (state within angle brackets).

              The %x directive names exclusive states. When a scanner is in an exclusive state, only rules prefixed with that state
              are active. Inclusive states are named with the %s directive.

       %{
       %}     When placed on lines by themselves, these symbols enclose C code to be passed verbatim into the global definitions of
              the output file. Such lines commonly include preprocessor directives and declarations of external variables and
              functions.

       Lines beginning with a space or tab in the definitions section are passed directly into the lex.yy.c output file, as part of the
       initial global definitions.
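
       The definitions-section constructs described above (name definitions, an exclusive state, and a %{ %} block) might be combined
       as in the following sketch. It assumes flex 2.5.x for %option noyywrap; the names DIGIT, ID and COMMENT are hypothetical
       choices made for this example.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
DIGIT   [0-9]
ID      [A-Za-z_][A-Za-z0-9_]*
%x COMMENT
%%
"/*"            BEGIN(COMMENT);     /* enter the exclusive state */
<COMMENT>"*/"   BEGIN(INITIAL);     /* leave it again */
<COMMENT>.|\n   ;                   /* discard comment text */
{DIGIT}+        printf("number: %s\n", yytext);
{ID}            printf("identifier: %s\n", yytext);
.|\n            ;                   /* ignore everything else */
%%
int main(void)
{
    return yylex();
}
```

       Because COMMENT is declared with %x, the unprefixed rules are inactive while the scanner is inside a comment, so words and
       digits in comment text are not reported.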

       The rules section follows the definitions, separated by a line consisting of %%. The rules section contains rules for matching
       input and taking actions, in the following format:

       pattern [action]

       The pattern starts in the first column of the line and extends until the first non-escaped white space character. The flex command attempts
       to find the pattern that matches the longest input sequence and execute the associated action. If two or more patterns match the same input
       the one which appears first in the rules section is chosen. If no action exists the matched input is discarded. If no pattern  matches  the
       input the default is to copy it to standard output.
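
       A small sketch of these two disambiguation rules, assuming flex 2.5.x for %option noyywrap: on the input "if iffy", the first
       word is matched by both rules with the same length, so the earlier rule wins and it is reported as a keyword; the second word
       is reported as an identifier because [a-z]+ gives the longer match.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
%%
if              printf("keyword: %s\n", yytext);
[a-z]+          printf("identifier: %s\n", yytext);
[ \t\n]+        ;       /* discard white space */
%%
int main(void)
{
    return yylex();
}
```

       Any input not covered by these three rules falls through to the default rule and is copied to standard output.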

       All  action code is placed in the yylex() function. Text (C code or declarations) placed at the beginning of the rules section is copied to
       the beginning of the yylex() function and may be used in actions. This text must begin with a space  or	a  tab	(to  distinguish  it  from
       rules).	 In  addition, any input (beginning with a space or within %{ and %} delimiter lines) appearing at the beginning of the rules sec-
       tion before any rules are specified will be written to lex.yy.c after the declarations of variables for the yylex() function and before the
       first line of code in yylex().

       Elements of each rule are:

       state  A pattern may begin with a comma-separated list of state names enclosed by angle brackets (<state[,state...]>). These
              states are entered via the BEGIN statement. If a pattern begins with a state, the scanner can only recognize it when
              in that state. The initial state is 0 (zero).

       regexp A regular expression to match against the input stream. The regular expressions in flex provide a rich character
              matching syntax.

              The following characters, shown in order of decreasing precedence, have special meanings:

              x       Matches the character x.

              "..."   Encloses characters and treats them as literal strings. For example, "*+" is treated as the asterisk character
                      followed by the plus character.

              \str    If str is one of the characters a, b, f, n, r, t, or v, then the ANSI C interpretation is adopted (for example,
                      \n is a newline). If str is a string of octal digits, it is interpreted as a character with octal value str.
                      If str is a string of hexadecimal digits with a leading x, it is interpreted as a character with that value.
                      Otherwise, it is interpreted literally with no special meaning. For example, x\*yz represents the four
                      characters x*yz.

              [...]   Represents a character class in the enclosed range ([.-.]) or the enclosed list ([...]). The dash character is
                      used to define a range of characters from the ASCII value or the 8-bit class of the character that comes
                      before it to the ASCII value or the 8-bit class of the character that follows it. For example, [abcx-z]
                      matches a, b, c, x, y, or z.

                      The circumflex (^), when it appears as the first character in a character class, indicates the complement of
                      the set of characters within that class. For example, [^abc] matches any character except a, b or c,
                      including special characters like newline.

              (...)   Groups regular expressions. For example, (ab) will be considered as a single regular expression.

              {...}   When enclosing numbers, indicates a number of consecutive occurrences of the expression that comes before it.
                      For example, (ab){1,5} indicates a match for from 1 to 5 occurrences of the string ab.

                      When enclosing a name, the name represents a regular expression defined in the definitions section. For
                      example, {digit} is replaced by the defined regular expression for digit. Note that the expansion takes place
                      as if the definition were enclosed in parentheses.

              .       Matches any single character except newline.

              ?       Matches zero or one of the preceding expressions. For example, ab?c matches both ac and abc.

              *       Matches zero or more of the preceding expressions. For example, a* is zero or more consecutive a characters.
                      The utility of matching zero occurrences is more obvious in complicated expressions. For example, the
                      expression [A-Za-z][A-Za-z0-9]* indicates all alphanumeric strings with a leading alphabetic character,
                      including strings that are only one alphabetic character.

              +       Matches one or more of the preceding expressions. For example, [a-z]+ is all strings of lowercase letters.

              xy      Matches the expression x followed by the expression y.

              |       Matches either the preceding expression or the following expression. For example, ab|cd matches either ab or
                      cd.

              x/y     Matches expression x only if expression y (trailing context) immediately follows it. For example, ab/cd
                      matches the string ab but only if followed by cd. Only one trailing context is permitted per pattern.

              ^       When it appears at the beginning of the pattern, matches the beginning of a line. For example, ^abc will match
                      the string abc if it is found at the beginning of a line.

              $       When it appears at the end of a pattern, matches the end of a line. It is equivalent to /\n. For example, abc$
                      will match the string abc if it is found at the end of a line.

              <<EOF>> Matches an End-of-File.

              <state> Identifies a state name (see above) and may only appear at the beginning of a pattern. For example,
                      <done><<EOF>> matches an End-of-File, but only if it is in state done.

              In addition, the following rules apply for bracket expressions:

              [= =]   These represent the set of collating elements in an equivalence class and are enclosed within bracket-equal
                      delimiters ([= =]). An equivalence class generally is designed to deal with primary-secondary sorting; that
                      is, for languages like French that define groups of characters as sorting to the same primary location, and
                      then have a tie-breaking, secondary sort. For example, if a, à, and â belong to the same equivalence class,
                      then [[=a=]b], [[=à=]b], and [[=â=]b] are each equivalent to [aàâb].

              [: :]   These represent the set of characters in the current locale belonging to the named ctype class. These are
                      expressed as a ctype class name enclosed in bracket-colon delimiters ([: :]).

	      In the C or POSIX locale,  this  operating  system  supports  the  following  character  class  expressions:  [:alpha:],	[:upper:],
	      [:lower:], [:digit:], [:alnum:], [:xdigit:], [:space:], [:print:], [:punct:], [:graph:], [:cntrl:].

	      Other locales may define additional character classes.

              Letters and digits never have special meanings. A character such as ^ or -, which has a special meaning in particular
              contexts, refers simply to itself when found outside that context. Spaces and tabs must be escaped to appear in a
              regular expression; otherwise they indicate the end of the expression.

       action Each pattern in a rule has a corresponding action, which can be any arbitrary C statement. The pattern ends at the
              first non-escaped white space character; the remainder of the line is its action. If the action is empty, then when
              the pattern is matched the input which matched it is discarded.

	      If  the  action  contains a {, then the action spans till the balancing } is found, and the action may cross multiple lines. Using a
	      return statement in an action returns from yylex().

	      An action consisting solely of a vertical bar (|) means same as the action for the next rule.
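
              The three action forms above (a single statement, a braced block spanning several lines, and the | shorthand) might
              look as follows. This is only a sketch: it assumes flex 2.5.x for %option noyywrap, and the token codes NUMBER and
              WORD are hypothetical values invented for the example.

```lex
%{
#include <stdio.h>
enum { NUMBER = 258, WORD };        /* hypothetical token codes */
%}
%option noyywrap
%%
[0-9]+          return NUMBER;      /* return leaves yylex() */
[A-Za-z]+       {
                    /* a braced action may span several lines */
                    if (yyleng > 8)
                        fprintf(stderr, "long word: %s\n", yytext);
                    return WORD;
                }
" "             |
\t              |
\n              ;       /* all three rules share this empty statement */
%%
int main(void)
{
    int tok;

    /* yylex() returns 0 at End-of-File */
    while ((tok = yylex()) != 0)
        printf("token %d: %s\n", tok, yytext);
    return 0;
}
```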

              The flex variables which can be used within actions are:

              yytext  A string (char *) containing the current matched input. It cannot be modified.

              yyleng  The length (int) of the current matched input. It cannot be modified.

              yyin    A stream (FILE *) that flex reads from (stdin by default). It may be changed, but because of the buffering
                      flex uses, this makes sense only before scanning begins. Once scanning terminates because an End-of-File was
                      seen, void yyrestart (FILE *new_file) may be called to point yyin at a new input file. Alternatively, yyin
                      may be changed whenever a new or different buffer is selected (see yy_switch_to_buffer()).

              yyout   A stream (FILE *) to which ECHO output is written (stdout by default). It can be changed by the user.

              YY_CURRENT_BUFFER
                      Returns the current buffer (YY_BUFFER_STATE) used for scanner input.

              The flex command macros and functions that may be used within actions are:

              ECHO    Copies yytext to the scanner's output.

              BEGIN state
                      Changes the scanner state to be state. This affects which rules are active. The state must be defined in a %s
                      or %x definition. The initial state of the scanner is INITIAL or 0 (zero).

              REJECT  Directs the scanner to proceed immediately to the next best pattern that matches the input (which may be a
                      prefix of the current match). yytext and yyleng are reset appropriately. Note that REJECT is a particularly
                      expensive feature in terms of scanner performance; if it is used in any of the scanner's actions, it will
                      slow down all of the scanner's pattern matching operations. REJECT cannot be used if flex is invoked with
                      either the -f or -F option.

              yymore()
                      Indicates that the next matched text should be appended to the currently matched text in yytext (rather than
                      replace it).

              yyless(n)
                      Returns all but the first n characters of the current token back to the input stream, where they will be
                      rescanned when the scanner looks for the next match. yytext and yyleng are adjusted accordingly.

              yywrap()
                      Returns 0 (zero) if there is more input to scan or 1 if there is not. The default yywrap() always returns 1.
                      Currently it is implemented as a macro; however, in future implementations it may become a function.

              yyterminate()
                      Can be used in lieu of a return statement in an action. It terminates the scanner and returns a 0 (zero) to
                      the scanner's caller. yyterminate() is automatically called when an End-of-File is encountered. It is a macro
                      and may be redefined.

              yy_create_buffer(file, size)
                      Returns a YY_BUFFER_STATE handle to a new input buffer large enough to accommodate size characters and
                      associated with the given file. When in doubt, use YY_BUF_SIZE for the size.

              yy_switch_to_buffer(buffer)
                      Switches the scanner's processing to scan for tokens from the given buffer, which must be a YY_BUFFER_STATE.

              yy_delete_buffer(buffer)
                      Deletes the given buffer.

              YY_NEW_FILE
                      Enables scanning to continue after yyin has been pointed at a new file to process.

              YY_DECL Controls how the scanning function, yylex(), is declared. By default, it is int yylex(), or, if prototypes
                      are being used, int yylex(void). This definition may be changed by redefining the YY_DECL macro. This macro
                      is expanded immediately before the {...} (braces) that delimit the scanner function body.

              YY_INPUT
                      Controls scanner input. By default, YY_INPUT reads from the file-pointer yyin. Its action is to place up to
                      max_size characters in the character array buf and return in the integer variable result either the number of
                      characters read or the constant YY_NULL to indicate EOF. Following is a sample redefinition of YY_INPUT, in
                      the definitions section of the input file:

              %{
              #undef YY_INPUT
              #define YY_INPUT(buf,result,max_size) \
                 { \
                     int c = getchar(); \
                     result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
                 }
              %}

              When the scanner receives an End-of-File indication from YY_INPUT, it checks the yywrap() function. If yywrap()
              returns zero, it is assumed that yyin has been set up to point to another input file, and scanning continues. If it
              returns nonzero, then the scanner terminates, returning zero to its caller.

              YY_USER_ACTION
                      Redefinable to provide an action which is always executed prior to the matched pattern's action.

              YY_USER_INIT
                      Redefinable to provide an action which is always executed before the first scan.

              YY_BREAK
                      Is used in the scanner to separate different actions. By default, it is simply a break, but may be redefined
                      if necessary.
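
              The yywrap() protocol described above can be sketched as a scanner that counts lines across several files. This is a
              hedged example, not the behavior of any particular flex release: it assumes a flex 2.5.x scanner in which yywrap() is
              an ordinary user-supplied function (not the macro mentioned above) and in which yyin starts out as a null pointer.
              The names files and lines are hypothetical.

```lex
%{
#include <stdio.h>
static char **files;        /* hypothetical: remaining file names from argv */
static long lines = 0;      /* hypothetical line counter */
%}
%%
\n      lines++;
.       ;
%%
/* Called at each End-of-File; returning 0 means yyin points at more input. */
int yywrap(void)
{
    if (yyin != NULL && yyin != stdin)
        fclose(yyin);
    while (*files != NULL) {
        yyin = fopen(*files++, "r");
        if (yyin != NULL)
            return 0;       /* scanning continues with the new file */
    }
    return 1;               /* no more input: yylex() returns 0 */
}

int main(int argc, char **argv)
{
    (void)argc;
    files = argv + 1;       /* argv is terminated by a null pointer */
    if (*files != NULL && yywrap() != 0)
        return 1;           /* none of the named files could be opened */
    yylex();
    printf("%ld lines\n", lines);
    return 0;
}
```

              With no file arguments the sketch leaves yyin untouched, so the scanner reads standard input and the first call to
              yywrap() simply returns 1.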

       The user functions section consists of complete C functions, which are passed directly into the lex.yy.c output file (the effect
       is similar to defining the functions in separate files and linking them with lex.yy.c). This section is separated from the rules
       section by the %% delimiter.

       Comments,  in  C syntax, can appear anywhere in the user functions or definitions sections.  In the rules section, comments can be embedded
       within actions. Empty lines or lines consisting of white space are ignored.

       The following macros are not normally called explicitly within an action, but are used internally by flex to handle the input and
       output streams.

       input()
              Reads the next character from the input stream. You cannot redefine input().

       output()
              Writes the next character to the output stream.

       unput(c)
              Puts the character c back onto the input stream. It will be the next character scanned. You cannot redefine unput().

       The libl.a library contains default functions to support testing or quick use of a flex program without yacc; these functions can
       be linked in through -ll. They can also be provided by the user.

       main() A simple wrapper that calls setlocale() and then calls the yylex() function.

       yywrap()
              The function called when the scanner reaches the end of an input stream. The default definition simply returns 1,
              which causes the scanner in turn to return 0 (zero).

NOTES

       Some trailing context patterns cannot be properly matched and generate the warning message:

              Dangerous trailing context

       These are patterns where the ending of the first part of the rule matches the beginning of the second part, such as zx*/xy*,
       where the x* matches the x at the beginning of the trailing context.

       For some trailing context rules, parts that are actually fixed length are not recognized as such, leading to the performance loss
       associated with variable trailing context. In particular, patterns using {n} (such as test{3}) are always considered variable
       length.

       Combining trailing context with the special | (vertical bar) action can result in fixed trailing context being turned into the
       more expensive variable trailing context. This happens in the following example:

              %%
              abc|
              xyz/def

       Use of unput() invalidates the contents of yytext and yyleng within the current flex action.

       Use of unput() to push back more text than was matched can result in the pushed-back text matching a beginning-of-line (^) rule
       even though it did not come at the beginning of the line.

       Pattern matching of NULs is substantially slower than matching other characters.

       The flex command does not generate correct #line directives for code internal to the scanner; thus, bugs in flex.skel yield
       invalid line numbers.

       Due to both buffering of input and read-ahead, you cannot intermix calls to <stdio.h> routines, such as getchar(), with flex
       rules and expect it to work. Call input() instead.

       The total table entries listed by the -v option exclude the number of table entries needed to determine what rule was matched.
       The number of entries is equal to the number of deterministic finite-state automaton (DFA) states if the scanner does not use
       REJECT, and somewhat greater than the number of states if it does.

       REJECT cannot be used with the -f or -F options.

EXAMPLES

       The following command processes the file lexcommands to produce the scanner file lex.yy.c:

              flex lexcommands

       This is then compiled and linked by the command:

              cc -o scanner lex.yy.c -ll

       This produces a program named scanner. The scanner program converts uppercase to lowercase letters, removes spaces at the end of
       a line, and replaces multiple spaces with single spaces. The lexcommands file contains:

              %{
              #include <ctype.h>
              %}
              %%
              [A-Z]   putchar(tolower(yytext[0]));
              [ ]+$
              [ ]+    putchar(' ');

FILES

       flex.skel
              Skeleton scanner.

       lex.yy.c
              Generated scanner C source.

       lex.backtrack
              Backtracking information generated from the -b option.

SEE ALSO

       Commands:  yacc(1), sed(1), awk(1)

       Files:  locale(4)

																	   flex(1)
