regexp(3) Library Functions Manual regexp(3)
NAME
advance, advance_r, compile, compile_r, step, step_r - Regular expression compile and match routines
SYNOPSIS
#define INIT declarations
#define GETC() getc code
#define PEEKC() peek code
#define UNGETC(c) ungetc code
#define RETURN(ptr) return code
#define ERROR(val) error code
#include <regexp.h>
char *compile(
char *instring,
char *expbuf,
const char *endbuf,
int eof);
int step(
const char *string,
const char *expbuf);
int advance(
const char *string,
const char *expbuf);
extern char *loc1, *loc2, *locs;
The following functions do not conform to current standards and are supported only for backward compatibility:
char *compile_r(
char *instring,
char *expbuf,
char *endbuf,
int eof,
struct regexp_data *regexp_data);
int advance_r(
char *string,
char *expbuf,
struct regexp_data *regexp_data);
int step_r(
char *string,
char *expbuf,
struct regexp_data *regexp_data);
STANDARDS
Interfaces documented on this reference page conform to industry standards as follows:
advance(), compile(), step(): XSH4.2
Refer to the standards(5) reference page for more information about industry standards and associated tags.
PARAMETERS
c
    The value of the next character (byte) in the regular expression pattern. It is returned by the next call to the GETC() and PEEKC() macros.
ptr
    Specifies a pointer to the character following the last character of the compiled regular expression.
val
    Specifies an error value.
instring
    Specifies a string to be passed to the compile() function.
    The instring parameter is never used explicitly by the compile() function, but you can use it in your macros. For example, you may want to pass the string containing a pattern as the instring parameter to the compile() function and use the INIT() macro to set a pointer to the beginning of this string. When your macros do not use instring, call the compile() function with a value of ((char *) 0) for this parameter.
expbuf
    Points to a character array where the compiled regular expression is stored.
endbuf
    Points to the location that immediately follows the character array where the compiled regular expression is stored. When the compiled expression cannot be contained in (endbuf-expbuf) bytes, a call to the ERROR(_BIGREGEXP) macro is made (see the ERRORS section).
eof
    Specifies the character that marks the end of the regular expression. For example, in ed this character is usually a / (slash).
string
    Points to the null-terminated string of characters that the step() function searches for a match (or that the advance() function tests).
regexp_data
    Points to the regexp_data structure used by the compile_r(), step_r(), and advance_r() functions.
DESCRIPTION
The compile(), advance(), and step() functions are used for general-purpose expression matching.
The compile() function takes a simple regular expression as input and produces a compiled expression that can be used with the step() and
advance() functions.
The following six macros, which are used by the compile() function, must be defined before the #include <regexp.h> statement in your program. The GETC(), PEEKC(), and UNGETC() macros operate on the regular expression provided as input to the compile() function. The INIT() macro is used for dependent declarations and initializations; in the regexp.h header file, it is placed immediately after the compile() function declarations and opening { (left brace). Your INIT() declarations must end with a ; (semicolon).
INIT()
    The INIT() macro is frequently used to set a register variable to point to the beginning of the regular expression, so that this pointer can be used in the definitions of GETC(), PEEKC(), and UNGETC(). Alternatively, you can use INIT() to declare external variables that GETC(), PEEKC(), and UNGETC() need.
GETC()
    The GETC() macro returns the value of the next character (byte) in the regular expression pattern. Successive calls to GETC() return successive characters of the regular expression.
PEEKC()
    The PEEKC() macro returns the next character (byte) in the regular expression. Immediately subsequent calls to this macro return the same byte, which is also the next character returned by the GETC() macro.
UNGETC()
    The UNGETC() macro causes the c parameter to be returned by the next call to the GETC() and PEEKC() macros. No more than one character of pushback is ever needed, because this character is guaranteed to be the last character read by the GETC() macro. The value of the UNGETC() macro is always ignored.
RETURN()
    The RETURN() macro is used for the normal exit of the compile() function. The value of the ptr parameter is a pointer to the character following the last character of the compiled regular expression. This is useful in programs that manage memory allocation.
ERROR()
    The ERROR() macro is the abnormal return from the compile() function. A call to this macro should never return a value. The val parameter is an error number, which is described in the ERRORS section of this reference page.
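The macro contract can be tried out without the header. The following C sketch defines the six macros much as the grep example under EXAMPLES does, reading the pattern from instring, and then uses them in scan_pattern(), a hypothetical function (not part of regexp.h) that consumes a pattern up to the eof character only through INIT, GETC(), PEEKC(), and UNGETC(), the way compile() reads its input. It compiles nothing: it copies characters, collapses runs of * using one byte of pushback, and keeps an escaped delimiter literally. scan_error and the SCAN_* codes are assumptions of the sketch; in particular, this ERROR() leaves scan_pattern() with a null pointer, whereas the ERROR() you supply to the real compile() should not return (the grep example calls regerr()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes and error slot used only by this sketch. */
enum { SCAN_NO_DELIM = 1, SCAN_TOO_BIG = 2 };
int scan_error;

/* The six macros, defined before the code that expands them, as a
 * program would define them before #include <regexp.h>. */
#define INIT        char *sp = instring;
#define GETC()      (*sp++)
#define PEEKC()     (*sp)
#define UNGETC(c)   (--sp)
#define RETURN(ptr) return (ptr)
/* Sketch only: exits scan_pattern() early; the real ERROR() should not return. */
#define ERROR(val)  do { scan_error = (val); return NULL; } while (0)

/* Hypothetical reader that consumes a pattern the way compile() does. */
char *scan_pattern(char *instring, char *expbuf, const char *endbuf, int eof)
{
    INIT
    char *ep = expbuf;
    int c;

    assert(endbuf > expbuf);            /* need room for at least the terminator */
    for (;;) {
        c = GETC();
        if (c == eof)                   /* delimiter: end of the pattern */
            break;
        if (c == '\0')                  /* input ended before the delimiter */
            ERROR(SCAN_NO_DELIM);
        if (c == '\\' && eof != '\0' && PEEKC() == eof) {
            c = GETC();                 /* escaped delimiter: keep it literally */
        } else if (c == '*') {
            int next;
            while ((next = GETC()) == '*')
                ;                       /* swallow repeated stars */
            UNGETC(next);               /* push back the last byte GETC() read */
        }
        if (ep >= endbuf - 1)           /* would not fit in (endbuf - expbuf) bytes */
            ERROR(SCAN_TOO_BIG);
        *ep++ = (char)c;
    }
    *ep++ = '\0';
    RETURN(ep);                         /* points just past the stored pattern */
}
```

For example, scan_pattern("ab**c/rest", buf, buf + 16, '/') with a 16-byte buf stores "ab*c" and returns buf + 5, the position just past the terminating null byte.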
The step() function finds the first substring of the string parameter that matches the compiled expression pointed to by the expbuf parameter. When there is no match, the step() function returns 0 (zero). When there is a match, the step() function returns a nonzero value and sets two global character pointers: loc1, which points to the first character of the matching substring, and loc2, which points to the character immediately following the matching substring. When the regular expression matches the entire string, loc1 points to the first character of string and loc2 points to the null character that terminates string.
The step() function uses the integer variable circf, which is set by the compile() function when the regular expression begins with a ^
(circumflex). When this variable is set, the step() function only tries to match the regular expression to the beginning of the string.
When you compile more than one regular expression before executing the first one, save the value of circf for each compiled expression and
set circf to the saved value before each call to step().
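Because circf is a single global, this bookkeeping amounts to keeping a copy of the flag next to each compiled buffer. The following C sketch shows that pattern without <regexp.h>: toy_compile() and toy_step() are hypothetical stand-ins that treat the pattern as a literal string and only mimic how compile() sets circf for a leading ^ and how step() consults it. The circf variable here is the sketch's own global, not the one the library provides, and struct saved_expr, save_compile(), and saved_step() are likewise illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The sketch's own flag; with the real library, circf comes from <regexp.h>. */
int circf;

/* Hypothetical toy compile(): stores the pattern as a literal string,
 * strips a leading ^ and records it in circf. */
void toy_compile(const char *pattern, char *expbuf, size_t size)
{
    assert(size > 0);
    circf = (pattern[0] == '^');
    if (circf)
        pattern++;
    strncpy(expbuf, pattern, size - 1);
    expbuf[size - 1] = '\0';
}

/* Hypothetical toy step(): literal search that honours circf. */
int toy_step(const char *string, const char *expbuf)
{
    if (circf)                          /* anchored: beginning of string only */
        return strncmp(string, expbuf, strlen(expbuf)) == 0;
    return strstr(string, expbuf) != NULL;
}

/* Keep each compiled buffer together with the circf value it produced. */
struct saved_expr {
    char expbuf[64];
    int saved_circf;
};

void save_compile(struct saved_expr *se, const char *pattern)
{
    toy_compile(pattern, se->expbuf, sizeof se->expbuf);
    se->saved_circf = circf;            /* save before the next compile overwrites it */
}

int saved_step(const char *string, const struct saved_expr *se)
{
    circf = se->saved_circf;            /* restore this expression's flag */
    return toy_step(string, se->expbuf);
}
```

After compiling "^ab" and then "ab", calling toy_step("xab", ...) on the first buffer returns 1, because circf was last set by the unanchored "ab"; saved_step() restores the saved flag first and correctly returns 0.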
The advance() function tests whether an initial substring of the string parameter matches the expression pointed to by the expbuf parameter. The step() function calls advance() with the expbuf parameter it was given and with successive positions in string: it increments a pointer through the characters of string and calls advance() until advance() returns a nonzero value, which indicates a match, or until the end of string is reached. To constrain the match unconditionally to the beginning of string, call the advance() function directly instead of calling step().
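The division of labor between step() and advance() can be illustrated with a much smaller matcher. The following C sketch is a toy, not the regexp.h implementation: it works on an uncompiled pattern that supports only literal characters, . (period), c*, a leading ^, and a trailing $, and it uses hypothetical names (toy_step(), toy_advance(), toy_loc1, toy_loc2) in place of the real functions and globals. toy_advance() only tests an initial substring; toy_step() walks the string calling it and records the match boundaries the way loc1 and loc2 are described above, with a leading ^ playing the role of circf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the loc1 and loc2 globals. */
const char *toy_loc1, *toy_loc2;

int toy_advance(const char *lp, const char *ep);

/* "c*": consume as many c's as possible, then back up one at a time,
 * retrying the rest of the pattern at each position. */
static int star(int c, const char *lp, const char *ep)
{
    const char *start = lp;

    while (*lp != '\0' && (c == '.' || *lp == c))
        lp++;
    for (;;) {
        if (toy_advance(lp, ep))
            return 1;
        if (lp == start)
            return 0;
        lp--;
    }
}

/* Does pattern ep match an initial substring of lp?  On success, set
 * toy_loc2 to the character just past the matched text. */
int toy_advance(const char *lp, const char *ep)
{
    for (;;) {
        if (*ep == '\0') {
            toy_loc2 = lp;
            return 1;
        }
        if (ep[1] == '*')
            return star(ep[0], lp, ep + 2);
        if (ep[0] == '$' && ep[1] == '\0') {
            if (*lp != '\0')
                return 0;
            toy_loc2 = lp;
            return 1;
        }
        if (*lp == '\0' || (ep[0] != '.' && ep[0] != *lp))
            return 0;
        lp++;
        ep++;
    }
}

/* Find the first substring of string that matches ep. */
int toy_step(const char *string, const char *ep)
{
    const char *lp = string;

    assert(string != NULL && ep != NULL);
    if (*ep == '^') {                   /* anchored: try position 0 only */
        if (!toy_advance(lp, ep + 1))
            return 0;
        toy_loc1 = lp;
        return 1;
    }
    do {                                /* unanchored: slide along the string */
        if (toy_advance(lp, ep)) {
            toy_loc1 = lp;
            return 1;
        }
    } while (*lp++ != '\0');
    return 0;
}
```

For the string "xxabbbc" and the pattern "ab*c", toy_step() returns 1 with toy_loc1 pointing at the a and toy_loc2 at the terminating null character, while toy_advance() on the same string and pattern returns 0 because the string does not begin with a match.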
When the advance() function encounters an * (asterisk) or a {} sequence in the regular expression, it advances its pointer to the string
to be matched as far as possible and recursively calls itself, trying to match the remainder of the regular expression. As long as there is
no match, the advance() function backs up along the string until the function finds a match or reaches the point in the string where the
initial match with the * or {} character occurred.
It is sometimes desirable to stop this backing up before the initial pointer position in the string is reached. If, during the backing-up process, the pointer into the string becomes equal to the locs global character pointer, the advance() function breaks out of the recursive loop that backs up and returns 0 (zero).
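The backing-up loop and its locs check can be sketched in the same spirit. The following C fragment is not the library code: toy_advance() and toy_locs are hypothetical stand-ins for advance() and locs, and the pattern is uncompiled and supports only literal characters, . (period), and c*. Before each retry during backing up, the current position is compared with toy_locs, and the match gives up with 0 as soon as the two are equal; a null toy_locs disables the check in this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the locs global; NULL disables the check. */
const char *toy_locs = NULL;

int toy_advance(const char *lp, const char *ep);

/* Handle "c*": take as many c's as possible, then back up one at a time. */
static int star(int c, const char *lp, const char *ep)
{
    const char *start = lp;

    while (*lp != '\0' && (c == '.' || *lp == c))
        lp++;
    for (;;) {
        if (lp == toy_locs)             /* reached locs while backing up: stop */
            return 0;
        if (toy_advance(lp, ep))        /* try to match the rest of the pattern */
            return 1;
        if (lp == start)                /* back at the initial position: no match */
            return 0;
        lp--;
    }
}

/* Does pattern ep match an initial substring of lp? */
int toy_advance(const char *lp, const char *ep)
{
    assert(lp != NULL && ep != NULL);
    while (*ep != '\0') {
        if (ep[1] == '*')
            return star(ep[0], lp, ep + 2);
        if (*lp == '\0' || (ep[0] != '.' && ep[0] != *lp))
            return 0;
        lp++;
        ep++;
    }
    return 1;
}
```

With the string "aaab" and the pattern "a*ab", the match succeeds only after backing up from the b to the third a; setting toy_locs to point at that third a makes the same call return 0.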
The compile_r(), step_r(), and advance_r() functions are the reentrant versions of the compile(), step(), and advance() functions. They
are supported in order to maintain backward compatibility with operating system versions prior to Tru64 UNIX Version 4.0.
The regexp.h header file defines the regexp_data structure.
EXAMPLES
The following is an example of the regular expression macros and calls from the grep command:
#define INIT register char *sp=instring;
#define GETC() (*sp++)
#define PEEKC() (*sp)
#define UNGETC(c) (--sp)
#define RETURN(c) return;
#define ERROR(c) regerr()
#include <regexp.h>
. . . compile (patstr, expbuf, &expbuf[ESIZE], '