Unix/Linux Go Back    


Linux 2.6 - man page for awk (linux section 1posix)

Linux & Unix Commands - Search Man Pages
Man Page or Keyword Search:   man
Select Man Page Set:       apropos Keyword Search (sections above)


AWK(P)				    POSIX Programmer's Manual				   AWK(P)

NAME
       awk - pattern scanning and processing language

SYNOPSIS
       awk [-F ERE][-v assignment] ... program [argument ...]

       awk [-F ERE] -f progfile ...  [-v assignment] ...[argument ...]

DESCRIPTION
       The  awk  utility shall execute programs written in the awk programming language, which is
       specialized for textual data manipulation. An awk program is a sequence	of  patterns  and
       corresponding  actions.	When  input is read that matches a pattern, the action associated
       with that pattern is carried out.

       Input shall be interpreted as a sequence of records. By default, a record is a line,  less
       its terminating <newline>, but this can be changed by using the RS built-in variable. Each
       record of input shall be matched in turn against each pattern in  the  program.	For  each
       pattern matched, the associated action shall be executed.

       The  awk  utility  shall  interpret  each  input  record as a sequence of fields where, by
       default, a field is a string of non- <blank>s. This default  white-space  field	delimiter
       can  be	changed by using the FS built-in variable or -F ERE. The awk utility shall denote
       the first field in a record $1, the second $2, and so on. The symbol $0 shall refer to the
       entire  record;	setting  any  other field causes the re-evaluation of $0. Assigning to $0
       shall reset the values of all other fields and the NF built-in variable.

OPTIONS
       The awk utility shall conform to the Base Definitions volume of IEEE Std 1003.1-2001, Sec-
       tion 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -F  ERE
	      Define  the input field separator to be the extended regular expression ERE, before
	      any input is read; see Regular Expressions .

       -f  progfile
	      Specify the pathname of the file progfile containing an awk  program.  If  multiple
	      instances of this option are specified, the concatenation of the files specified as
	      progfile in the order specified shall be the  awk  program.  The	awk  program  can
	      alternatively be specified in the command line as a single argument.

       -v  assignment
	      The application shall ensure that the assignment argument is in the same form as an
	      assignment operand. The specified variable assignment shall occur prior to  execut-
	      ing the awk program, including the actions associated with BEGIN patterns (if any).
	      Multiple occurrences of this option can be specified.

OPERANDS
       The following operands shall be supported:

       program
	      If no -f option is specified, the first operand to awk shall be the text of the awk
	      program.	The  application shall supply the program operand as a single argument to
	      awk. If the text does not end in a <newline>, awk shall interpret the text as if it
	      did.

       argument
	      Either of the following two types of argument can be intermixed:

       file
	      A  pathname  of a file that contains the input to be read, which is matched against
	      the set of patterns in the program. If no file operands are specified, or if a file
	      operand is '-' , the standard input shall be used.

       assignment
	      An operand that begins with an underscore or alphabetic character from the portable
	      character   set	(see   the   table   in   the	Base   Definitions   volume    of
	      IEEE Std 1003.1-2001,  Section 6.1, Portable Character Set), followed by a sequence
	      of underscores, digits, and alphabetics from the portable character  set,  followed
	      by  the  '=' character, shall specify a variable assignment rather than a pathname.
	      The characters before the '=' represent the name of an awk variable; if  that  name
	      is  an  awk  reserved word (see Grammar ) the behavior is undefined. The characters
	      following the equal sign shall be interpreted as if they appeared in the	awk  pro-
	      gram  preceded  and  followed by a double-quote ( ' )' character, as a STRING token
	      (see Grammar ), except that if the last character is  an	unescaped  backslash,  it
	      shall  be  interpreted as a literal backslash rather than as the first character of
	      the sequence "\"" . The variable shall be assigned the value of that  STRING  token
	      and,  if	appropriate, shall be considered a numeric string (see Expressions in awk
	      ), the variable shall also be  assigned  its  numeric  value.  Each  such  variable
	      assignment  shall occur just prior to the processing of the following file, if any.
	      Thus, an assignment before the first file argument  shall  be  executed  after  the
	      BEGIN  actions  (if  any),  while  an assignment after the last file argument shall
	      occur before the END actions (if any). If there are no file arguments,  assignments
	      shall be executed before processing the standard input.

STDIN
       The  standard input shall be used only if no file operands are specified, or if a file op-
       erand is '-' ; see the INPUT FILES section. If the awk program contains no actions and  no
       patterns, but is otherwise a valid awk program, standard input and any file operands shall
       not be read and awk shall exit with a return status of zero.

INPUT FILES
       Input files to the awk program from any of the following sources shall be text files:

	* Any file operands or their equivalents, achieved by modifying the  awk  variables  ARGV
	  and ARGC

	* Standard input in the absence of any file operands

	* Arguments to the getline function

       Whether	the variable RS is set to a value other than a <newline> or not, for these files,
       implementations shall support records  terminated  with	the  specified	separator  up  to
       {LINE_MAX} bytes and may support longer records.

       If  -f progfile is specified, the application shall ensure that the files named by each of
       the progfile option-arguments are text files and their concatenation, in the same order as
       they appear in the arguments, is an awk program.

ENVIRONMENT VARIABLES
       The following environment variables shall affect the execution of awk:

       LANG   Provide  a  default  value for the internationalization variables that are unset or
	      null. (See the Base Definitions volume of IEEE Std 1003.1-2001, Section 8.2, Inter-
	      nationalization Variables for the precedence of internationalization variables used
	      to determine the values of locale categories.)

       LC_ALL If set to a non-empty string value, override the values of all the  other  interna-
	      tionalization variables.

       LC_COLLATE
	      Determine  the  locale  for the behavior of ranges, equivalence classes, and multi-
	      character collating elements within  regular  expressions  and  in  comparisons  of
	      string values.

       LC_CTYPE
	      Determine  the  locale for the interpretation of sequences of bytes of text data as
	      characters (for example, single-byte as opposed to multi-byte characters	in  argu-
	      ments  and  input  files), the behavior of character classes within regular expres-
	      sions, the identification of characters as letters, and the  mapping  of	uppercase
	      and lowercase characters for the toupper and tolower functions.

       LC_MESSAGES
	      Determine the locale that should be used to affect the format and contents of diag-
	      nostic messages written to standard error.

       LC_NUMERIC
	      Determine the radix character used when interpreting numeric input, performing con-
	      versions	between numeric and string values, and formatting numeric output. Regard-
	      less of locale, the period character (the  decimal-point	character  of  the  POSIX
	      locale)  is  the	decimal-point  character  recognized  in  processing awk programs
	      (including assignments in command line arguments).

       NLSPATH
	      Determine the location of message catalogs for the processing of LC_MESSAGES .

       PATH   Determine the search path when looking for commands executed  by	system(expr),  or
	      input  and  output  pipes; see the Base Definitions volume of IEEE Std 1003.1-2001,
	      Chapter 8, Environment Variables.

       In addition, all environment variables shall be visible via the awk variable ENVIRON.

ASYNCHRONOUS EVENTS
       Default.

STDOUT
       The nature of the output files depends on the awk program.

STDERR
       The standard error shall be used only for diagnostic messages.

OUTPUT FILES
       The nature of the output files depends on the awk program.

EXTENDED DESCRIPTION
   Overall Program Structure
       An awk program is composed of pairs of the form:

	      pattern { action }

       Either the pattern or the action (including the enclosing brace characters) can	be  omit-
       ted.

       A  missing  pattern shall match any record of input, and a missing action shall be equiva-
       lent to:

	      { print }

       Execution of the awk program shall start by first executing the	actions  associated  with
       all  BEGIN  patterns  in  the  order they occur in the program. Then each file operand (or
       standard input if no files were specified) shall be processed in turn by reading data from
       the file until a record separator is seen ( <newline> by default). Before the first refer-
       ence to a field in the record is evaluated, the record shall be split into fields, accord-
       ing  to	the  rules in Regular Expressions , using the value of FS that was current at the
       time the record was read. Each pattern in the program then shall be evaluated in the order
       of occurrence, and the action associated with each pattern that matches the current record
       executed. The action for a matching pattern shall be executed before evaluating subsequent
       patterns.  Finally,  the actions associated with all END patterns shall be executed in the
       order they occur in the program.

   Expressions in awk
       Expressions describe computations used in patterns and actions.	In the	following  table,
       valid  expression  operations  are given in groups from highest precedence first to lowest
       precedence last, with equal-precedence operators  grouped  between  horizontal  lines.  In
       expression  evaluation,	where the grammar is formally ambiguous, higher precedence opera-
       tors shall be evaluated before lower precedence operators.  In  this  table  expr,  expr1,
       expr2,  and expr3 represent any expression, while lvalue represents any entity that can be
       assigned to (that is, on the left side of an assignment operator). The precise  syntax  of
       expressions is given in Grammar .

			   Table: Expressions in Decreasing Precedence in awk

	     Syntax		   Name 		     Type of Result   Associativity
	     ( expr )		   Grouping		     Type of expr     N/A
	     $expr		   Field reference	     String	      N/A
	     ++ lvalue		   Pre-increment	     Numeric	      N/A
	     -- lvalue		   Pre-decrement	     Numeric	      N/A
	     lvalue ++		   Post-increment	     Numeric	      N/A
	     lvalue --		   Post-decrement	     Numeric	      N/A
	     expr ^ expr	   Exponentiation	     Numeric	      Right
	     ! expr		   Logical not		     Numeric	      N/A
	     + expr		   Unary plus		     Numeric	      N/A
	     - expr		   Unary minus		     Numeric	      N/A
	     expr * expr	   Multiplication	     Numeric	      Left
	     expr / expr	   Division		     Numeric	      Left
	     expr % expr	   Modulus		     Numeric	      Left
	     expr + expr	   Addition		     Numeric	      Left
	     expr - expr	   Subtraction		     Numeric	      Left
	     expr expr		   String concatenation      String	      Left
	     expr < expr	   Less than		     Numeric	      None
	     expr <= expr	   Less than or equal to     Numeric	      None
	     expr != expr	   Not equal to 	     Numeric	      None
	     expr == expr	   Equal to		     Numeric	      None
	     expr > expr	   Greater than 	     Numeric	      None
	     expr >= expr	   Greater than or equal to  Numeric	      None
	     expr ~ expr	   ERE match		     Numeric	      None
	     expr !~ expr	   ERE non-match	     Numeric	      None
	     expr in array	   Array membership	     Numeric	      Left
	     ( index ) in array    Multi-dimension array     Numeric	      Left
				   membership
	     expr && expr	   Logical AND		     Numeric	      Left
	     expr || expr	   Logical OR		     Numeric	      Left
	     expr1 ? expr2 : expr3 Conditional expression    Type of selected Right
							     expr2 or expr3
	     lvalue ^= expr	   Exponentiation assignment Numeric	      Right
	     lvalue %= expr	   Modulus assignment	     Numeric	      Right
	     lvalue *= expr	   Multiplication assignment Numeric	      Right
	     lvalue /= expr	   Division assignment	     Numeric	      Right
	     lvalue += expr	   Addition assignment	     Numeric	      Right
	     lvalue -= expr	   Subtraction assignment    Numeric	      Right
	     lvalue = expr	   Assignment		     Type of expr     Right

       Each  expression  shall	have  either  a string value, a numeric value, or both. Except as
       stated for specific contexts, the value of an expression shall be implicitly converted  to
       the  type needed for the context in which it is used. A string value shall be converted to
       a numeric value by the equivalent of the following calls to functions defined by the ISO C
       standard:

	      setlocale(LC_NUMERIC, "");
	      numeric_value = atof(string_value);

       A  numeric  value  that	is exactly equal to the value of an integer (see Concepts Derived
       from the ISO C Standard ) shall be converted to a string by the equivalent of  a  call  to
       the  sprintf function (see String Functions ) with the string "%d" as the fmt argument and
       the numeric value being converted as the first and only expr argument. Any  other  numeric
       value  shall  be converted to a string by the equivalent of a call to the sprintf function
       with the value of the variable CONVFMT as the fmt argument and  the  numeric  value  being
       converted as the first and only expr argument. The result of the conversion is unspecified
       if the value of CONVFMT is not a  floating-point  format  specification.  This  volume  of
       IEEE Std 1003.1-2001  specifies	no  explicit  conversions between numbers and strings. An
       application can force an expression to be treated as a number by adding zero to it, or can
       force it to be treated as a string by concatenating the null string ( "" ) to it.

       A string value shall be considered a numeric string if it comes from one of the following:

	1. Field variables

	2. Input from the getline() function

	3. FILENAME

	4. ARGV array elements

	5. ENVIRON array elements

	6. Array elements created by the split() function

	7. A command line variable assignment

	8. Variable assignment from another numeric string variable

       and after all the following conversions have been applied, the resulting string would lex-
       ically be recognized as a NUMBER token as described by the lexical conventions in  Grammar
       :

	* All leading and trailing <blank>s are discarded.

	* If the first non- <blank> is '+' or '-' , it is discarded.

	* Changing  each  occurrence  of the decimal point character from the current locale to a
	  period.

       If a '-' character is ignored in the preceding  description,  the  numeric  value  of  the
       numeric	string shall be the negation of the numeric value of the recognized NUMBER token.
       Otherwise, the numeric value of the numeric string shall be the numeric value of the  rec-
       ognized	NUMBER	token. Whether or not a string is a numeric string shall be relevant only
       in contexts where that term is used in this section.

       When an expression is used in a Boolean context, if it has a numeric  value,  a	value  of
       zero  shall be treated as false and any other value shall be treated as true. Otherwise, a
       string value of the null string shall be treated as false and any  other  value	shall  be
       treated as true. A Boolean context shall be one of the following:

	* The first subexpression of a conditional expression

	* An expression operated on by logical NOT, logical AND, or logical OR

	* The second expression of a for statement

	* The expression of an if statement

	* The expression of the while clause in either a while or do... while statement

	* An expression used as a pattern (as in Overall Program Structure)

       All arithmetic shall follow the semantics of floating-point arithmetic as specified by the
       ISO C standard (see Concepts Derived from the ISO C Standard ).

       The value of the expression:

	      expr1 ^ expr2

       shall be equivalent to the value returned by the ISO C standard function call:

	      pow(expr1, expr2)

       The expression:

	      lvalue ^= expr

       shall be equivalent to the ISO C standard expression:

	      lvalue = pow(lvalue, expr)

       except that lvalue shall be evaluated only once. The value of the expression:

	      expr1 % expr2

       shall be equivalent to the value returned by the ISO C standard function call:

	      fmod(expr1, expr2)

       The expression:

	      lvalue %= expr

       shall be equivalent to the ISO C standard expression:

	      lvalue = fmod(lvalue, expr)

       except that lvalue shall be evaluated only once.

       Variables and fields shall be set by the assignment statement:

	      lvalue = expression

       and the type of expression shall determine the resulting  variable  type.  The  assignment
       includes  the  arithmetic  assignments  ( "+=" , "-=" , "*=" , "/=" , "%=" , "^=" , "++" ,
       "--" ) all of which shall produce a numeric result. The left-hand side  of  an  assignment
       and  the  target  of  increment and decrement operators can be one of a variable, an array
       with index, or a field selector.

       The awk language supplies arrays that are used for storing numbers or strings. Arrays need
       not  be declared. They shall initially be empty, and their sizes shall change dynamically.
       The subscripts, or element identifiers, are strings, providing a type of associative array
       capability. An array name followed by a subscript within square brackets can be used as an
       lvalue and thus as an expression, as described in the grammar; see Grammar . Unsubscripted
       array names can be used in only the following contexts:

	* A parameter in a function definition or function call

	* The  NAME  token  following  any use of the keyword in as specified in the grammar (see
	  Grammar ); if the name used in this context is not an array name, the behavior is unde-
	  fined

       A  valid  array index shall consist of one or more comma-separated expressions, similar to
       the way in which multi-dimensional arrays  are  indexed	in  some  programming  languages.
       Because	awk  arrays are really one-dimensional, such a comma-separated list shall be con-
       verted to a single string by concatenating the string values of the separate  expressions,
       each  separated	from  the other by the value of the SUBSEP variable.  Thus, the following
       two index operations shall be equivalent:

	      var[expr1, expr2, ... exprn]

	      var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]

       The application shall ensure that a multi-dimensioned index used with the in  operator  is
       parenthesized.  The  in operator, which tests for the existence of a particular array ele-
       ment, shall not cause that element to exist. Any other reference to  a  nonexistent  array
       element shall automatically create it.

       Comparisons  (with  the '<' , "<=" , "!=" , "==" , '>' , and ">=" operators) shall be made
       numerically if both operands are numeric, if one is numeric and the  other  has	a  string
       value  that  is a numeric string, or if one is numeric and the other has the uninitialized
       value. Otherwise, operands shall be converted to strings as required and a string compari-
       son  shall be made using the locale-specific collation sequence. The value of the compari-
       son expression shall be 1 if the relation is true, or 0 if the relation is false.

   Variables and Special Variables
       Variables can be used in an awk program by referencing them.  With the exception of  func-
       tion  parameters (see User-Defined Functions ), they are not explicitly declared. Function
       parameter names shall be local to the function; all other variable names shall be  global.
       The  same  name	shall  not be used as both a function parameter name and as the name of a
       function or a special awk variable. The same name shall not be used  both  as  a  variable
       name  with  global  scope  and  as the name of a function. The same name shall not be used
       within the same scope both as a scalar variable and as an array.  Uninitialized variables,
       including  scalar variables, array elements, and field variables, shall have an uninitial-
       ized value. An uninitialized value shall have both a numeric value of zero  and	a  string
       value  of the empty string. Evaluation of variables with an uninitialized value, to either
       string or numeric, shall be determined by the context in which they are used.

       Field variables shall be designated by a '$' followed by a number or numerical expression.
       The effect of the field number expression evaluating to anything other than a non-negative
       integer is unspecified; uninitialized variables or string values need not be converted  to
       numeric values in this context. New field variables can be created by assigning a value to
       them.  References to nonexistent fields (that is, fields after $NF), shall evaluate to the
       uninitialized  value. Such references shall not create new fields. However, assigning to a
       nonexistent field (for example, $(NF+2)=5) shall increase the  value  of  NF;  create  any
       intervening  fields  with  the uninitialized value; and cause the value of $0 to be recom-
       puted, with the fields being separated by the value of OFS. Each field variable shall have
       a  string  value  or  an uninitialized value when created.  Field variables shall have the
       uninitialized value when created from $0 using FS and the variable does	not  contain  any
       characters.  If	appropriate, the field variable shall be considered a numeric string (see
       Expressions in awk ).

       Implementations shall support the following other special variables that are set by awk:

       ARGC   The number of elements in the ARGV array.

       ARGV   An array of command line arguments, excluding options  and  the  program	argument,
	      numbered from zero to ARGC-1.

       The arguments in ARGV can be modified or added to; ARGC can be altered. As each input file
       ends, awk shall treat the next non-null element of  ARGV,  up  to  the  current	value  of
       ARGC-1, inclusive, as the name of the next input file. Thus, setting an element of ARGV to
       null means that it shall not be treated as an input file. The name '-' indicates the stan-
       dard  input.  If  an  argument  matches the format of an assignment operand, this argument
       shall be treated as an assignment rather than a file argument.

       CONVFMT
	      The printf format for converting numbers to strings (except for output  statements,
	      where OFMT is used); "%.6g" by default.

       ENVIRON
	      An  array representing the value of the environment, as described in the exec func-
	      tions defined in the System Interfaces volume of IEEE Std 1003.1-2001. The  indices
	      of the array shall be strings consisting of the names of the environment variables,
	      and the value of each array element shall be a string consisting of  the	value  of
	      that  variable.  If  appropriate,  the  environment  variable shall be considered a
	      numeric string (see Expressions in awk ); the array element  shall  also	have  its
	      numeric value.

       In all cases where the behavior of awk is affected by environment variables (including the
       environment of any commands that awk executes via the  system  function	or  via  pipeline
       redirections with the print statement, the printf statement, or the getline function), the
       environment used shall be the environment at the time awk began executing; it is implemen-
       tation-defined whether any modification of ENVIRON affects this environment.

       FILENAME
	      A pathname of the current input file. Inside a BEGIN action the value is undefined.
	      Inside an END action the value shall be the name of the last input file processed.

       FNR    The ordinal number of the current record in the current file. Inside a BEGIN action
	      the  value shall be zero. Inside an END action the value shall be the number of the
	      last record processed in the last file processed.

       FS     Input field separator regular expression; a <space> by default.

       NF     The number of fields in the current record. Inside a BEGIN action, the use of NF is
	      undefined  unless a getline function without a var argument is executed previously.
	      Inside an END action, NF shall retain the value it had for the  last  record  read,
	      unless  a  subsequent,  redirected, getline function without a var argument is per-
	      formed prior to entering the END action.

       NR     The ordinal number of the current record from the start of input.  Inside  a  BEGIN
	      action  the value shall be zero. Inside an END action the value shall be the number
	      of the last record processed.

       OFMT   The printf format for converting numbers to strings in output statements (see  Out-
	      put Statements ); "%.6g" by default. The result of the conversion is unspecified if
	      the value of OFMT is not a floating-point format specification.

       OFS    The print statement output field separation; <space> by default.

       ORS    The print statement output record separator; a <newline> by default.

       RLENGTH
	      The length of the string matched by the match function.

       RS     The first character of the string value of RS shall be the input record  separator;
	      a  <newline>  by	default.  If RS contains more than one character, the results are
	      unspecified.  If RS is null, then records are separated by sequences consisting  of
	      a <newline> plus one or more blank lines, leading or trailing blank lines shall not
	      result in empty records at the beginning or end of the input, and a <newline> shall
	      always be a field separator, no matter what the value of FS is.

       RSTART The  starting  position of the string matched by the match function, numbering from
	      1. This shall always be equivalent to the return value of the match function.

       SUBSEP The subscript separator string for multi-dimensional arrays; the default	value  is
	      implementation-defined.

   Regular Expressions
       The  awk  utility shall make use of the extended regular expression notation (see the Base
       Definitions volume of IEEE Std 1003.1-2001, Section  9.4,  Extended  Regular  Expressions)
       except  that it shall allow the use of C-language conventions for escaping special charac-
       ters within the EREs, as specified  in  the  table  in  the  Base  Definitions  volume  of
       IEEE Std 1003.1-2001, Chapter 5, File Format Notation ( '\\' , '\a' , '\b' , '\f' , '\n' ,
       '\r' , '\t' , '\v' ) and the following table; these escape sequences shall  be  recognized
       both  inside  and outside bracket expressions.  Note that records need not be separated by
       <newline>s and string constants can contain <newline>s, so even the "\n" sequence is valid
       in awk EREs. Using a slash character within an ERE requires the escaping shown in the fol-
       lowing table.

				     Table: Escape Sequences in awk

		 Escape
		 Sequence Description			 Meaning
		 \"	  Backslash quotation-mark	 Quotation-mark character
		 \/	  Backslash slash		 Slash character
		 \ddd	  A backslash character followed The character whose encoding
			  by the longest sequence of	 is represented by the one,
			  one, two, or three octal-digit two, or three-digit octal
			  characters (01234567). If all  integer. Multi-byte characters
			  of the digits are 0 (that is,  require multiple, concatenated
			  representation of the NUL	 escape sequences of this type,
			  character), the behavior is	 including the leading '\' for
			  undefined.			 each byte.
		 \c	  A backslash character followed Undefined
			  by any character not described
			  in this table or in the table
			  in the Base Definitions volume
			  of IEEE Std 1003.1-2001, Chap-
			  ter 5, File Format Notation (
			  '\\' , '\a' , '\b' , '\f' ,
			  '\n' , '\r' , '\t' , '\v' ).

       A regular expression can be matched against a specific field or string by using one of the
       two regular expression matching operators, '~' and "!~" . These operators shall	interpret
       their  right-hand operand as a regular expression and their left-hand operand as a string.
       If the regular expression matches the string, the '~' expression shall evaluate to a value
       of  1,  and  the  "!~"  expression shall evaluate to a value of 0. (The regular expression
       matching operation is as defined by the term matched in the  Base  Definitions  volume  of
       IEEE Std 1003.1-2001, Section 9.1, Regular Expression Definitions, where a match occurs on
       any part of the string unless the regular expression is limited	with  the  circumflex  or
       dollar  sign special characters.) If the regular expression does not match the string, the
       '~' expression shall evaluate to a value of 0, and the "!~" expression shall evaluate to a
       value  of 1. If the right-hand operand is any expression other than the lexical token ERE,
       the string value of the expression shall be interpreted as an extended regular expression,
       including the escape conventions described above.  Note that these same escape conventions
       shall also be applied in determining the value of a  string  literal  (the  lexical  token
       STRING),  and  thus  shall  be applied a second time when a string literal is used in this
       context.

       When an ERE token appears as an expression in any context other than as the right-hand  of
       the '~' or "!~" operator or as one of the built-in function arguments described below, the
       value of the resulting expression shall be the equivalent of:

	      $0 ~ /ere/

       The ere argument to the gsub, match, sub functions, and the fs argument to the split func-
       tion  (see  String Functions ) shall be interpreted as extended regular expressions. These
       can be either ERE tokens or arbitrary expressions, and shall be interpreted  in	the  same
       manner as the right-hand side of the '~' or "!~" operator.

       An  extended  regular expression can be used to separate fields by using the -F ERE option
       or by assigning a string containing the	expression  to	the  built-in  variable  FS.  The
       default	value  of  the	FS variable shall be a single <space>. The following describes FS
       behavior:

	1. If FS is a null string, the behavior is unspecified.

	2. If FS is a single character:

	    a. If FS is <space>, skip leading and trailing <blank>s; fields shall be delimited by
	       sets of one or more <blank>s.

	    b. Otherwise,  if FS is any other character c, fields shall be delimited by each sin-
	       gle occurrence of c.

	3. Otherwise, the string value of FS shall  be	considered  to	be  an	extended  regular
	   expression.	Each  occurrence  of  a sequence matching the extended regular expression
	   shall delimit fields.

       Except for the '~' and "!~" operators, and in the gsub, match,  split,  and  sub  built-in
       functions, ERE matching shall be based on input records; that is, record separator charac-
       ters (the first character of the value of the variable RS, <newline> by default) cannot be
       embedded  in the expression, and no expression shall match the record separator character.
       If the record separator is not <newline>, <newline>s embedded in  the  expression  can  be
       matched.  For the '~' and "!~" operators, and in those four built-in functions, ERE match-
       ing shall be based on text strings; that is, any character (including  <newline>  and  the
       record  separator)  can be embedded in the pattern, and an appropriate pattern shall match
       any character. However, in all awk ERE matching, the use of one or more NUL characters  in
       the pattern, input record, or text string produces undefined results.

   Patterns
       A  pattern  is  any  valid expression, a range specified by two expressions separated by a
       comma, or one of the two special patterns BEGIN or END.

   Special Patterns
       The awk utility shall recognize two special patterns, BEGIN and END.  Each  BEGIN  pattern
       shall  be matched once and its associated action executed before the first record of input
       is read (except possibly by use of the getline function-see Input/Output and General Func-
       tions - in a prior BEGIN action) and before command line assignment is done. Each END pat-
       tern shall be matched once and its associated action executed after  the  last  record  of
       input has been read. These two patterns shall have associated actions.

       BEGIN and END shall not combine with other patterns. Multiple BEGIN and END patterns shall
       be allowed. The actions associated with the BEGIN patterns shall be executed in the  order
       specified  in the program, as are the END actions. An END pattern can precede a BEGIN pat-
       tern in a program.

       If an awk program consists of only actions with the pattern BEGIN, and  the  BEGIN  action
       contains  no  getline  function,  awk  shall  exit without reading its input when the last
       statement in the last BEGIN action is executed. If an awk program consists of only actions
       with  the  pattern END or only actions with the patterns BEGIN and END, the input shall be
       read before the statements in the END actions are executed.

   Expression Patterns
       An expression pattern shall be evaluated as if it were an expression in a Boolean context.
       If the result is true, the pattern shall be considered to match, and the associated action
       (if any) shall be executed. If the result is false, the action shall not be executed.

   Pattern Ranges
       A pattern range consists of two expressions separated by a comma; in this case, the action
       shall be performed for all records between a match of the first expression and the follow-
       ing match of the second expression, inclusive. At this point, the  pattern  range  can  be
       repeated starting at input records subsequent to the end of the matched range.

   Actions
       An  action  is  a  sequence  of statements as shown in the grammar in Grammar . Any single
       statement can be replaced by a statement list enclosed in braces.  The  application  shall
       ensure  that  statements  in  a	statement list are separated by <newline>s or semicolons.
       Statements in a statement list shall be executed  sequentially  in  the	order  that  they
       appear.

       The  expression	acting as the conditional in an if statement shall be evaluated and if it
       is non-zero or non-null, the following statement shall be executed; otherwise, if else  is
       present, the statement following the else shall be executed.

       The  if,  while,  do...	while, for, break, and continue statements are based on the ISO C
       standard (see Concepts Derived from the ISO C Standard ), except that the Boolean  expres-
       sions shall be treated as described in Expressions in awk , and except in the case of:

	      for (variable in array)

       which  shall  iterate,  assigning each index of array to variable in an unspecified order.
       The results of adding new elements to array within such a for loop  are	undefined.  If	a
       break or continue statement occurs outside of a loop, the behavior is undefined.

       The  delete  statement shall remove an individual array element.  Thus, the following code
       deletes an entire array:

	      for (index in array)
		  delete array[index]

       The next statement shall cause all further processing of the current input  record  to  be
       abandoned.  The behavior is undefined if a next statement appears or is invoked in a BEGIN
       or END action.

       The exit statement shall invoke all END actions in the order in which they  occur  in  the
       program	source	and  then  terminate  the  program without reading further input. An exit
       statement inside an END action shall terminate the program without  further  execution  of
       END  actions.  If an expression is specified in an exit statement, its numeric value shall
       be the exit status of awk, unless subsequent errors are encountered or a  subsequent  exit
       statement with an expression is executed.

   Output Statements
       Both  print  and  printf  statements shall write to standard output by default. The output
       shall be written to the location specified by output_redirection if one	is  supplied,  as
       follows:

	      > expression>> expression| expression

       In  all	cases,	the  expression  shall be evaluated to produce a string that is used as a
       pathname into which to write (for '>' or ">>" ) or as a command to be executed (for '|' ).
       Using  the  first  two  forms, if the file of that name is not currently open, it shall be
       opened, creating it if necessary and using the first form, truncating the file. The output
       then  shall be appended to the file. As long as the file remains open, subsequent calls in
       which expression evaluates to the same string value shall  simply  append  output  to  the
       file.  The  file remains open until the close function (see Input/Output and General Func-
       tions ) is called with an expression that evaluates to the same string value.

       The third form shall write output onto a stream piped to  the  input  of  a  command.  The
       stream shall be created if no stream is currently open with the value of expression as its
       command name.  The stream created shall be equivalent to one created  by  a  call  to  the
       popen()	function defined in the System Interfaces volume of IEEE Std 1003.1-2001 with the
       value of expression as the command argument and a value of w as the mode argument. As long
       as  the	stream	remains  open, subsequent calls in which expression evaluates to the same
       string value shall write output to the existing stream. The stream shall remain open until
       the  close function (see Input/Output and General Functions ) is called with an expression
       that evaluates to the same string value.  At that time, the stream shall be closed  as  if
       by  a  call  to	the  pclose()  function  defined  in  the  System  Interfaces  volume  of
       IEEE Std 1003.1-2001.

       As described in detail by the grammar in Grammar , these output statements  shall  take	a
       comma-separated list of expressions referred to in the grammar by the non-terminal symbols
       expr_list, print_expr_list, or print_expr_list_opt. This list is referred to here  as  the
       expression list, and each member is referred to as an expression argument.

       The  print  statement shall write the value of each expression argument onto the indicated
       output stream separated by the current output field separator (see  variable  OFS  above),
       and  terminated	by  the  output record separator (see variable ORS above). All expression
       arguments shall be taken as strings, being converted if necessary; this	conversion  shall
       be  as described in Expressions in awk , with the exception that the printf format in OFMT
       shall be used instead of the value in CONVFMT. An empty expression list	shall  stand  for
       the whole input record ($0).

       The  printf  statement shall produce output based on a notation similar to the File Format
       Notation used to describe file formats in this volume  of  IEEE Std 1003.1-2001	(see  the
       Base Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File Format Notation).  Output
       shall be produced as specified with the first expression argument as the string format and
       subsequent expression arguments as the strings arg1 to argn, inclusive, with the following
       exceptions:

	1. The format shall be an actual character string rather than a graphical representation.
	   Therefore,  it  cannot  contain  empty  character positions. The <space> in the format
	   string, in any context other than a flag  of  a  conversion	specification,	shall  be
	   treated as an ordinary character that is copied to the output.

	2. If the character set contains a ' ' character and that character appears in the format
	   string, it shall be treated as an ordinary character that is copied to the output.

	3. The escape sequences  beginning  with  a  backslash	character  shall  be  treated  as
	   sequences  of  ordinary characters that are copied to the output. Note that these same
	   sequences shall be interpreted lexically by awk when they appear in	literal  strings,
	   but they shall not be treated specially by the printf statement.

	4. A  field  width  or precision can be specified as the '*' character instead of a digit
	   string. In this case the next argument from the expression list shall be  fetched  and
	   its numeric value taken as the field width or precision.

	5. The implementation shall not precede or follow output from the d or u conversion spec-
	   ifier characters with <blank>s not specified by the format string.

	6. The implementation shall not precede output from the o conversion specifier	character
	   with leading zeros not specified by the format string.

	7. For	the  c	conversion  specifier character: if the argument has a numeric value, the
	   character whose encoding is that value shall be output. If the value is zero or is not
	   the	encoding of any character in the character set, the behavior is undefined. If the
	   argument does not have a numeric value, the first character of the string value  shall
	   be output; if the string does not contain any characters, the behavior is undefined.

	8. For each conversion specification that consumes an argument, the next expression argu-
	   ment shall be evaluated. With the exception of the c conversion  specifier  character,
	   the value shall be converted (according to the rules specified in Expressions in awk )
	   to the appropriate type for the conversion specification.

	9. If there are insufficient expression arguments to satisfy all the conversion  specifi-
	   cations in the format string, the behavior is undefined.

       10. If  any  character sequence in the format string begins with a '%' character, but does
	   not form a valid conversion specification, the behavior is unspecified.

       Both print and printf can output at least {LINE_MAX} bytes.

   Functions
       The awk language has a variety of built-in functions:  arithmetic,  string,  input/output,
       and general.

   Arithmetic Functions
       The  arithmetic	functions, except for int, shall be based on the ISO C standard (see Con-
       cepts Derived from the ISO C Standard ). The behavior is  undefined  in	cases  where  the
       ISO C  standard	specifies  that  an  error be returned or that the behavior is undefined.
       Although the grammar (see Grammar ) permits built-in functions to appear with no arguments
       or  parentheses,  unless the argument or parentheses are indicated as optional in the fol-
       lowing list (by displaying them within the "[]" brackets), such use is undefined.

       atan2(y,x)
	      Return arctangent of y/x in radians in the range [-pi,pi].

       cos(x) Return cosine of x, where x is in radians.

       sin(x) Return sine of x, where x is in radians.

       exp(x) Return the exponential function of x.

       log(x) Return the natural logarithm of x.

       sqrt(x)
	      Return the square root of x.

       int(x) Return the argument truncated to an integer. Truncation shall be toward 0 when x>0.

       rand() Return a random number n, such that 0<=n<1.

       srand([expr])
	      Set the seed value for rand to expr or use the time of day if expr is omitted.  The
	      previous seed value shall be returned.

   String Functions
       The  string  functions in the following list shall be supported. Although the grammar (see
       Grammar ) permits built-in functions to appear with no arguments  or  parentheses,  unless
       the argument or parentheses are indicated as optional in the following list (by displaying
       them within the "[]" brackets), such use is undefined.

       gsub(ere, repl[, in])
	      Behave like sub (see below), except that it shall replace all  occurrences  of  the
	      regular expression (like the ed utility global substitute) in $0 or in the in argu-
	      ment, when specified.

       index(s, t)
	      Return the position, in characters, numbering from 1, in string s  where	string	t
	      first occurs, or zero if it does not occur at all.

       length[([s])]
	      Return  the  length,  in	characters,  of its argument taken as a string, or of the
	      whole record, $0, if there is no argument.

       match(s, ere)
	      Return the position, in characters,  numbering  from  1,	in  string  s  where  the
	      extended regular expression ere occurs, or zero if it does not occur at all. RSTART
	      shall be set to the starting position (which is the same as  the	returned  value),
	      zero  if	no  match  is  found;  RLENGTH	shall be set to the length of the matched
	      string, -1 if no match is found.

       split(s, a[, fs	])
	      Split the string s into array elements a[1], a[2], ..., a[n],  and  return  n.  All
	      elements	of  the array shall be deleted before the split is performed. The separa-
	      tion shall be done with the ERE fs or with the field separator  FS  if  fs  is  not
	      given.  Each array element shall have a string value when created and, if appropri-
	      ate, the array element shall be considered a numeric string (see Expressions in awk
	      ). The effect of a null string as the value of fs is unspecified.

       sprintf(fmt, expr, expr, ...)
	      Format  the  expressions according to the printf format given by fmt and return the
	      resulting string.

       sub(ere, repl[, in  ])
	      Substitute the string repl in place of the first instance of the	extended  regular
	      expression  ERE in string in and return the number of substitutions. An ampersand (
	      '&' ) appearing in the string repl shall be replaced by the  string  from  in  that
	      matches  the  ERE.  An  ampersand preceded with a backslash ( '\' ) shall be inter-
	      preted as the literal ampersand character. An occurrence of two  consecutive  back-
	      slashes  shall  be  interpreted  as  just a single literal backslash character. Any
	      other occurrence of a backslash (for example, preceding any other character)  shall
	      be  treated as a literal backslash character. Note that if repl is a string literal
	      (the lexical token STRING; see Grammar ), the handling of the  ampersand	character
	      occurs  after  any  lexical  processing,	including  any	lexical  backslash escape
	      sequence processing. If in is specified and it is not an lvalue (see Expressions in
	      awk  ),  the  behavior  is  undefined.  If in is omitted, awk shall use the current
	      record ($0) in its place.

       substr(s, m[, n	])
	      Return the at most n-character substring of s that begins at position m,	numbering
	      from  1.	If  n  is omitted, or if n specifies more characters than are left in the
	      string, the length of the substring shall be limited by the length of the string s.

       tolower(s)
	      Return a string based on the string s. Each character in s  that	is  an	uppercase
	      letter  specified to have a tolower mapping by the LC_CTYPE category of the current
	      locale shall be replaced in the returned string by the lowercase	letter	specified
	      by the mapping. Other characters in s shall be unchanged in the returned string.

       toupper(s)
	      Return a string based on the string s. Each character in s that is a lowercase let-
	      ter specified to have a toupper mapping by the LC_CTYPE  category  of  the  current
	      locale  is replaced in the returned string by the uppercase letter specified by the
	      mapping. Other characters in s are unchanged in the returned string.

       All of the preceding functions that take ERE as a parameter expect a pattern or	a  string
       valued expression that is a regular expression as defined in Regular Expressions .

   Input/Output and General Functions
       The input/output and general functions are:

       close(expression)
	      Close  the  file or pipe opened by a print or printf statement or a call to getline
	      with the same string-valued expression. The limit on the number of open  expression
	      arguments  is  implementation-defined.  If  the  close was successful, the function
	      shall return zero; otherwise, it shall return non-zero.

       expression |  getline [var]
	      Read a record of input from a stream piped from  the  output  of	a  command.   The
	      stream shall be created if no stream is currently open with the value of expression
	      as its command name. The stream created shall be equivalent to  one  created  by	a
	      call  to	the popen() function with the value of expression as the command argument
	      and a value of r as the mode argument. As long as the stream remains  open,  subse-
	      quent  calls in which expression evaluates to the same string value shall read sub-
	      sequent records from the stream. The stream shall remain open until the close func-
	      tion  is called with an expression that evaluates to the same string value. At that
	      time, the stream shall be closed as if by a call to the pclose() function.  If  var
	      is  omitted,  $0 and NF shall be set; otherwise, var shall be set and, if appropri-
	      ate, it shall be considered a numeric string (see Expressions in awk ).

       The getline operator can form ambiguous constructs when there are  unparenthesized  opera-
       tors  (including  concatenate)  to the left of the '|' (to the beginning of the expression
       containing getline). In the context of the '$' operator, '|' shall behave as if it  had	a
       lower  precedence  than '$' . The result of evaluating other operators is unspecified, and
       conforming applications shall parenthesize properly all such usages.

       getline
	      Set $0 to the next input record from the current input file. This form  of  getline
	      shall set the NF, NR, and FNR variables.

       getline	var
	      Set  variable  var  to  the  next  input record from the current input file and, if
	      appropriate, var shall be considered a numeric string (see Expressions  in  awk  ).
	      This form of getline shall set the FNR and NR variables.

       getline [var]  < expression
	      Read  the next record of input from a named file. The expression shall be evaluated
	      to produce a string that is used as a pathname. If the file of  that  name  is  not
	      currently  open, it shall be opened. As long as the stream remains open, subsequent
	      calls in which expression evaluates to the same string value shall read  subsequent
	      records  from  the  file.  The  file  shall remain open until the close function is
	      called with an expression that evaluates to the same string value. If var is  omit-
	      ted,  $0	and  NF shall be set; otherwise, var shall be set and, if appropriate, it
	      shall be considered a numeric string (see Expressions in awk ).

       The getline operator can form ambiguous constructs when there are  unparenthesized  binary
       operators (including concatenate) to the right of the '<' (up to the end of the expression
       containing the getline). The result of evaluating such a  construct  is	unspecified,  and
       conforming applications shall parenthesize properly all such usages.

       system(expression)
	      Execute  the  command  given  by	expression in a manner equivalent to the system()
	      function defined in the System Interfaces volume of IEEE Std 1003.1-2001 and return
	      the exit status of the command.

       All forms of getline shall return 1 for successful input, zero for end-of-file, and -1 for
       an error.

       Where strings are used as the name of a file or pipeline,  the  application  shall  ensure
       that  the  strings  are	textually identical.  The terminology "same string value" implies
       that "equivalent strings", even those that differ only by  <space>s,  represent	different
       files.

   User-Defined Functions
       The awk language also provides user-defined functions. Such functions can be defined as:

	      function name([parameter, ...]) { statements }

       A  function can be referred to anywhere in an awk program; in particular, its use can pre-
       cede its definition. The scope of a function is global.

       Function parameters, if present, can be either scalars or arrays; the  behavior	is  unde-
       fined  if an array name is passed as a parameter that the function uses as a scalar, or if
       a scalar expression is passed as a parameter that the function uses as an array.  Function
       parameters shall be passed by value if scalar and by reference if array name.

       The  number  of parameters in the function definition need not match the number of parame-
       ters in the function call. Excess formal parameters can be used	as  local  variables.  If
       fewer  arguments  are supplied in a function call than are in the function definition, the
       extra parameters that are used in the function body  as	scalars  shall	evaluate  to  the
       uninitialized  value  until  they are otherwise initialized, and the extra parameters that
       are used in the function body as arrays shall be treated  as  uninitialized  arrays  where
       each element evaluates to the uninitialized value until otherwise initialized.

       When  invoking  a function, no white space can be placed between the function name and the
       opening parenthesis. Function calls can be nested and recursive calls  can  be  made  upon
       functions.  Upon  return  from any nested or recursive function call, the values of all of
       the calling function's parameters shall be unchanged, except for array  parameters  passed
       by  reference.  The  return statement can be used to return a value. If a return statement
       appears outside of a function definition, the behavior is undefined.

       In the function definition, <newline>s shall be optional  before  the  opening  brace  and
       after  the  closing brace. Function definitions can appear anywhere in the program where a
       pattern-action pair is allowed.

   Grammar
       The grammar in this section and the lexical conventions in  the	following  section  shall
       together  describe  the syntax for awk programs. The general conventions for this style of
       grammar are described in Grammar Conventions . A valid program can be represented  as  the
       non-terminal  symbol program in the grammar. This formal syntax shall take precedence over
       the preceding text syntax description.

	      %token NAME NUMBER STRING ERE
	      %token FUNC_NAME	 /* Name followed by '(' without white space. */

	      /* Keywords  */
	      %token	   Begin   End
	      /*	  'BEGIN' 'END' 			   */

	      %token	   Break   Continue   Delete   Do   Else
	      /*	  'break' 'continue' 'delete' 'do' 'else'  */

	      %token	   Exit   For	Function   If	In
	      /*	  'exit' 'for' 'function' 'if' 'in'	   */

	      %token	   Next   Print   Printf   Return   While
	      /*	  'next' 'print' 'printf' 'return' 'while' */

	      /* Reserved function names */
	      %token BUILTIN_FUNC_NAME
			  /* One token for the following:
			   * atan2 cos sin exp log sqrt int rand srand
			   * gsub index length match split sprintf sub
			   * substr tolower toupper close system
			   */
	      %token GETLINE
			  /* Syntactically different from other built-ins. */

	      /* Two-character tokens. */
	      %token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
	      /*     '+='	'-='	   '*='       '/='	 '%='	    '^=' */

	      %token OR   AND  NO_MATCH   EQ   LE   GE	 NE   INCR  DECR  APPEND
	      /*     '||' '&&' '!~' '==' '<=' '>=' '!=' '++'  '--'  '>>'   */

	      /* One-character tokens. */
	      %token '{' '}' '(' ')' '[' ']' ',' ';' NEWLINE
	      %token '+' '-' '*' '%' '^' '!' '>' '<' '|' '?' ':' '~' '$' '='

	      %start program
	      %%

	      program	       : item_list
			       | actionless_item_list
			       ;

	      item_list        : newline_opt
			       | actionless_item_list item terminator
			       | item_list	      item terminator
			       | item_list	    action terminator
			       ;

	      actionless_item_list : item_list		  pattern terminator
			       | actionless_item_list pattern terminator
			       ;

	      item	       : pattern action
			       | Function NAME	    '(' param_list_opt ')'
				     newline_opt action
			       | Function FUNC_NAME '(' param_list_opt ')'
				     newline_opt action
			       ;

	      param_list_opt   : /* empty */
			       | param_list
			       ;

	      param_list       : NAME
			       | param_list ',' NAME
			       ;

	      pattern	       : Begin
			       | End
			       | expr
			       | expr ',' newline_opt expr
			       ;

	      action	       : '{' newline_opt			     '}'
			       | '{' newline_opt terminated_statement_list   '}'
			       | '{' newline_opt unterminated_statement_list '}'
			       ;

	      terminator       : terminator ';'
			       | terminator NEWLINE
			       |	    ';'
			       |	    NEWLINE
			       ;

	      terminated_statement_list : terminated_statement
			       | terminated_statement_list terminated_statement
			       ;

	      unterminated_statement_list : unterminated_statement
			       | terminated_statement_list unterminated_statement
			       ;

	      terminated_statement : action newline_opt
			       | If '(' expr ')' newline_opt terminated_statement
			       | If '(' expr ')' newline_opt terminated_statement
				     Else newline_opt terminated_statement
			       | While '(' expr ')' newline_opt terminated_statement
			       | For '(' simple_statement_opt ';'
				    expr_opt ';' simple_statement_opt ')' newline_opt
				    terminated_statement
			       | For '(' NAME In NAME ')' newline_opt
				    terminated_statement
			       | ';' newline_opt
			       | terminatable_statement NEWLINE newline_opt
			       | terminatable_statement ';'	newline_opt
			       ;

	      unterminated_statement : terminatable_statement
			       | If '(' expr ')' newline_opt unterminated_statement
			       | If '(' expr ')' newline_opt terminated_statement
				    Else newline_opt unterminated_statement
			       | While '(' expr ')' newline_opt unterminated_statement
			       | For '(' simple_statement_opt ';'
				expr_opt ';' simple_statement_opt ')' newline_opt
				    unterminated_statement
			       | For '(' NAME In NAME ')' newline_opt
				    unterminated_statement
			       ;

	      terminatable_statement : simple_statement
			       | Break
			       | Continue
			       | Next
			       | Exit expr_opt
			       | Return expr_opt
			       | Do newline_opt terminated_statement While '(' expr ')'
			       ;

	      simple_statement_opt : /* empty */
			       | simple_statement
			       ;

	      simple_statement : Delete NAME '[' expr_list ']'
			       | expr
			       | print_statement
			       ;

	      print_statement  : simple_print_statement
			       | simple_print_statement output_redirection
			       ;

	      simple_print_statement : Print  print_expr_list_opt
			       | Print	'(' multiple_expr_list ')'
			       | Printf print_expr_list
			       | Printf '(' multiple_expr_list ')'
			       ;

	      output_redirection : '>'	  expr
			       | APPEND expr
			       | '|'	expr
			       ;

	      expr_list_opt    : /* empty */
			       | expr_list
			       ;

	      expr_list        : expr
			       | multiple_expr_list
			       ;

	      multiple_expr_list : expr ',' newline_opt expr
			       | multiple_expr_list ',' newline_opt expr
			       ;

	      expr_opt	       : /* empty */
			       | expr
			       ;

	      expr	       : unary_expr
			       | non_unary_expr
			       ;

	      unary_expr       : '+' expr
			       | '-' expr
			       | unary_expr '^'      expr
			       | unary_expr '*'      expr
			       | unary_expr '/'      expr
			       | unary_expr '%'      expr
			       | unary_expr '+'      expr
			       | unary_expr '-'      expr
			       | unary_expr	     non_unary_expr
			       | unary_expr '<'      expr
			       | unary_expr LE	     expr
			       | unary_expr NE	     expr
			       | unary_expr EQ	     expr
			       | unary_expr '>'      expr
			       | unary_expr GE	     expr
			       | unary_expr '~'      expr
			       | unary_expr NO_MATCH expr
			       | unary_expr In NAME
			       | unary_expr AND newline_opt expr
			       | unary_expr OR	newline_opt expr
			       | unary_expr '?' expr ':' expr
			       | unary_input_function
			       ;

	      non_unary_expr   : '(' expr ')'
			       | '!' expr
			       | non_unary_expr '^'	 expr
			       | non_unary_expr '*'	 expr
			       | non_unary_expr '/'	 expr
			       | non_unary_expr '%'	 expr
			       | non_unary_expr '+'	 expr
			       | non_unary_expr '-'	 expr
			       | non_unary_expr 	 non_unary_expr
			       | non_unary_expr '<'	 expr
			       | non_unary_expr LE	 expr
			       | non_unary_expr NE	 expr
			       | non_unary_expr EQ	 expr
			       | non_unary_expr '>'	 expr
			       | non_unary_expr GE	 expr
			       | non_unary_expr '~'	 expr
			       | non_unary_expr NO_MATCH expr
			       | non_unary_expr In NAME
			       | '(' multiple_expr_list ')' In NAME
			       | non_unary_expr AND newline_opt expr
			       | non_unary_expr OR  newline_opt expr
			       | non_unary_expr '?' expr ':' expr
			       | NUMBER
			       | STRING
			       | lvalue
			       | ERE
			       | lvalue INCR
			       | lvalue DECR
			       | INCR lvalue
			       | DECR lvalue
			       | lvalue POW_ASSIGN expr
			       | lvalue MOD_ASSIGN expr
			       | lvalue MUL_ASSIGN expr
			       | lvalue DIV_ASSIGN expr
			       | lvalue ADD_ASSIGN expr
			       | lvalue SUB_ASSIGN expr
			       | lvalue '=' expr
			       | FUNC_NAME '(' expr_list_opt ')'
				    /* no white space allowed before '(' */
			       | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
			       | BUILTIN_FUNC_NAME
			       | non_unary_input_function
			       ;

	      print_expr_list_opt : /* empty */
			       | print_expr_list
			       ;

	      print_expr_list  : print_expr
			       | print_expr_list ',' newline_opt print_expr
			       ;

	      print_expr       : unary_print_expr
			       | non_unary_print_expr
			       ;

	      unary_print_expr : '+' print_expr
			       | '-' print_expr
			       | unary_print_expr '^'	   print_expr
			       | unary_print_expr '*'	   print_expr
			       | unary_print_expr '/'	   print_expr
			       | unary_print_expr '%'	   print_expr
			       | unary_print_expr '+'	   print_expr
			       | unary_print_expr '-'	   print_expr
			       | unary_print_expr	   non_unary_print_expr
			       | unary_print_expr '~'	   print_expr
			       | unary_print_expr NO_MATCH print_expr
			       | unary_print_expr In NAME
			       | unary_print_expr AND newline_opt print_expr
			       | unary_print_expr OR  newline_opt print_expr
			       | unary_print_expr '?' print_expr ':' print_expr
			       ;

	      non_unary_print_expr : '(' expr ')'
			       | '!' print_expr
			       | non_unary_print_expr '^'      print_expr
			       | non_unary_print_expr '*'      print_expr
			       | non_unary_print_expr '/'      print_expr
			       | non_unary_print_expr '%'      print_expr
			       | non_unary_print_expr '+'      print_expr
			       | non_unary_print_expr '-'      print_expr
			       | non_unary_print_expr	       non_unary_print_expr
			       | non_unary_print_expr '~'      print_expr
			       | non_unary_print_expr NO_MATCH print_expr
			       | non_unary_print_expr In NAME
			       | '(' multiple_expr_list ')' In NAME
			       | non_unary_print_expr AND newline_opt print_expr
			       | non_unary_print_expr OR  newline_opt print_expr
			       | non_unary_print_expr '?' print_expr ':' print_expr
			       | NUMBER
			       | STRING
			       | lvalue
			       | ERE
			       | lvalue INCR
			       | lvalue DECR
			       | INCR lvalue
			       | DECR lvalue
			       | lvalue POW_ASSIGN print_expr
			       | lvalue MOD_ASSIGN print_expr
			       | lvalue MUL_ASSIGN print_expr
			       | lvalue DIV_ASSIGN print_expr
			       | lvalue ADD_ASSIGN print_expr
			       | lvalue SUB_ASSIGN print_expr
			       | lvalue '=' print_expr
			       | FUNC_NAME '(' expr_list_opt ')'
				   /* no white space allowed before '(' */
			       | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
			       | BUILTIN_FUNC_NAME
			       ;

	      lvalue	       : NAME
			       | NAME '[' expr_list ']'
			       | '$' expr
			       ;

	      non_unary_input_function : simple_get
			       | simple_get '<' expr
			       | non_unary_expr '|' simple_get
			       ;

	      unary_input_function : unary_expr '|' simple_get
			       ;

	      simple_get       : GETLINE
			       | GETLINE lvalue
			       ;

	      newline_opt      : /* empty */
			       | newline_opt NEWLINE
			       ;

       This grammar has several ambiguities that shall be resolved as follows:

	* Operator precedence and associativity shall be as described in Expressions in  Decreas-
	  ing Precedence in awk .

	* In  case  of ambiguity, an else shall be associated with the most immediately preceding
	  if that would satisfy the grammar.

	* In some contexts, a slash ( '/' ) that is used to surround an ERE  could  also  be  the
	  division  operator.  This  shall  be	resolved in such a way that wherever the division
	  operator could appear, a slash is assumed to be the division	operator.  (There  is  no
	  unary division operator.)

       One  convention	that might not be obvious from the formal grammar is where <newline>s are
       acceptable. There are several obvious placements such as terminating a  statement,  and	a
       backslash  can be used to escape <newline>s between any lexical tokens. In addition, <new-
       line>s without backslashes can follow a comma, an open brace, logical AND operator (  "&&"
       ),  logical OR operator ( "||" ), the do keyword, the else keyword, and the closing paren-
       thesis of an if, for, or while statement. For example:

	      { print  $1,
		       $2 }

   Lexical Conventions
       The lexical conventions for awk programs, with respect to the preceding grammar, shall  be
       as follows:

	1. Except as noted, awk shall recognize the longest possible token or delimiter beginning
	   at a given point.

	2. A comment shall consist of any characters beginning with the number sign character and
	   terminated  by, but excluding the next occurrence of, a <newline>. Comments shall have
	   no effect, except to delimit lexical tokens.

	3. The <newline> shall be recognized as the token NEWLINE.

	4. A backslash character immediately followed by a <newline> shall have no effect.

	5. The token STRING shall represent a string constant. A string constant shall begin with
	   the character ' .' Within a string constant, a backslash character shall be considered
	   to begin an escape sequence as specified in the table in the Base  Definitions  volume
	   of IEEE Std 1003.1-2001, Chapter 5, File Format Notation ( '\\' , '\a' , '\b' , '\f' ,
	   '\n' , '\r' , '\t' , '\v' ). In addition,  the  escape  sequences  in  Expressions  in
	   Decreasing Precedence in awk shall be recognized. A <newline> shall not occur within a
	   string constant. A string constant shall be terminated by the first	unescaped  occur-
	   rence  of the character '' after the one that begins the string constant. The value of
	   the string shall be the sequence of all unescaped  characters  and  values  of  escape
	   sequences between, but not including, the two delimiting '' characters.

	6. The	token  ERE  represents	an extended regular expression constant.  An ERE constant
	   shall begin with the slash character.  Within an ERE constant, a  backslash	character
	   shall  be considered to begin an escape sequence as specified in the table in the Base
	   Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File Format Notation. In  addi-
	   tion,  the  escape  sequences  in Expressions in Decreasing Precedence in awk shall be
	   recognized. The application shall ensure that a <newline> does not occur within an ERE
	   constant. An ERE constant shall be terminated by the first unescaped occurrence of the
	   slash character after the one that begins  the  ERE	constant.  The	extended  regular
	   expression  represented  by	the  ERE  constant shall be the sequence of all unescaped
	   characters and values of escape sequences between, but not including, the two delimit-
	   ing slash characters.

	7. A  <blank>  shall have no effect, except to delimit lexical tokens or within STRING or
	   ERE tokens.

	8. The token NUMBER shall represent a numeric constant. Its form and numeric value  shall
	   be  equivalent to either of the tokens floating-constant or integer-constant as speci-
	   fied by the ISO C standard, with the following exceptions:

	    a. An integer constant cannot begin with 0x or include the hexadecimal digits  'a'	,
	       'b' , 'c' , 'd' , 'e' , 'f' , 'A' , 'B' , 'C' , 'D' , 'E' , or 'F' .

	    b. The value of an integer constant beginning with 0 shall be taken in decimal rather
	       than octal.

	    c. An integer constant cannot include a suffix ( 'u' , 'U' , 'l' , or 'L' ).

	    d. A floating constant cannot include a suffix ( 'f' , 'F' , 'l' , or 'L' ).

       If the value is too large or too small to be representable (see Concepts Derived from  the
       ISO C Standard ), the behavior is undefined.

	9. A  sequence	of  underscores,  digits, and alphabetics from the portable character set
	   (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 6.1, Portable  Char-
	   acter Set), beginning with an underscore or alphabetic, shall be considered a word.

       10. The	following  words  are keywords that shall be recognized as individual tokens; the
	   name of the token is the same as the keyword:

	BEGIN		delete		END		function	in		printf
	break		do		exit		getline 	next		return
	continue	else		for		if		print		while

       11. The following words are names of built-in functions and shall  be  recognized  as  the
	   token BUILTIN_FUNC_NAME:

	atan2		gsub		log		split		sub		toupper
	close		index		match		sprintf 	substr
	cos		int		rand		sqrt		system
	exp		length		sin		srand		tolower

       The above-listed keywords and names of built-in functions are considered reserved words.

       12. The	token  NAME shall consist of a word that is not a keyword or a name of a built-in
	   function and is not followed immediately (without any delimiters) by the  '('  charac-
	   ter.

       13. The	token  FUNC_NAME  shall  consist  of  a word that is not a keyword or a name of a
	   built-in function, followed immediately (without any delimiters) by the '(' character.
	   The '(' character shall not be included as part of the token.

       14. The following two-character sequences shall be recognized as the named tokens:

			       Token Name   Sequence   Token Name   Sequence
			       ADD_ASSIGN   +=	       NO_MATCH     !~
			       SUB_ASSIGN   -=	       EQ	    ==
			       MUL_ASSIGN   *=	       LE	    <=
			       DIV_ASSIGN   /=	       GE	    >=
			       MOD_ASSIGN   %=	       NE	    !=
			       POW_ASSIGN   ^=	       INCR	    ++
			       OR	    ||	       DECR	    --
			       AND	    &&	       APPEND	    >>

       15. The	following  single  characters  shall  be recognized as tokens whose names are the
	   character:

	   <newline> { } ( ) [ ] , ; + - * % ^ ! > < | ? : ~ $ =

       There is a lexical ambiguity between the token ERE and the tokens '/' and DIV_ASSIGN. When
       an  input  sequence begins with a slash character in any syntactic context where the token
       '/' or DIV_ASSIGN could appear as the next token in a valid program, the longer	of  those
       two  tokens  that  can  be  recognized shall be recognized. In any other syntactic context
       where the token ERE could appear as the next token in a valid program, the token ERE shall
       be recognized.

EXIT STATUS
       The following exit values shall be returned:

	0     All input files were processed successfully.

       >0     An error occurred.

       The exit status can be altered within the program by using an exit expression.

CONSEQUENCES OF ERRORS
       If  any file operand is specified and the named file cannot be accessed, awk shall write a
       diagnostic message to standard error and terminate without any further action.

       If the program specified by either the program operand or a  progfile  operand  is  not	a
       valid  awk  program  (as  specified  in the EXTENDED DESCRIPTION section), the behavior is
       undefined.

       The following sections are informative.

APPLICATION USAGE
       The index, length, match, and substr functions should not be confused with  similar  func-
       tions  in the ISO C standard; the awk versions deal with characters, while the ISO C stan-
       dard deals with bytes.

       Because the concatenation operation is represented by adjacent expressions rather than  an
       explicit  operator, it is often necessary to use parentheses to enforce the proper evalua-
       tion precedence.

EXAMPLES
       The awk program specified in the command line is  most  easily  specified  within  single-
       quotes  (for  example, programs commonly contain characters that are special to the shell,
       including double-quotes.  In the cases where an awk program contains single-quote  charac-
       ters, it is usually easiest to specify most of the program as strings within single-quotes
       concatenated by the shell with quoted single-quote characters. For example:

	      awk '/'\''/ { print "quote:", $0 }'

       prints all lines from the standard input containing  a  single-quote  character,  prefixed
       with quote:.

       The following are examples of simple awk programs:

	1. Write to the standard output all input lines for which field 3 is greater than 5:

	   $3 > 5

	2. Write every tenth line:

	   (NR % 10) == 0

	3. Write any line with a substring matching the regular expression:

	   /(G|D)(2[0-9][[:alpha:]]*)/

	4. Print  any  line  with a substring containing a 'G' or 'D' , followed by a sequence of
	   digits and characters.  This example uses character classes digit and alpha	to  match
	   language-independent digit and alphabetic characters respectively:

	   /(G|D)([[:digit:][:alpha:]]*)/

	5. Write any line in which the second field matches the regular expression and the fourth
	   field does not:

	   $2 ~ /xyz/ && $4 !~ /xyz/

	6. Write any line in which the second field contains a backslash:

	   $2 ~ /\\/

	7. Write any line in which the second field contains a	backslash.  Note  that	backslash
	   escapes  are  interpreted  twice; once in lexical processing of the string and once in
	   processing the regular expression:

	   $2 ~ "\\\\"

	8. Write the second to the last and the last field in each line. Separate the fields by a
	   colon:

	   {OFS=":";print $(NF-1), $NF}

	9. Write  the line number and number of fields in each line. The three strings represent-
	   ing the line number, the colon, and the number of fields  are  concatenated	and  that
	   string is written to standard output:

	   {print NR ":" NF}

       10. Write lines longer than 72 characters:

	   length($0) > 72

       11. Write the first two fields in opposite order separated by OFS:

	   { print $2, $1 }

       12. Same, with input fields separated by a comma or <space>s and <tab>s, or both:

	   BEGIN { FS = ",[ \t]*|[ \t]+" }
		 { print $2, $1 }

       13. Add up the first column, print sum, and average:

		{s += $1 }
	   END	 {print "sum is ", s, " average is", s/NR}

       14. Write fields in reverse order, one per line (many lines out for each line in):

	   { for (i = NF; i > 0; --i) print $i }

       15. Write all lines between occurrences of the strings start and stop:

	   /start/, /stop/

       16. Write all lines whose first field is different from the previous one:

	   $1 != prev { print; prev = $1 }

       17. Simulate echo:

	   BEGIN  {
		   for (i = 1; i < ARGC; ++i)
		   printf("%s%s", ARGV[i], i==ARGC-1?"\n":" ")
	   }

       18. Write the path prefixes contained in the PATH environment variable, one per line:

	   BEGIN  {
		   n = split (ENVIRON["PATH"], path, ":")
		   for (i = 1; i <= n; ++i)
		   print path[i]
	   }

       19. If there is a file named input containing page headers of the form:

	   Page #

       and a file named program that contains:

	      /Page/   { $2 = n++; }
		       { print }

       then the command line:

	      awk -f program n=5 input

       prints the file input, filling in page numbers starting at 5.

RATIONALE
       This  description is based on the new awk, "nawk", (see the referenced The AWK Programming
       Language), which introduced a number of new features to the historical awk:

	1. New keywords: delete, do, function, return

	2. New built-in functions: atan2, close, cos, gsub, match, rand, sin, srand, sub, system

	3. New predefined variables: FNR, ARGC, ARGV, RSTART, RLENGTH, SUBSEP

	4. New expression operators: ?, :, ,, ^

	5. The FS variable and the third argument to  split,  now  treated  as	extended  regular
	   expressions.

	6. The	operator  precedence, changed to more closely match the C language.  Two examples
	   of code that operate differently are:

	   while ( n /= 10 > 1) ...
	   if (!"wk" ~ /bwk/) ...

       Several features have been added based on newer implementations of awk:

	* Multiple instances of -f progfile are permitted.

	* The new option -v assignment.

	* The new predefined variable ENVIRON.

	* New built-in functions toupper and tolower.

	* More formatting capabilities are added to printf to match the ISO C standard.

       The overall awk syntax has always been based on the C language, with a few  features  from
       the  shell  command language and other sources. Because of this, it is not completely com-
       patible with any other language, which has caused confusion for some users.  It is not the
       intent  of the standard developers to address such issues.  A few relatively minor changes
       toward making the language more compatible with the ISO C  standard  were  made;  most  of
       these  changes are based on similar changes in recent implementations, as described above.
       There remain several C-language conventions that are not in awk. One of the  notable  ones
       is  the	comma  operator,  which is commonly used to specify multiple expressions in the C
       language for statement. Also, there are various places where awk is more restrictive  than
       the C language regarding the type of expression that can be used in a given context. These
       limitations are due to the different features that the awk language does provide.

       Regular expressions in awk have been extended somewhat from historical implementations  to
       make   them   a	 pure	superset   of	extended   regular  expressions,  as  defined  by
       IEEE Std 1003.1-2001 (see the Base Definitions  volume  of  IEEE Std 1003.1-2001,  Section
       9.4, Extended Regular Expressions).  The main extensions are internationalization features
       and interval expressions.  Historical implementations of awk have long supported backslash
       escape  sequences  as an extension to extended regular expressions, and this extension has
       been retained despite inconsistency with other utilities. The number of	escape	sequences
       recognized in both extended regular expressions and strings has varied (generally increas-
       ing with time) among implementations. The set specified by  IEEE Std 1003.1-2001  includes
       most sequences known to be supported by popular implementations and by the ISO C standard.
       One sequence that is not supported is hexadecimal value escapes beginning with '\x' . This
       would  allow  values  expressed	in more than 9 bits to be used within awk as in the ISO C
       standard. However, because this syntax has a non-deterministic length, it does not  permit
       the  subsequent	character to be a hexadecimal digit. This limitation can be dealt with in
       the C language by the use of lexical string concatenation. In the awk language, concatena-
       tion  could  also  be  a  solution  for	strings, but not for extended regular expressions
       (either lexical ERE tokens or strings used dynamically as regular expressions). Because of
       this limitation, the feature has not been added to IEEE Std 1003.1-2001.

       When  a string variable is used in a context where an extended regular expression normally
       appears (where the lexical token ERE is used in the grammar) the string does  not  contain
       the literal slashes.

       Some versions of awk allow the form:

	      func name(args, ... ) { statements }

       This  has  been deprecated by the authors of the language, who asked that it not be speci-
       fied.

       Historical implementations of awk produce an error if a next statement is  executed  in	a
       BEGIN action, and cause awk to terminate if a next statement is executed in an END action.
       This behavior has not been documented, and it was not believed that it  was  necessary  to
       standardize it.

       The  specification  of conversions between string and numeric values is much more detailed
       than in the documentation of historical implementations or in the referenced The AWK  Pro-
       gramming Language.  Although most of the behavior is designed to be intuitive, the details
       are necessary to ensure compatible behavior from different implementations. This is  espe-
       cially  important  in  relational  expressions  since  the types of the operands determine
       whether a string or numeric comparison is performed. From the perspective of  an  applica-
       tion  writer,  it  is usually sufficient to expect intuitive behavior and to force conver-
       sions (by adding zero or concatenating a null string) when the type of an expression  does
       not  obviously match what is needed. The intent has been to specify historical practice in
       almost all cases. The one exception is that, in historical implementations, variables  and
       constants  maintain both string and numeric values after their original value is converted
       by any use. This means that referencing a variable or constant can  have  unexpected  side
       effects. For example, with historical implementations the following program:

	      {
		  a = "+2"
		  b = 2
		  if (NR % 2)
		      c = a + b
		  if (a == b)
		      print "numeric comparison"
		  else
		      print "string comparison"
	      }

       would  perform  a numeric comparison (and output numeric comparison) for each odd-numbered
       line, but perform a string comparison (and output string comparison)  for  each	even-num-
       bered  line.  IEEE Std 1003.1-2001  ensures that comparisons will be numeric if necessary.
       With historical implementations, the following program:

	      BEGIN {
		  OFMT = "%e"
		  print 3.14
		  OFMT = "%f"
		  print 3.14
	      }

       would output "3.140000e+00" twice, because in the  second  print  statement  the  constant
       "3.14"  would  have  a  string  value  from  the previous conversion. IEEE Std 1003.1-2001
       requires that the output of the second print statement be "3.140000"  .	The  behavior  of
       historical implementations was seen as too unintuitive and unpredictable.

       It  was	pointed  out  that with the rules contained in early drafts, the following script
       would print nothing:

	      BEGIN {
		  y[1.5] = 1
		  OFMT = "%e"
		  print y[1.5]
	      }

       Therefore, a new variable, CONVFMT, was introduced. The OFMT variable is now restricted to
       affecting  output  conversions of numbers to strings and CONVFMT is used for internal con-
       versions, such as comparisons or array indexing. The default value is the same as that for
       OFMT,  so unless a program changes CONVFMT (which no historical program would do), it will
       receive the historical behavior associated with internal string conversions.

       The POSIX awk lexical and syntactic conventions are specified more formally than in  other
       sources. Again the intent has been to specify historical practice. One convention that may
       not be obvious from the formal grammar as in other verbal descriptions is where <newline>s
       are  acceptable. There are several obvious placements such as terminating a statement, and
       a backslash can be used to escape <newline>s between  any  lexical  tokens.  In	addition,
       <newline>s without backslashes can follow a comma, an open brace, a logical AND operator (
       "&&" ), a logical OR operator ( "||" ), the do keyword, the else keyword, and the  closing
       parenthesis of an if, for, or while statement. For example:

	      { print $1,
		      $2 }

       The  requirement that awk add a trailing <newline> to the program argument text is to sim-
       plify the grammar, making it match a text file in form. There is no way for an application
       or test suite to determine whether a literal <newline> is added or whether awk simply acts
       as if it did.

       IEEE Std 1003.1-2001 requires several changes from historical implementations in order  to
       support internationalization. Probably the most subtle of these is the use of the decimal-
       point character, defined by the LC_NUMERIC category of the locale, in  representations  of
       floating-point  numbers.   This	locale-specific  character is used in recognizing numeric
       input, in converting between strings and numeric values, and in	formatting  output.  How-
       ever, regardless of locale, the period character (the decimal-point character of the POSIX
       locale) is the decimal-point character recognized in processing	awk  programs  (including
       assignments in command line arguments). This is essentially the same convention as the one
       used in the ISO C standard. The difference is that the C language includes the setlocale()
       function, which permits an application to modify its locale. Because of this capability, a
       C application begins executing with its locale set to the C locale, and only  executes  in
       the  environment-specified  locale  after an explicit call to setlocale(). However, adding
       such an elaborate  new  feature	to  the  awk  language	was  seen  as  inappropriate  for
       IEEE Std 1003.1-2001.  It  is possible to execute an awk program explicitly in any desired
       locale by setting the environment in the shell.

       The undefined behavior resulting from NULs in extended regular expressions  allows  future
       extensions for the GNU gawk program to process binary data.

       The behavior in the case of invalid awk programs (including lexical, syntactic, and seman-
       tic errors) is undefined because it was considered overly limiting on  implementations  to
       specify.  In most cases such errors can be expected to produce a diagnostic and a non-zero
       exit status. However, some implementations may choose to extend the language in ways  that
       make use of certain invalid constructs. Other invalid constructs might be deemed worthy of
       a warning, but otherwise cause some reasonable behavior.  Still other  constructs  may  be
       very  difficult	to detect in some implementations.  Also, different implementations might
       detect a given error during an initial parsing of the program (before  reading  any  input
       files)  while  others might detect it when executing the program after reading some input.
       Implementors should be aware that diagnosing errors as early  as  possible  and	producing
       useful  diagnostics  can  ease  debugging of applications, and thus make an implementation
       more usable.

       The unspecified behavior from using multi-character RS values is to allow possible  future
       extensions  based  on  extended regular expressions used for record separators. Historical
       implementations take the first character of the string and ignore the others.

       Unspecified behavior when split( string, array, <null>) is used is  to  allow  a  proposed
       future extension that would split up a string into an array of individual characters.

       In  the	context of the getline function, equally good arguments for different precedences
       of the | and < operators can be made. Historical practice has been that:

	      getline < "a" "b"

       is parsed as:

	      ( getline < "a" ) "b"

       although many would argue that the intent was that the file ab should be read. However:

	      getline < "x" + 1

       parses as:

	      getline < ( "x" + 1 )

       Similar problems occur with the | version of getline, particularly in combination with  $.
       For example:

	      $"echo hi" | getline

       (This  situation  is  particularly  problematic	when used in a print statement, where the
       |getline part might be a redirection of the print.)

       Since in most cases such constructs are not (or at least should not) be used (because they
       have a natural ambiguity for which there is no conventional parsing), the meaning of these
       constructs has been made explicitly unspecified. (The effect is that a conforming applica-
       tion  that  runs  into  the  problem  must  parenthesize  to resolve the ambiguity.) There
       appeared to be few if any actual uses of such constructs.

       Grammars can be written that would cause an error under these circumstances.  Where  back-
       wards-compatibility  is not a large consideration, implementors may wish to use such gram-
       mars.

       Some historical implementations have allowed some built-in functions to be called  without
       an  argument  list,  the  result being a default argument list chosen in some "reasonable"
       way. Use of length as a synonym for length($0) is the only one  of  these  forms  that  is
       thought	to  be widely known or widely used; this particular form is documented in various
       places (for example, most historical awk reference pages, although not in  the  referenced
       The  AWK  Programming Language) as legitimate practice. With this exception, default argu-
       ment lists have always been undocumented and vaguely defined, and it is not at  all  clear
       how  (or  if)  they  should  be generalized to user-defined functions.  They add no useful
       functionality and preclude possible future extensions that might need  to  name	functions
       without	calling  them.	 Not  standardizing  them seems the simplest course. The standard
       developers considered that length merited special treatment, however, since  it	has  been
       documented  in  the past and sees possibly substantial use in historical programs. Accord-
       ingly, this usage has been made legitimate, but Issue 5 removed	the  obsolescent  marking
       for  XSI-conforming  implementations  and many otherwise conforming applications depend on
       this feature.

       In sub and gsub, if repl is a string literal (the lexical token STRING), then two consecu-
       tive  backslash	characters should be used in the string to ensure a single backslash will
       precede the ampersand when the resultant string is passed to the function.  (For  example,
       to specify one literal ampersand in the replacement string, use gsub( ERE, "\\&" ).)

       Historically  the only special character in the repl argument of sub and gsub string func-
       tions was the ampersand ( '&' ) character and preceding it with	the  backslash	character
       was used to turn off its special meaning.

       The  description  in the ISO POSIX-2:1993 standard introduced behavior such that the back-
       slash character was another special character and it was unspecified  whether  there  were
       any  other  special  characters. This description introduced several portability problems,
       some of which are described below, and so it has been replaced with  the  more  historical
       description. Some of the problems include:

	* Historically,  to create the replacement string, a script could use gsub( ERE, "\\&" ),
	  but with the ISO POSIX-2:1993 standard wording, it was  necessary  to  use  gsub(  ERE,
	  "\\\\&"  ).  Backslash characters are doubled here because all string literals are sub-
	  ject to lexical analysis, which would reduce each pair of  backslash	characters  to	a
	  single backslash before being passed to gsub.

	* Since  it  was  unspecified  what  the special characters were, for portable scripts to
	  guarantee that characters are printed literally, each character had to be preceded with
	  a  backslash.  (For example, a portable script had to use gsub( ERE, "\\h\\i" ) to pro-
	  duce a replacement string of "hi" .)

       The description for comparisons in the ISO POSIX-2:1993 standard did not properly describe
       historical  practice  because of the way numeric strings are compared as numbers. The cur-
       rent rules cause the following code:

	      if (0 == "000")
		  print "strange, but true"
	      else
		  print "not true"

       to do a numeric comparison, causing the if to succeed. It should  be  intuitively  obvious
       that  this is incorrect behavior, and indeed, no historical implementation of awk actually
       behaves this way.

       To fix this problem, the definition of numeric string was enhanced to include  only  those
       values obtained from specific circumstances (mostly external sources) where it is not pos-
       sible to determine unambiguously whether the value  is  intended  to  be  a  string  or	a
       numeric.

       Variables that are assigned to a numeric string shall also be treated as a numeric string.
       (For example, the notion of a numeric string can be  propagated	across	assignments.)  In
       comparisons,  all  variables having the uninitialized value are to be treated as a numeric
       operand evaluating to the numeric value zero.

       Uninitialized variables include all types of variables including scalars, array	elements,
       and fields. The definition of an uninitialized value in Variables and Special Variables is
       necessary to describe the value placed on uninitialized variables and on fields	that  are
       valid  (for example, < $NF) but have no characters in them and to describe how these vari-
       ables are to be used in comparisons. A valid field, such as $1, that has no characters  in
       it can be obtained from an input line of "\t\t" when FS= '\t' . Historically, the compari-
       son ( $1<10) was done numerically after evaluating $1 to the value zero.

       The phrase "... also shall have the numeric value of the numeric string" was removed  from
       several	sections  of  the  ISO POSIX-2:1993  standard because is specifies an unnecessary
       implementation detail. It is not necessary for IEEE Std 1003.1-2001 to specify that  these
       objects	be  assigned  two  different  values.  It is only necessary to specify that these
       objects may evaluate to two different values depending on context.

       The description of numeric string processing is based on the behavior of the atof()  func-
       tion  in  the  ISO C  standard. While it is not a requirement for an implementation to use
       this function, many historical implementations of awk do. In the ISO C standard, floating-
       point  constants  use a period as a decimal point character for the language itself, inde-
       pendent of the current locale, but the atof() function and the associated  strtod()  func-
       tion  use  the  decimal	point  character of the current locale when converting strings to
       numeric values. Similarly in awk, floating-point constants in an awk script use	a  period
       independent  of	the  locale,  but  input  strings  use the decimal point character of the
       locale.

FUTURE DIRECTIONS
       None.

SEE ALSO
       Grammar	Conventions  ,	grep  ,  lex  ,  sed  ,   the	System	 Interfaces   volume   of
       IEEE Std 1003.1-2001, atof(), exec, popen(), setlocale(), strtod()

COPYRIGHT
       Portions  of  this  text  are  reprinted  and  reproduced in electronic form from IEEE Std
       1003.1, 2003 Edition, Standard for Information Technology  --  Portable	Operating  System
       Interface  (POSIX), The Open Group Base Specifications Issue 6, Copyright (C) 2001-2003 by
       the Institute of Electrical and Electronics Engineers, Inc and  The  Open  Group.  In  the
       event  of  any  discrepancy  between this version and the original IEEE and The Open Group
       Standard, the original IEEE and The Open Group Standard is the referee document. The orig-
       inal Standard can be obtained online at http://www.opengroup.org/unix/online.html .

IEEE/The Open Group			       2003					   AWK(P)
Unix & Linux Commands & Man Pages : ©2000 - 2018 Unix and Linux Forums


All times are GMT -4. The time now is 09:03 AM.