nawk(1) ultrix man page

nawk(1) 						      General Commands Manual							   nawk(1)

Name
       nawk - data transformation, report generation language

Syntax
       nawk [ -f programfile ] [ -Fs ] [ program ] [ var=value...  ] [ file ...  ]

Description
       The nawk language is a file-processing language which is well-suited to data manipulation and retrieval of information from text files.  This
       reference page provides a full technical description of nawk.  If you are unfamiliar with the language, you will probably find it helpful to read
       the Guide to the nawk Utility before reading the following material.

       A program consists of any number of user-defined functions and `rules' of the form:
       pattern {action}
       There are two ways to specify the program:

       (a)  Directly on the command line.  In this case, the program is a single command line argument, usually enclosed in apostrophes.

       (b)  By	using the -f programfile option (where programfile contains the program).  More than one -f option can appear on the command line.
	    The program will consist of the concatenation of the contents of all the specified programfiles.  You can use - in	place  of  a  file
	    name, to obtain input from the standard input.

       The input data manipulated by the program is provided in files specified on the command line.  If no such files are specified, data is read
       from the standard input.  You can also specify a file name of - to mean the standard input.

       Input to nawk is divided into records.  By default, records are separated by new-line characters; however, you can specify a different record
       separator if you wish.

       One  at	a  time, and in order, each input record is compared with the pattern of every `rule' in the program.  When a pattern matches, the
       action part of the rule is performed on the current input record. Patterns and actions often refer to separate fields within a record.	By
       default,  fields are separated by white space (blanks, new-lines, or horizontal tab characters); however, you can specify a different field
       separator string using the -Fs option (see Input).

       You can omit the pattern or action part of a rule (but not both).  If pattern is omitted, the action is performed on every input record (as
       if every record matches).  If action is omitted, every record matching the pattern will be written to the standard output.

       If a line in a program contains a `#' character, the `#' and everything after it is considered to be a comment.

       Program lines can be continued by adding a backslash `\' to the end of the line.  Statement lines ending with a comma `,', double or-bars
       `||', or double ampersands `&&', are automatically continued.

Options
       -f programfile
	      Tells nawk to obtain its program from the specified file.  There can be more than one of these on the command line.

       -Fs    Says that s is the field separator character within records.

   Variables and Expressions
       There are three types of variables in nawk: identifiers, fields, and array elements.

       An identifier is a sequence of letters, digits, and underscores beginning with a letter or an underscore.

       Fields are described in the Input subsection.

       Arrays are associative collections of values called the elements of the array.  Array elements are referenced with constructs of the form
       identifier[subscript]
       where subscript has the form expr or expr,expr,...  Each such expr can have any string value.  Arrays with  multiple  expr  subscripts  are
       implemented  by concatenating the string values of each expr with a separator character SUBSEP separating multiple expr.  The initial value
       of SUBSEP is set to `\034' (ASCII file separator).

       Fields and identifiers are sometimes called scalar variables to distinguish them from arrays.

       Variables are not declared and need not be initialized.	The value of an uninitialized variable is the empty string.  Variables can be ini-
       tialized on the command line using
       var=value
       Such  initializations  can be interspersed with the names of input files on the command line.  Initializations and input files will be pro-
       cessed in the order they appear on the command line.  For example, the command
       nawk -f progfile A=1 f1 f2 A=2 f3
       sets A to 1 before input is read from f1 and sets A to 2 before input is read from f3.
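
       The ordering described above can be sketched as follows.  This is only an illustration: it assumes a POSIX-compatible awk is installed
       as awk (substitute nawk where it exists), and the scratch files f1, f2, and f3 are hypothetical one-line inputs created for the demonstration.

```shell
# Assumption: "awk" is a POSIX-compatible awk; the file names are illustrative only.
dir=$(mktemp -d)
printf 'x\n' > "$dir/f1"
printf 'y\n' > "$dir/f2"
printf 'z\n' > "$dir/f3"
# A=1 takes effect before f1 is opened; A=2 takes effect before f3 is opened.
out=$(awk '{ print A, $0 }' A=1 "$dir/f1" "$dir/f2" A=2 "$dir/f3")
printf '%s\n' "$out"
rm -rf "$dir"
```

       Each record is printed with the value A held when that record was read, so the records from f1 and f2 carry 1 and the record from f3 carries 2.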

       Certain built-in variables have special meaning to nawk, as described in later sections.

       Expressions consist of constants, variables, functions, regular expressions and `subscript in array' conditions (see below)  combined  with
       operators.   Each  variable  and  expression  has a string value and a corresponding numeric value; the value appropriate to the context is
       used.  If a string is used in a numeric context, and the contents of the string cannot be interpreted as  a  number,  the  `value'  of  the
       string is taken to be zero.
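
       A small sketch of these conversions, assuming a POSIX-compatible awk is available as awk; the variable names s, n, and unset are
       made up for the illustration.

```shell
# Assumption: "awk" is a POSIX-compatible awk.
out=$(awk 'BEGIN {
    s = "abc"; n = "12"
    print s + 0                 # not a number, so the numeric value is zero
    print n + 1                 # the string "12" used as the number 12
    print n "" 1                # the number 1 used as a string in a concatenation
    empty = (unset == "")       # an uninitialized variable compares equal to ""
    print empty, unset + 0      # and has the numeric value 0
}')
printf '%s\n' "$out"
```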

       Numeric constants are sequences of decimal digits.

       String constants are quoted, as in "x".	Escape sequences accepted in literal strings are:

	      Escape   ASCII Character
	      -------------------------------
	      \a       audible bell
	      \b       backspace
	      \f       formfeed
	      \n       new-line
	      \r       carriage return
	      \t       horizontal tab
	      \v       vertical tab
	      \ooo     octal value ooo
	      \xdd     hexadecimal value dd
	      \"       quotation mark
	      \c       any other character c

       The regular expression syntax understood by nawk is the extended regular expression syntax of the egrep utility described in grep(1).  Characters enclosed in slash
       characters `/' are compiled as regular expressions when the program is read.  In addition, literal strings and variables are interpreted as
       dynamic	regular  expressions  on  the  right side of a `~' or `!~' operator, or as certain arguments to built-in matching and substitution
       functions.  Note that when literal strings are used as regular expressions, extra backslashes  are  needed  to  escape  regular	expression
       metacharacters because the backslash is also the literal string escape character.
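
       The need for extra backslashes can be seen in a short sketch.  It assumes a POSIX-compatible awk installed as awk; the two input
       lines are invented for the demonstration.

```shell
# Assumption: "awk" is a POSIX-compatible awk.
out=$(printf 'a.b\naxb\n' | awk '
    $0 ~ /a\.b/  { print "slash form:", $0 }    # compiled regexp: \. is a literal dot
    $0 ~ "a\\.b" { print "string form:", $0 }   # string: \\ leaves one backslash for the regexp
    $0 ~ "a.b"   { print "unescaped:", $0 }     # here . matches any character
')
printf '%s\n' "$out"
```

       Only the unescaped form matches the line axb; the other two match a.b alone.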

       The `subscript in array' condition is defined as:
       index in array
       where index looks like expr or (expr,...,expr).	This condition evaluates to 1 if the string value of index is a subscript of array, and to
       0 otherwise.  This is a way to determine if an array element exists.  If the element does not exist, this condition will not create it.
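
       As an illustrative sketch (assuming a POSIX-compatible awk installed as awk, with a hypothetical array named stock), the
       condition can test a multiple-subscript element without creating it, and SUBSEP can be used to take a combined subscript apart:

```shell
# Assumption: "awk" is a POSIX-compatible awk; "stock" is an illustrative array name.
out=$(awk 'BEGIN {
    stock["bolt", 3] = 40                  # stored under the key "bolt" SUBSEP "3"
    if (("bolt", 3) in stock) print "bolt,3 exists"
    if (!(("nut", 1) in stock)) print "nut,1 missing"
    n = 0
    for (key in stock) {                   # the failed test above created nothing
        n++
        split(key, part, SUBSEP)           # recover the two subscripts
        print part[1], part[2], stock[key]
    }
    print n
}')
printf '%s\n' "$out"
```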

   Symbol Table
       The symbol table can be accessed through the built-in array SYMTAB.
       SYMTAB[expr]
       is equivalent to the variable named by the evaluation of expr.  For example,
       SYMTAB["var"]
       is a synonym for the variable var.

   Environment
       A program can determine its initial environment by examining the ENVIRON array.	If the environment consists of entries of the form:
       name=value
       then
       ENVIRON[name]
       has string value
       "value"
       For example, the following program prints the environment, one name=value entry per line:
       BEGIN   {
	       for (i in ENVIRON)
		       printf("%s=%s\n", i, ENVIRON[i])
	       exit
       }

   Operators
       The usual precedence order of arithmetic operations is followed unless overridden with parentheses; a table giving the order of	operations
       appears at the end of the Guide to the nawk Utility.  The unary operators are

	      -    Negation
	      +    Nothing (place holder)
	      --   Decrement by one
	      ++   Increment by one

       where the `++' and `--' operators can be used as either postfix or prefix operators, as in C.

       The binary arithmetic operators are

	      +   Addition
	      -   Subtraction
	      *   Multiplication
	      /   Division
	      %   Modulus
	      ^   Exponentiation

       The conditional operator
       expr ? expr1 : expr2
       evaluates to expr1 if the value of expr is non-zero, and to expr2 otherwise.

       If two expressions are not separated by an operator, their string values are concatenated.

       The operator `~' yields 1 (true) if the regular expression on the right side matches the string on the left side.  The operator `!~' yields
       1 when the right side has no match on the left.	To illustrate:
       $2 ~ /[0-9]/
       selects any line where the second field contains at least one digit.  Any string or variable on the right side of `~'  or  `!~'	is  inter-
       preted as a dynamic regular expression.

       The relational operators are the usual `<', `<=', `>', `>=', `==', and `!='.

       The boolean operators are `||' (or), `&&' (and), and `!' (not).

       Values can be assigned to a variable with
       var = expr
       If op is a binary arithmetic operator,
       var op= expr
       is equivalent to
       var = var op expr

   Command Line Arguments
       The  built-in  variable	ARGC is set to the number of command line arguments.  The built-in array ARGV has elements subscripted with digits
       from zero to ARGC-1, giving command line arguments in the order they appeared on the command line.

       The ARGC count and the ARGV vector do not include command line options (beginning with `-') or the program file (following -f).  They do  include
       the name of the command itself, initialization statements of the form
       var=value
       and the names of input data files.

       The nawk language actually creates ARGC and ARGV before doing anything else.  It then walks through ARGV processing the arguments.  If an
       element of ARGV is the empty string, it is simply skipped.  If it contains an equals sign `=', it is interpreted as a variable assignment.  If
       it is a minus sign `-', it stands for the standard input and input is immediately read from the standard input until end-of-file is encoun-
       tered.  Otherwise, the argument is taken to be a file name; input will be read from that file until end-of-file is reached.  Note that  the
       program	is  executed  by `walking through' ARGV in this way; thus if the program changes ARGV, different files can be read and assignments
       made.
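
       A minimal sketch of this behavior, assuming a POSIX-compatible awk installed as awk and two hypothetical scratch files named a and b:
       the BEGIN action blanks one element of ARGV, so that file is skipped when the arguments are walked.

```shell
# Assumption: "awk" is a POSIX-compatible awk; the files a and b are illustrative.
dir=$(mktemp -d)
printf 'one\n' > "$dir/a"
printf 'two\n' > "$dir/b"
# ARGC counts the command name plus the two file names.
# Setting ARGV[1] to the empty string makes the walk skip the first file.
out=$(awk 'BEGIN { print ARGC; ARGV[1] = "" } { print }' "$dir/a" "$dir/b")
printf '%s\n' "$out"
rm -rf "$dir"
```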

   Input
       Input is divided into records.  Each record is separated from the next with a record separator character.  The value of the built-in
       variable RS gives the current record separator character; by default, it begins as the new-line `\n'.  If you assign a different character to
       RS, nawk will use that as the record separator character from that point on.

       Records are divided into fields.  Each field is separated from the next with a field separator string, given by the value of  the  built-in
       variable  FS.   You can set a specific separator string by assigning a value to FS or by specifying the -Fs option on the command line.	FS
       can be assigned a regular expression.  For example,
       FS = "[,:$]"
       says that fields can be separated by commas, colons, or dollar signs.  As a special case, assigning FS a string	containing  only  a  blank
       character  sets	the  field separator to white space.  In this case, any sequence of contiguous space and/or tab characters is considered a
       single field separator.	This is the default for FS.  However, if FS is assigned a string containing any other  character,  that  character
       designates the start of a new field.  For example, if we set
       FS="\t"
       (the tab character),
       texta 	 textb 	  	  	 textc
       contains  five  fields,	two of which only contain blanks.  With the default setting, the above would only contain three fields because the
       sequence of multiple blanks and tabs would be considered a single separator.
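
       The field counts above can be checked with a short sketch.  It assumes a POSIX-compatible awk installed as awk; the shell variable
       names are arbitrary, and the last command uses an invented input line for the regular expression separator shown earlier.

```shell
# Assumption: "awk" is a POSIX-compatible awk.
line=$(printf 'texta \t textb \t  \t  \t textc')
# Default FS: runs of blanks and tabs count as one separator.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')
# FS set to a tab: every tab starts a new field.
tab_nf=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t" } { print NF }')
# FS set to a regular expression matching a comma, colon, or dollar sign.
class_nf=$(printf '%s\n' 'a,b:c$d' | awk 'BEGIN { FS = "[,:$]" } { print NF }')
printf '%s %s %s\n' "$default_nf" "$tab_nf" "$class_nf"
```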

       Various pieces of information about input are provided by the built-in variables listed below.

       NF	   Number of fields in the current record
       NR	   Number of records read so far
       FILENAME    Name of file containing current record
       FNR	   Number of records read from current file

       Field specifiers have the form $i where i runs from 1 through NF.  Such a field specifier refers to the ith  field  of  the  current  input
       record.	$0 (zero) refers to the entire current input record.

       The  getline  function can read a value for a variable or $0 from the current input, from a file, or from a pipe.  The result of getline is
       an integer indicating whether the read operation was successful.  A value of 1 indicates success; 0 indicates end-of-file encountered;  and
       -1 indicates that an error occurred. Possible forms for getline are:

       getline
	    Reads next input record into $0 and splits the record into fields.	NF, NR, and FNR are set appropriately.

       getline var
	    Reads  next  input	record	into the variable var.	The record is not split into fields (which means that the current $i values do not
	    change).  NR and FNR are set appropriately.

       getline <expr
	    Interprets the string value of expr to be a file name.  The next record from that file is read into $0 and split into fields.   NF	is
	    set appropriately.

       getline var <expr
	    Interprets	the string value of expr to be a file name, and reads the next record from that file into the variable var.  The record is
	    not split into fields.

       expr | getline
	    Interprets the string value of expr as a command line to be executed.  Output from this command is piped into getline, and	read  into
	    $0 in a manner similar to getline <expr.  See The System Function section for additional details.

       expr | getline var
	    Executes the string value of expr as a command and pipes the output of the command into getline.  The result is similar to getline var
	    <expr.

       close(expr)
	    Only a limited number of files and pipes can be open at one time.  This function will close open files or pipes.  The expr must be one
	    that came before `|' or after `<' in getline, or after `>', `>>', or `|' in print or printf as described in the Output section.  By
	    closing files and pipes that are no longer needed, you can use any number of files and pipes in the course of executing a program.
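
       The following sketch exercises two of these forms together with close.  It assumes a POSIX-compatible awk installed as awk; the
       temporary file and the echo command are stand-ins chosen for the illustration.

```shell
# Assumption: "awk" is a POSIX-compatible awk; the temporary file is illustrative.
tmp=$(mktemp)
printf 'first\nsecond\n' > "$tmp"
out=$(awk 'BEGIN {
    file = ARGV[1]
    while ((getline line < file) > 0)    # 1 for each record read
        n++
    eof = (getline line < file)          # 0 once end-of-file is reached
    close(file)
    cmd = "echo hello"
    cmd | getline greeting               # first line of the command output
    close(cmd)                           # same string that preceded the pipe
    print n, eof, greeting
}' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

       Because the program has only a BEGIN action, the file named on the command line is read only through getline.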

   Built-In Arithmetic Functions
       int(expr)
	   Returns the integer part of the numeric value of expr.  If (expr) is omitted, the integer part of $0 is returned.

       exp(expr), log(expr), sqrt(expr)
	   Returns the exponential, natural logarithm, and square root of the numeric value of expr.  If (expr) is omitted, $0 is used.

       sin(expr), cos(expr)
	   Returns the sine and cosine of the numeric value of expr (interpreted as an angle in radians).

       atan2(expr1, expr2)
	   Returns the arctangent of expr1/expr2 in the range of -pi through pi.

       rand()
	   Returns a random floating-point number in the range 0 through 1.

       srand(expr)
	   Sets the seed of the rand function to the integer value of expr.  If (expr) is omitted, sets a default seed (which  is  the	same  each
	   time nawk is invoked).

   Built-In String Functions
       len = length(expr)
	      Returns the number of characters in the string value of expr.  If (expr) is omitted, $0 is used.

       n = split(string, array, regexp)
	      Splits  the string into fields.  The expression regexp is a regular expression giving the field separator string for the purposes of
	      this operation.  The elements of array are assigned the separated fields in order; subscripts for array begin at 1.  All other  ele-
	      ments  of array are discarded.  The result of split is the number of fields into which string was divided (which is also the maximum
	      subscript for array).  Note that regexp divides the record in the same way that the FS field separator string does.   If	regexp	is
	      omitted in the call to split, the current value of FS will be used.

       str = substr(string, m, len)
	      Returns the substring of string that begins in position m and is at most len characters long.  The first character of the string has
	      m equal to one.  If len is omitted, the rest of string is returned.

       pos = index(s1, s2)
	      Returns the position of the first occurrence of string s2 in string s1; if s2 is not found in s1, index returns zero.

       pos = match(string, regexp)
	      Searches string for the first substring matching the regular expression regexp, and returns an integer giving the position  of  this
	      substring.  If no such substring is found, match returns zero.  The built-in variable RSTART is set to pos and the built-in variable
	      RLENGTH is set to the length of the matched string.  These are both set to zero if there is no match.  The regexp can be enclosed in
	      slashes or given as a string.

       n = gsub(regexp, repl, string)
	      Globally replaces all substrings of string that match the regular expression regexp with the string repl.  If string is omitted,
	      the current record ($0) is used.  gsub returns the number of substrings that were replaced, or zero if no match occurred.

       n = sub(regexp, repl, string)
	      Works like gsub except that at most one match and substitution is attempted.
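
	      A brief sketch of substr, index, match, gsub, and sub working on one string.  It assumes a POSIX-compatible awk installed as
	      awk; the sample string is invented.

```shell
# Assumption: "awk" is a POSIX-compatible awk.
out=$(awk 'BEGIN {
    s = "report-2024-final"
    print substr(s, 8, 4)            # four characters starting at position 8
    print index(s, "final")          # position of the first occurrence
    pos = match(s, /[0-9]+/)         # also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    n = gsub(/-/, "_", s)            # replace every hyphen in s
    print n, s
    sub(/_/, " ", s)                 # replace only the first underscore
    print s
}')
printf '%s\n' "$out"
```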

       str = sprintf(fmt, expr, expr...)
	      Formats  the  expression list expr, expr, ...  using specifications from the string fmt, then returns the formatted string.  The fmt
	      string consists of conversion specifications which convert and add the next expr to the string, and ordinary  characters	which  are
	      simply added to the string.  Conversion specifications have the form
	      %[-][x][.y]c
	      where

	      -   left justifies the field
	      x   is the minimum field width
	      y   is the precision
	      c   is the conversion character

	      In a string, the precision is the maximum number of characters to be printed from the string; in a number, the precision is the num-
	      ber of digits to be printed to the right of the decimal point in a floating point value.	If x or y is `*' (asterisk),  the  minimum
	      field width or precision will be the value of the next expr in the call to sprintf.

	      The conversion character c is one of the following:

	      d   Decimal integer
	      o   Unsigned octal integer
	      x   Unsigned hexadecimal integer
	      u   Unsigned decimal integer
	      f   Floating point
	      e   Floating point (scientific notation)
	      g   The shorter of e and f (suppresses non-significant zeros)
	      c   Single character of an integer value
	      s   String
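
	      The sketch below combines several of these conversions in one call.  It assumes a POSIX-compatible awk installed as awk; the
	      argument values are arbitrary.

```shell
# Assumption: "awk" is a POSIX-compatible awk.
out=$(awk 'BEGIN {
    # %-6s  left-justified string in a field of width 6
    # %6.2f floating point, width 6, two digits after the decimal point
    # %.3s  at most three characters of the string
    # %x    hexadecimal; %c character with the given integer value
    str = sprintf("%-6s|%6.2f|%.3s|%x|%c", "ab", 3.14159, "abcdef", 255, 65)
    print "[" str "]"
}')
printf '%s\n' "$out"
```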

       n = ord(expr)
	      Returns the integer value of the first character in the string value of expr.  This is useful in conjunction with `%c' in sprintf.

       str = tolower(expr)
	      Converts all letters in the string value of expr into lower case, and returns the result.  If expr is omitted, $0 is used.

       str = toupper(expr)
	      Converts all letters in the string value of expr into upper case, and returns the result.  If expr is omitted, $0 is used.

   The System Function
       status = system(expr)
	      Executes the string value of expr as a command.  For example,
	      system("tail " $1)
	      calls the tail command, using the string value of $1 as the file that tail should examine.  See the Restrictions section for a discussion of
	      the execution of the command.

   User-Defined Functions
       You can define your own functions using the form
       function name(parameter-list) {
	       statements
       }
       A function definition can appear in the place of a pattern {action} rule.  The parameter-list contains any number of  normal  (scalar)  and
       array  variables  separated  by commas.	When a function is called, scalar arguments are passed by value, and array arguments are passed by
       reference.  The names specified in the parameter-list are local to the function; all other names used in the function are global.
       Local  scalar variables can be defined by adding them to the end of the parameter list.	These extra parameters are not used in any call to
       the function.

       A function returns to its caller either when the final statement in the function is executed, or when an explicit return statement is  exe-
       cuted.
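
       The passing rules can be illustrated with a small sketch.  It assumes a POSIX-compatible awk installed as awk; the function fill and
       the variables sq, count, and i are hypothetical names.

```shell
# Assumption: "awk" is a POSIX-compatible awk; all names are illustrative.
out=$(awk '
function fill(arr, n,    i) {    # i is local: an extra parameter no caller passes
    for (i = 1; i <= n; i++)
        arr[i] = i * i           # arrays are passed by reference
    n = 0                        # scalars are passed by value
}
BEGIN {
    i = "untouched"; count = 3
    fill(sq, count)
    print sq[1], sq[2], sq[3], count, i
}')
printf '%s\n' "$out"
```

       The array filled inside the function is visible to the caller, while the global variables count and i keep their values.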

   Patterns and Actions
       A pattern is a regular expression, a special pattern, a pattern range, or any arithmetic expression.

       BEGIN  is  a special pattern used to label actions that should be performed before any input records have been read.  END is a special pat-
       tern used to label actions that should be performed after all input records have been read.

       A pattern range is given as
       pattern1,pattern2
       This matches all lines from one that matches pattern1 to one that matches pattern2, inclusive.
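
       For instance, the sketch below (assuming a POSIX-compatible awk installed as awk, with invented input lines) uses a pattern range
       with no action, so every line from the first match through the second is written to the standard output.

```shell
# Assumption: "awk" is a POSIX-compatible awk.
out=$(printf 'a\nSTART\nb\nEND\nc\n' | awk '/START/,/END/')
printf '%s\n' "$out"
```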

       If a pattern is omitted, or if the numeric value of the pattern is non-zero (true), the resulting action is executed for the line.

       An action is a series of statements terminated by semicolons, new-lines, or closing braces.  A condition  is  any  expression;  a  non-zero
       value is considered true, and a zero value is considered false.	A statement is one of the following:
       expression

       if (condition)
	       statement
       [else
	       statement]

       while (condition)
	       statement

       do
	       statement
       while (condition)

       for (expression1; condition; expression2)
	       statement
       The for statement is equivalent to:
       expression1
       while (condition) {
	       statement
	       expression2
       }
       The for statement can also have the form
       for (i in array)
	       statement
       The statement is executed once for each element in array; on each repetition, the variable i will contain the name of a subscript of array,
       running through all the subscripts in an arbitrary order.  If array is multi-dimensional (has multiple subscripts), i will be expressed	as
       a single string with the SUBSEP character separating the subscripts.  The following simple statements are supported:

       break  Exits a for or a while loop immediately.

       continue
	      Stops the current iteration of a for or while loop and begins the next iteration (if there is one).

       next   Terminates  any processing for the current input record and immediately starts processing the next input record.	Processing for the
	      next record will begin with the first appropriate rule.

       exit[ (expr) ]
	      Immediately goes to the END action if it exists; if there is no END action, or if nawk is already executing the END action,  the  program
	      terminates.  The exit status of the program is set to the numeric value of expr.	If (expr) is omitted, the exit status is 0.

       return [expr]
	      Returns  from  the  execution  of a function.  If an expr is specified, the value of the expression is returned as the result of the
	      function.  Otherwise, the function result is undefined.

       delete array[i]
	      Deletes element i from the given array.

       print expr, expr, ...
	      Described below.

       printf fmt, expr, expr, ...
	      Described below.

   Output
       The print and printf statements write to the standard output.  Output can be redirected to a file or pipe as described below.

       If >expr is added to a print or printf statement, the string value of expr is taken to be a file name, and output is written to that  file.
       Similarly, if >>expr is added, output will be appended to the current contents of the file.  The distinction between `>' and `>>' is
       only important for the first print to the file expr.  Subsequent outputs to an already open file will append to what is there already.

       In order to eliminate ambiguities, statements such as
       print a > b c
       are syntactically illegal.  Parentheses must be used to resolve the ambiguity.

       If |expr is added to a print or printf statement, the string value of expr is taken to be an executable command.  The command  is  executed
       with the output from the statement piped as input into the command.

       As  noted  earlier,  only  a  limited number of files and pipes can be open at any time.  To avoid going over the limit, you should use the
       close function to close files and pipes when they are no longer needed.
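
       A minimal sketch of output through a pipe, assuming a POSIX-compatible awk installed as awk and the standard sort utility: each
       record is piped to one sort command, which is closed in the END action.

```shell
# Assumption: "awk" is a POSIX-compatible awk and "sort" is the standard utility.
out=$(printf 'pear\napple\n' | awk '
    { print | "sort" }           # every print feeds the same open pipe
    END { close("sort") }        # closing the pipe lets sort finish and emit its output
')
printf '%s\n' "$out"
```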

       The print statement prints its arguments with only simple formatting.  If it has no arguments, the current input record is printed  in  its
       entirety.   The output record separator ORS is added to the end of the output produced by each print statement; when arguments in the print
       statement are separated by commas, the corresponding output values will be separated by the output field separator OFS.	ORS  and  OFS  are
       built-in  variables  whose  values  can	be  changed  by assigning them strings.  The default output record separator is a new-line and the
       default output field separator is a space. The format of numbers output by print is given by the string OFMT.  By  default,  the  value	is
       `%.6g'; this can be changed by assigning OFMT a different string value.
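
       The effect of these variables can be seen in a short sketch, assuming a POSIX-compatible awk installed as awk; the separator values
       are arbitrary choices.

```shell
# Assumption: "awk" is a POSIX-compatible awk.
out=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"; OFMT = "%.2f"
    print "a", "b", 3.14159      # commas: values are joined with OFS, number uses OFMT
    print "a" "b"                # no comma: plain concatenation, OFS is not used
}')
printf '%s\n' "$out"
```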

       The  printf  statement  formats its arguments using the fmt argument.  Formatting is the same as for the built-in function sprintf.  Unlike
       print, printf does not add output separators automatically.  This gives the program more precise control of the output.

Restrictions
       The longest input record is restricted to 20,000 bytes and the maximum number of fields supported is 4000.  The length of the  string  pro-
       duced by sprintf is limited to 1024 bytes.

       The ord function may not be recognized by other versions of awk.  The toupper and tolower functions and the ENVIRON array variable are found in
       the Bell Labs version of awk; this version is a superset of `New awk' as described in The AWK Programming Language by Aho, Weinberger, and Kernighan.

       The shell that is used by the functions
       getline	  print    printf    system
       and the return value of the system function are described in system(3).

Examples
       The following example outputs the contents of the file input1 with line numbers prepended to each line:
       nawk '{print NR ":" $0}' input1

       The following is an example using var=value on the command line:
       nawk '{print NR SEP $0}' SEP=":" input1

       The program script can also be read from a file as in the command line:
       nawk -f addline.nawk input1
       This example produces the same output as the previous example when the file addline.nawk contains
       {print NR ":" $0}

       The following program appends all input lines starting with `January' to the file jan (which can already exist or not), and all lines starting
       with `February' or `March' to the file febmar:
       /^January/ {print >> "jan"}
       /^February|^March/ {print >> "febmar"}

       This program prints the total and average for the last column of each input line:
	       {s += $NF}
       END     {print "sum is", s, "average is", s/NR}

       The following program interchanges the first and second fields of input lines:
       {
	       tmp = $1
	       $1 = $2
	       $2 = tmp
	       print
       }

       The following example inserts line numbers so that output lines are left-aligned:
       {printf "%-6d: %s\n", NR, $0}

       This example prints input records in reverse order (assuming sufficient memory):
       {
	       a[NR] = $0 # index using record number
       }
       END {
	       for (i = NR; i>0; --i)
		       print a[i]
       }

       The next program determines the number of lines starting with the same first field:
       {
	       ++a[$1] # array indexed using the first field
       }
       END {   # note output will be in undefined order
	       for (i in a)
		       print a[i], "lines start with", i
       }

       The following program can be used to determine the number of lines in each input file:
       {
	       ++a[FILENAME]
       }
       END {
	       for (file in a)
		       if (a[file] == 1)
			       print file, "has 1 line"
		       else
			       print file, "has", a[file], "lines"
       }

       This program illustrates how a two dimensional array can be used in nawk.  Assume the first field contains a product number, the second
       field contains a month number, and the third field contains a quantity (bought, sold, or whatever).  The program generates a table of
       products versus month.
       BEGIN   {NUMPROD = 5}
       {
	       array[$1,$2] += $3
       }
       END     {
	       print "\t Jan\t Feb\tMarch\tApril\t May\t" \
		   "June\tJuly\t Aug\tSept\t Oct\t Nov\t Dec"
	       for (prod = 1; prod <= NUMPROD; prod++) {
		       printf "%-7s", "prod#" prod
		       for (month = 1; month <= 12; month++){
			       printf "	%5d", array[prod,month]
		       }
		       printf "\n"
	       }
       }

       As this program reads in each line of input, it reports whether the line matches a pre-determined value:
       function randint() {
	       return (int((rand()+1)*10))
       }
       BEGIN   {
	       prize[randint(),randint()] = "$100";
	       prize[randint(),randint()] = "$10";
	       prize[1,1] = "the booby prize"
	       }
       {
	       if (($1,$2) in prize)
		       printf "You have won %s!\n", prize[$1,$2]
       }

       This example prints lines whose first and last fields are the same, reversing the order of the fields:
       $1==$NF {
	       for (i = NF; i > 0; --i)
		       printf "%s", $i (i>1 ? OFS : ORS)
       }

       The  following  program	prints the input files from the command line.  The infiles function first empties the array passed to it, and then
       fills the array.  Notice that the extra parameter i of infiles is a local variable.
       function infiles(f,   i) {
	       for (i in f)
		       delete f[i]
	       for (i = 1; i < ARGC; i++)
		       if (index(ARGV[i],"=") == 0)
			       f[i] = ARGV[i]
       }
       BEGIN   {
	       infiles(a)
	       for (i in a)
		       print a[i]
	       exit
       }

       This example is the standard recursive factorial function:
       function fact(num) {
	       if (num <= 1)
		       return 1
	       else
		       return num * fact(num - 1)
       }
       { print $0 " factorial is " fact($0) }

       The last program illustrates the use of getline with a pipe.  Here, getline sets the current record from the output of the wc command.  The
       program prints the number of words in each input file.
       function words(file,   string) {
	       string = "wc " file     # build the command from the parameter, not a global
	       string | getline        # read the first line of wc output into $0
	       close(string)
	       return ($2)             # wc prints lines, words, characters; $2 is the word count
       }
       BEGIN   {
	       for (i=1; i<ARGC; i++) {
		       fn = ARGV[i]
		       printf "There are %d words in %s.\n",
			   words(fn), fn
	       }
       }

See Also
       ed(1), grep(1), sed(1), ex(1), system(3), ascii(7),
       "Awk - A Pattern Scanning and Processing Language" ULTRIX Supplementary Documents, Vol. II: Programmer

																	   nawk(1)