AWK is a programming language devised by Aho, Weinberger, and
Kernighan at Bell Labs (hence the name). Awk programs search
files for specific patterns and perform actions for every
occurrence of these patterns. The patterns can be regular
expressions as used in the ed editor. The actions are expressed
using a subset of the C language.

The patterns and actions are usually placed in a rules file whose
name must be the first argument on the command line, preceded by
the flag -f. Otherwise, the first argument on the command line is
taken to be a string containing the rules themselves. All other
arguments are taken to be the names of text files to which the
rules are applied, with - denoting the standard input. To take
rules from the standard input, use -f -. For example, giving
-f rules as the first argument reads the pattern and action rules
from the file rules and applies them to all the remaining
arguments.

The general format of a rules file is:

     <pattern> { <action> }
     <pattern> { <action> }
     ...

There may be any number of these <pattern> { <action> } sequences
in the rules file. Awk reads a line of input from the current
input file and applies every <pattern> { <action> } in sequence
to the line. If the <pattern> corresponding to any { <action> }
is missing, the action is applied to every line of input. The
default { <action> } is to print the matched input line.

A <pattern> may be any valid C expression. If the <pattern>
consists of two expressions separated by a comma, it is taken to
be a range, and the <action> is performed on all lines of input
that match the range. A <pattern> may contain regular expressions
delimited by @ symbols.
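
The rule-processing loop described above can be sketched in
Python. This is an illustration only, not the awk implementation:
the function run_rules and its data layout are hypothetical, and
the range behaviour (the range opens on a line matching the first
expression and closes after a line matching the second) is an
assumption borrowed from conventional awk, since the text does
not spell it out.

```python
import re

def run_rules(rules, lines):
    """Apply every (pattern, action) pair, in order, to each input line.

    A pattern is None (always true), a predicate taking the line, or a
    (start, stop) tuple of predicates that is treated as a range.
    """
    active = [False] * len(rules)   # per-rule "inside a range" flag
    out = []
    for line in lines:
        for i, (pattern, action) in enumerate(rules):
            if pattern is None:
                hit = True                      # missing pattern: every line
            elif isinstance(pattern, tuple):
                start, stop = pattern
                if not active[i] and start(line):
                    active[i] = True            # range opens on this line
                hit = active[i]
                if active[i] and stop(line):
                    active[i] = False           # range closes after this line
            else:
                hit = bool(pattern(line))
            if hit:
                out.append(action(line))
    return out

rules = [
    ((lambda l: re.search("begin", l), lambda l: re.search("end", l)),
     lambda l: "in range: " + l),
    (None, lambda l: "seen: " + l),
]
result = run_rules(rules, ["a", "begin", "b", "end", "c"])
```

Here the first rule fires for the lines begin, b and end, while
the second rule, having no pattern, fires for all five lines.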
Regular expressions can be thought of as a generalized wildcard
string matching mechanism, similar to that used by many operating
systems to specify file names. Regular expressions may contain
any of the following characters:
x    An ordinary character
\    The backslash quotes any character
^    A circumflex at the beginning of an expression matches the
     beginning of a line.
$    A dollar-sign at the end of an expression matches the end
     of a line.
.    A period matches any single character except newline.
*    An expression followed by an asterisk matches zero or more
     occurrences of that expression: fo* matches f, fo, foo,
     fooo, etc.
+    An expression followed by a plus sign matches one or more
     occurrences of that expression: fo+ matches fo, foo, fooo,
     etc.
[]   A string enclosed in square brackets matches any single
     character in that string, but no others. If the first
     character in the string is a circumflex, the expression
     matches any character except newline and the characters in
     the string. For example, [xyz] matches xx and zyx, while
     [^xyz] matches abc but not axb. A range of characters may
     be specified by two characters separated by -.
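
The metacharacters in the table above can be tried out with
Python's re module, which happens to share this notation. This
is only a sketch: it exercises Python's matcher, not awk's, and
the two may differ in places (for instance, a negated class in
Python also matches newline). The helper whole() is hypothetical.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# (description, actual result, result expected from the table)
checks = [
    ("fo* matches f",            whole("fo*", "f"),       True),
    ("fo* matches fooo",         whole("fo*", "fooo"),    True),
    ("fo+ does not match f",     whole("fo+", "f"),       False),
    ("fo+ matches foo",          whole("fo+", "foo"),     True),
    ("[xyz] matches y",          whole("[xyz]", "y"),     True),
    ("[xyz] does not match a",   whole("[xyz]", "a"),     False),
    ("[^xyz] matches a",         whole("[^xyz]", "a"),    True),
    ("[^xyz] does not match x",  whole("[^xyz]", "x"),    False),
    ("[a-f] matches c",          whole("[a-f]", "c"),     True),
    ("[a-f] does not match g",   whole("[a-f]", "g"),     False),
    (". matches q",              whole(".", "q"),         True),
    (". does not match newline", whole(".", "\n"),        False),
    ("\\. quotes the period",    whole(r"a\.b", "a.b"),   True),
    ("a\\.b does not match axb", whole(r"a\.b", "axb"),   False),
    ("^ab anchors at the start", bool(re.search("^ab", "cab")), False),
    ("bc$ anchors at the end",   bool(re.search("bc$", "abc")), True),
]
all_ok = all(actual == expected for _, actual, expected in checks)
```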
Actions are expressed as a subset of the C language. All
variables are global and default to int if not formally declared.
Only char and int, and pointers and arrays of char and int, are
allowed. Awk allows only decimal integer constants to be used;
hex (0xnn) and octal (0nn) constants are not supported. String
and character constants may contain the special C escapes (such
as \n). Awk supports the if, else, while and break flow of
control constructs, which behave exactly as in C. Also supported
are the following unary and binary operators, listed in order
from highest to lowest precedence:
Operator          Type    Associativity
() []             unary   left to right
! ~ ++ -- - * &   unary   right to left
* / %             binary  left to right
+ -               binary  left to right
<< >>             binary  left to right
< <= > >=         binary  left to right
== !=             binary  left to right
&                 binary  left to right
^                 binary  left to right
|                 binary  left to right
&&                binary  left to right
||                binary  left to right
=                 binary  right to left
Comments are introduced by a '#' symbol and are terminated by the
first newline character. The standard /* and */ comment
delimiters are not supported and will result in a syntax error.

When awk reads a line from the current input file, the record is
automatically separated into fields. A field is simply a string
of consecutive characters delimited by either the beginning or
end of the line, or a field separator character. Initially, the
field separators are the space and tab characters. The special
unary operator '$' is used to reference one of the fields in the
current input record (line). The fields are numbered sequentially
starting at 1. The expression $0 references the entire input
line. Similarly, the record separator, initially the newline
character, is used to determine the end of an input line. The
field and record separators may be changed programmatically by
one of the actions and will remain in effect until changed again.
Multiple (up to 10) field separators are allowed at a time, but
only one record separator.

Fields behave exactly like strings and can be used in the same
contexts as a character array. These arrays can be considered to
have been declared as:

     char ($n)[ 128 ];

In other words, they are 128 bytes long. Notice that the
parentheses are necessary because the operators [] and $
associate from right to left; without them, the statement would
have parsed as:

     char $(1[ 128 ]);

which is obviously ridiculous.
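
The field-splitting rules above can be sketched in Python. The
function split_fields is hypothetical and only mimics the
description. In particular, it assumes that a run of adjacent
separator characters counts as a single delimiter, which the
text does not state, and it ignores the 128-byte field size.

```python
import re

def split_fields(record, separators=" \t"):
    """Return a dict mapping field number to text; key 0 is the whole record."""
    if len(separators) > 10:
        raise ValueError("at most 10 field separators are allowed")
    # Assumption: consecutive separator characters act as one delimiter.
    char_class = "[" + re.escape(separators) + "]+"
    parts = [p for p in re.split(char_class, record) if p]
    fields = {0: record}                 # $0 is the entire input line
    for n, text in enumerate(parts, start=1):
        fields[n] = text                 # $1, $2, ... numbered from 1
    return fields

default_split = split_fields("alpha beta\tgamma")
custom_split = split_fields("a:b,c", separators=":,")
```

With the default separators, default_split[2] is "beta" and
default_split[0] is the unmodified line; with ":" and "," as
separators, "a:b,c" yields three fields.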
If the contents of one of these field arrays are altered, the $0
field will reflect this change. For example, this expression:

     *$4 = 'A';

will change the first character of the fourth field to an
upper-case letter 'A'. Then, when the following input line:

     120 PRINT "Name address Zip"

is processed, it would be printed as:

     120 PRINT "Name Address Zip"

Fields may also be modified with the strcpy() function (see
below). For example, the expression:

     strcpy( $4, "Addr." );

applied to the same line above would yield:

     120 PRINT "Name Addr. Zip"

The following variables are pre-defined:
FS        Field separator (see below).
RS        Record separator (see below also).
NF        Number of fields in current input record (line).
NR        Number of records processed thus far.
FILENAME  Name of current input file.
BEGIN     A special <pattern> that matches the beginning of
          input text.
END       A special <pattern> that matches the end of input
          text.
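
How these variables might evolve during a run can be sketched in
Python. The function process and its callback arguments are
hypothetical. The sketch assumes that NR keeps counting across
input files and that BEGIN and END each fire once per run;
neither point is stated explicitly above.

```python
def process(files, begin, rule, end):
    """files is a list of (name, text) pairs standing in for input files."""
    state = {"NR": 0, "NF": 0, "FILENAME": None, "RS": "\n"}
    begin(state)                          # BEGIN: before any input is read
    for name, text in files:
        state["FILENAME"] = name
        records = text.split(state["RS"])
        if records and records[-1] == "":
            records.pop()                 # no empty record after the final RS
        for record in records:
            state["NR"] += 1              # assumption: cumulative over files
            state["NF"] = len(record.split())
            rule(state, record)
    end(state)                            # END: after all input is consumed
    return state

log = []
process(
    [("a.txt", "one two\nthree\n"), ("b.txt", "four five six\n")],
    begin=lambda s: log.append("BEGIN"),
    rule=lambda s, r: log.append((s["FILENAME"], s["NR"], s["NF"])),
    end=lambda s: log.append(("END", s["NR"])),
)
```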
Awk also provides some useful built-in functions for string
manipulation and printing:
print(arg)      Simple printing of strings only, terminated by
                '\n'.
printf(arg...)  Exactly the printf() function from C.
getline()       Reads the next record and returns 0 on end of
                file.
nextfile()      Closes the current input file and begins
                processing the next file.
strlen(s)       Returns the length of its string argument.
strcpy(s,t)     Copies the string t to the string s.
strcmp(s,t)     Compares s to t and returns 0 if they match.
toupper(c)      Returns its character argument converted to
                upper case.
tolower(c)      Returns its character argument converted to
                lower case.
match(s,@re@)   Compares the string s to the regular expression
                re and returns the number of matches found (zero
                if none).
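
The effect of strcpy() on a field and the counting behaviour of
match() can be approximated in Python. The helpers set_field and
match_count are hypothetical stand-ins, not awk functions. The
sketch assumes that fields are rejoined with single spaces when
the line is rebuilt and that match() counts non-overlapping
matches.

```python
import re

def set_field(record, n, new_text):
    """Replace field n (1-based) and rebuild the line, like strcpy($n, t)."""
    fields = record.split()
    fields[n - 1] = new_text
    return " ".join(fields)       # assumption: single-space rejoin

def match_count(s, pattern):
    """Count non-overlapping matches of pattern in s (0 if none)."""
    return sum(1 for _ in re.finditer(pattern, s))

line = '120 PRINT "Name address Zip"'
fourth = line.split()[3]
capitalised = set_field(line, 4, "A" + fourth[1:])    # like *$4 = 'A';
shortened = set_field(line, 4, "Addr.")               # like strcpy($4, "Addr.");
hits = match_count("foo fo fooo bar", "fo+")
misses = match_count("bar", "fo+")
```

Run on the sample line used earlier, capitalised and shortened
reproduce the two modified lines shown in the field examples.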
Awk was written by Saeko Hirabayashi and Kouichi Hirabayashi.