Query: nawk
OS: ultrix
Section: 1
nawk(1)                General Commands Manual                nawk(1)

Name

nawk - data transformation, report generation language

Syntax

nawk [ -f programfile ] [ -Fs ] [ program ] [ var=value... ] [ file ... ]

Description

The nawk language is a file-processing language which is well-suited to data manipulation and retrieval of information from text files. This reference page provides a full technical description of nawk; if you are unfamiliar with the language, you will probably find it helpful to read the Guide to the nawk Utility before reading the following material.

A nawk program consists of any number of user-defined functions and `rules' of the form:

    pattern {action}

There are two ways to specify the nawk program:

(a) Directly on the command line. In this case, the program is a single command line argument, usually enclosed in apostrophes (').

(b) By using the -f programfile option (where programfile contains the program). More than one -f option can appear on the command line. The program will consist of the concatenation of the contents of all the specified programfiles. You can use - in place of a file name, to obtain input from the standard input.

The input data manipulated by the nawk program is provided in files specified on the command line. If no such files are specified, data is read from the standard input. You can also specify a file name of - to mean the standard input.

Input to nawk is divided into records. By default, records are separated by new-line characters; however, you can specify a different record separator if you wish.

One at a time, and in order, each input record is compared with the pattern of every `rule' in the program. When a pattern matches, the action part of the rule is performed on the current input record.

Patterns and actions often refer to separate fields within a record. By default, fields are separated by white space (blanks, new-lines, or horizontal tab characters); however, you can specify a different field separator string using the -Fs option (see Input).
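The record, field, and rule model described above can be tried with a short sketch. It assumes the interpreter is installed under the name awk (substitute nawk where that name exists); the sample data and the shell variable names are hypothetical and not part of this reference page.

```shell
# Hypothetical sample data: name, department, and salary separated by colons.
data='alice:eng:120
bob:ops:95
carol:eng:130'

# -F: makes ':' the field separator; the rule prints field 1 of each
# record whose second field equals "eng".
eng_names=$(printf '%s\n' "$data" | awk -F: '$2 == "eng" { print $1 }')

# Default separator: a run of blanks and tabs counts as one separator.
third=$(printf 'one  two\tthree\n' | awk '/one/ { print $3 }')

printf '%s\n' "$eng_names" "$third"
```

Each input line is one record; the pattern selects records and the action runs only for the records selected.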
You can omit the pattern or action part of a rule (but not both). If pattern is omitted, the action is performed on every input record (as if every record matches). If action is omitted, every record matching the pattern will be written to the standard output.

If a line in a nawk program contains a `#' character, the `#' and everything after it is considered to be a comment.

Program lines can be continued by adding a backslash `\' to the end of the line. Statement lines ending with a comma `,', double or-bars `||', or double ampersands `&&', are automatically continued.

Options

-f programfile
    Tells nawk to obtain its program from the specified file. There can be more than one of these on the command line.

-Fs
    Says that s is the field separator character within records.

Variables and Expressions

There are three types of variables in nawk: identifiers, fields, and array elements.

An identifier is a sequence of letters, digits, and underscores beginning with a letter or an underscore.

Fields are described in the Input subsection.

Arrays are associative collections of values called the elements of the array. Array elements are referenced with constructs of the form

    identifier[subscript]

where subscript has the form expr or expr,expr,... Each such expr can have any string value. Arrays with multiple expr subscripts are implemented by concatenating the string values of each expr with a separator character SUBSEP separating multiple expr. The initial value of SUBSEP is set to `\034' (ASCII field separator).

Fields and identifiers are sometimes called scalar variables to distinguish them from arrays.

Variables are not declared and need not be initialized. The value of an uninitialized variable is the empty string. Variables can be initialized on the command line using

    var=value

Such initializations can be interspersed with the names of input files on the command line. Initializations and input files will be processed in the order they appear on the command line.
For example, the command

    nawk -f progfile A=1 f1 f2 A=2 f3

sets A to 1 before input is read from f1 and sets A to 2 before input is read from f3.

Certain built-in variables have special meaning to nawk, as described in later sections.

Expressions consist of constants, variables, functions, regular expressions and `subscript in array' conditions (see below) combined with operators. Each variable and expression has a string value and a corresponding numeric value; the value appropriate to the context is used. If a string is used in a numeric context, and the contents of the string cannot be interpreted as a number, the `value' of the string is taken to be zero.

Numeric constants are sequences of decimal digits. String constants are quoted, as in "x". Escape sequences accepted in literal strings are:

    Escape    ASCII Character
    -------------------------------
    \a        audible bell
    \b        backspace
    \f        formfeed
    \n        new-line
    \r        carriage return
    \t        horizontal tab
    \v        vertical tab
    \ooo      octal value ooo
    \xdd      hexadecimal value dd
    \"        quotation mark
    \c        any other character c

The regular expression syntax understood by nawk is the extended regular expressions of the egrep utility described in grep(1). Characters enclosed in slash characters `/' are compiled as regular expressions when the nawk program is read. In addition, literal strings and variables are interpreted as dynamic regular expressions on the right side of a `~' or `!~' operator, or as certain arguments to built-in matching and substitution functions. Note that when literal strings are used as regular expressions, extra backslashes are needed to escape regular expression metacharacters because the backslash is also the literal string escape character.

The `subscript in array' condition is defined as:

    index in array

where index looks like expr or (expr,...,expr). This condition evaluates to 1 if the string value of index is a subscript of array, and to 0 otherwise. This is a way to determine if an array element exists. If the element does not exist, this condition will not create it.
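As a small illustration of the points above, the following sketch exercises the `subscript in array' condition and the use of a non-numeric string in a numeric context. It assumes the interpreter is available as awk; the array names seen and grid and the shell variable result are hypothetical.

```shell
result=$(awk 'BEGIN {
    seen["a"] = 1
    # Testing membership with "in" does not create the element.
    if (!("b" in seen)) print "b absent"
    n = 0
    for (k in seen) n++
    print n
    # A string that cannot be read as a number counts as zero.
    print "abc" + 5
    # Multiple subscripts use the parenthesized form of the index.
    grid[1,2] = "x"
    if ((1,2) in grid) print "cell present"
}')

printf '%s\n' "$result"
```

The count printed after the membership test shows that the array still holds only the element that was assigned explicitly.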
Symbol Table

The symbol table can be accessed through the built-in array SYMTAB. SYMTAB[expr] is equivalent to the variable named by the evaluation of expr. For example, SYMTAB["var"] is a synonym for the variable var.

Environment

A nawk program can determine its initial environment by examining the ENVIRON array. If the environment consists of entries of the form:

    name=value

then ENVIRON[name] has string value "value". For example, the following program prints the environment, one name=value entry per line:

    BEGIN {
        for (i in ENVIRON)
            printf("%s=%s\n", i, ENVIRON[i])
        exit
    }

Operators

The usual precedence order of arithmetic operations is followed unless overridden with parentheses; a table giving the order of operations appears at the end of the Guide to the nawk Utility.

The unary operators are

    -     Negation
    +     Nothing (place holder)
    --    Decrement by one
    ++    Increment by one

where the `++' and `--' operators can be used as either postfix or prefix operators, as in C.

The binary arithmetic operators are

    +     Addition
    -     Subtraction
    *     Multiplication
    /     Division
    %     Modulus
    ^     Exponentiation

The conditional operator

    expr ? expr1 : expr2

evaluates to expr1 if the value of expr is non-zero, and to expr2 otherwise.

If two expressions are not separated by an operator, their string values are concatenated.

The operator `~' yields 1 (true) if the regular expression on the right side matches the string on the left side. The operator `!~' yields 1 when the regular expression on the right side does not match the string on the left side. To illustrate:

    $2 ~ /[0-9]/

selects any line where the second field contains at least one digit. Any string or variable on the right side of `~' or `!~' is interpreted as a dynamic regular expression.

The relational operators are the usual `<', `<=', `>', `>=', `==', and `!='.

The boolean operators are `||' (or), `&&' (and), and `!' (not).
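The operators listed above can be combined as in the following sketch, which assumes the interpreter is available as awk. The input lines and the names kind, line, x, and ops are hypothetical.

```shell
ops=$(printf '%s\n' 'widget 42' 'gadget none' | awk '{
    # ~ tests a regular expression; ?: selects one of two values.
    kind = ($2 ~ /[0-9]/) ? "numeric" : "text"
    # Expressions with no operator between them are concatenated.
    line = $1 ":" kind
    print line
}
END {
    x = 2
    x ^= 3      # same as x = x ^ 3
    x += 1      # same as x = x + 1
    print x, 7 % 3
}')

printf '%s\n' "$ops"
```

The parentheses around the match are not required by the precedence rules; they only make the operand of the conditional operator easier to see.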
Values can be assigned to a variable with

    var = expr

If op is a binary arithmetic operator,

    var op= expr

is equivalent to

    var = var op expr

Command Line Arguments

The built-in variable ARGC is set to the number of command line arguments. The built-in array ARGV has elements subscripted with digits from zero to ARGC-1, giving command line arguments in the order they appeared on the command line.

The ARGC count and the ARGV vector do not include command line options (beginning with `-') or the program file (following -f). They do include the name of the command itself, initialization statements of the form var=value and the names of input data files.

The nawk language actually creates ARGC and ARGV before doing anything else. It then walks through ARGV processing the arguments. If an element of ARGV is the empty string, it is simply skipped. If it contains an equals sign `=', it is interpreted as a variable assignment. If it is a minus sign `-', it stands for the standard input and input is immediately read from the standard input until end-of-file is encountered. Otherwise, the argument is taken to be a file name; input will be read from that file until end-of-file is reached. Note that the program is executed by `walking through' ARGV in this way; thus if the program changes ARGV, different files can be read and assignments made.

Input

Input is divided into records. Each record is separated from the next with a record separator character. The value of the built-in variable RS gives the current record separator character; by default, it begins as the new-line `\n'. If you assign a different character to RS, nawk will use that as the record separator character from that point on.

Records are divided into fields. Each field is separated from the next with a field separator string, given by the value of the built-in variable FS. You can set a specific separator string by assigning a value to FS or by specifying the -Fs option on the command line.
FS can be assigned a regular expression. For example,

    FS = "[,:$]"

says that fields can be separated by commas, colons, or dollar signs. As a special case, assigning FS a string containing only a blank character sets the field separator to white space. In this case, any sequence of contiguous space and/or tab characters is considered a single field separator. This is the default for FS. However, if FS is assigned a string containing any other character, that character designates the start of a new field. For example, if we set FS="\t" (the tab character), the record

    texta\ttextb\t \t \ttextc

(in which \t stands for a tab character) contains five fields, two of which only contain blanks. With the default setting, the above would only contain three fields because the sequence of multiple blanks and tabs would be considered a single separator.

Various pieces of information about input are provided by the built-in variables listed below.

    NF          Number of fields in the current record
    NR          Number of records read so far
    FILENAME    Name of file containing current record
    FNR         Number of records read from current file

Field specifiers have the form $i where i runs from 1 through NF. Such a field specifier refers to the ith field of the current input record. $0 (zero) refers to the entire current input record.

The getline function can read a value for a variable or $0 from the current input, from a file, or from a pipe. The result of getline is an integer indicating whether the read operation was successful. A value of 1 indicates success; 0 indicates end-of-file encountered; and -1 indicates that an error occurred. Possible forms for getline are:

getline
    Reads next input record into $0 and splits the record into fields. NF, NR, and FNR are set appropriately.

getline var
    Reads next input record into the variable var. The record is not split into fields (which means that the current $i values do not change). NR and FNR are set appropriately.

getline <expr
    Interprets the string value of expr to be a file name.
    The next record from that file is read into $0 and split into fields. NF is set appropriately.

getline var <expr
    Interprets the string value of expr to be a file name, and reads the next record from that file into the variable var. The record is not split into fields.

expr | getline
    Interprets the string value of expr as a command line to be executed. Output from this command is piped into getline, and read into $0 in a manner similar to getline <expr. See The System Function section for additional details.

expr | getline var
    Executes the string value of expr as a command and pipes the output of the command into getline. The result is similar to getline var <expr.

close(expr)
    Only a limited number of files and pipes can be open at one time. This function will close open files or pipes. The expr must be one that came before `|' or after `<' in getline, or after `>', `>>', or `|' in print or printf as described in the Output section. By closing files and pipes that are no longer needed, you can use any number of files and pipes in the course of executing a nawk program.

Built-In Arithmetic Functions

int(expr)
    Returns the integer part of the numeric value of expr. If (expr) is omitted, the integer part of $0 is returned.

exp(expr), log(expr), sqrt(expr)
    Returns the exponential, natural logarithm, and square root of the numeric value of expr. If (expr) is omitted, $0 is used.

sin(expr), cos(expr)
    Returns the sine and cosine of the numeric value of expr (interpreted as an angle in radians).

atan2(expr1, expr2)
    Returns the arctangent of expr1/expr2 in the range of -π through π.

rand()
    Returns a random floating-point number in the range 0 through 1.

srand(expr)
    Sets the seed of the rand function to the integer value of expr. If (expr) is omitted, nawk sets a default seed (which is the same each time nawk is invoked).

Built-In String Functions

len = length(expr)
    Returns the number of characters in the string value of expr. If (expr) is omitted, $0 is used.
n = split(string, array, regexp)
    Splits the string into fields. The expression regexp is a regular expression giving the field separator string for the purposes of this operation. The elements of array are assigned the separated fields in order; subscripts for array begin at 1. All other elements of array are discarded. The result of split is the number of fields into which string was divided (which is also the maximum subscript for array). Note that regexp divides the record in the same way that the FS field separator string does. If regexp is omitted in the call to split, the current value of FS will be used.

str = substr(string, m, len)
    Returns the substring of string that begins in position m and is at most len characters long. The first character of the string has m equal to one. If len is omitted, the rest of string is returned.

pos = index(s1, s2)
    Returns the position of the first occurrence of string s2 in string s1; if s2 is not found in s1, index returns zero.

pos = match(string, regexp)
    Searches string for the first substring matching the regular expression regexp, and returns an integer giving the position of this substring. If no such substring is found, match returns zero. The built-in variable RSTART is set to pos and the built-in variable RLENGTH is set to the length of the matched string. These are both set to zero if there is no match. The regexp can be enclosed in slashes or given as a string.

n = gsub(regexp, repl, string)
    Globally replaces every substring of string that matches the regular expression regexp with the string repl. If string is omitted, the current record ($0) is used. gsub returns the number of substrings that were replaced, or zero if no match occurred.

n = sub(regexp, repl, string)
    Works like gsub except that at most one match and substitution is attempted.

str = sprintf(fmt, expr, expr...)
    Formats the expression list expr, expr, ...
    using specifications from the string fmt, then returns the formatted string. The fmt string consists of conversion specifications which convert and add the next expr to the string, and ordinary characters which are simply added to the string. Conversion specifications have the form

        %[-][x][.y]c

    where

        -    left justifies the field
        x    is the minimum field width
        y    is the precision
        c    is the conversion character

    In a string, the precision is the maximum number of characters to be printed from the string; in a number, the precision is the number of digits to be printed to the right of the decimal point in a floating point value. If x or y is `*' (asterisk), the minimum field width or precision will be the value of the next expr in the call to sprintf.

    The conversion character c is one of the following:

        d    Decimal integer
        o    Unsigned octal integer
        x    Unsigned hexadecimal integer
        u    Unsigned decimal integer
        f    Floating point
        e    Floating point (scientific notation)
        g    The shorter of e and f (suppresses non-significant zeros)
        c    Single character of an integer value
        s    String

n = ord(expr)
    Returns the integer value of the first character in the string value of expr. This is useful in conjunction with `%c' in sprintf.

str = tolower(expr)
    Converts all letters in the string value of expr into lower case, and returns the result. If expr is omitted, $0 is used.

str = toupper(expr)
    Converts all letters in the string value of expr into upper case, and returns the result. If expr is omitted, $0 is used.

The System Function

status = system(expr)
    Executes the string value of expr as a command. For example,

        system("tail " $1)

    calls the tail command, using the string value of $1 as the file that tail should examine. See the Restrictions section for a discussion of the execution of the command.

User-Defined Functions

You can define your own functions using the form

    function name(parameter-list) {
        statements
    }

A function definition can appear in the place of a pattern {action} rule.
The parameter-list contains any number of normal (scalar) and array variables separated by commas. When a function is called, scalar arguments are passed by value, and array arguments are passed by reference. The names specified in the parameter-list are local to the function; all other names used in the function are global. Local scalar variables can be defined by adding them to the end of the parameter list. These extra parameters are not used in any call to the function.

A function returns to its caller either when the final statement in the function is executed, or when an explicit return statement is executed.

Patterns and Actions

A pattern is a regular expression, a special pattern, a pattern range, or any arithmetic expression.

BEGIN is a special pattern used to label actions that should be performed before any input records have been read. END is a special pattern used to label actions that should be performed after all input records have been read.

A pattern range is given as

    pattern1,pattern2

This matches all lines from one that matches pattern1 to one that matches pattern2, inclusive.

If a pattern is omitted, or if the numeric value of the pattern is non-zero (true), the resulting action is executed for the line.

An action is a series of statements terminated by semicolons, new-lines, or closing braces.

A condition is any expression; a non-zero value is considered true, and a zero value is considered false.
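The following sketch brings together a user-defined function and a pattern range as described above. It assumes the interpreter is available as awk; the function add, the arrays and variables it touches, the START and STOP markers, and the shell variable report are all hypothetical names chosen for illustration.

```shell
report=$(printf '%s\n' 'skip' 'START' '3' '4' 'STOP' 'skip' | awk '
# arr is passed by reference, key and amount by value; the extra
# parameter i is never supplied by callers and acts as a local variable.
function add(arr, key, amount,    i) {
    i = arr[key] + amount
    arr[key] = i
    return i
}
# Pattern range: every record from START through STOP, inclusive.
/^START$/,/^STOP$/ {
    if ($0 ~ /^[0-9]+$/) add(sum, "n", $0)
    count++
}
END { print count, sum["n"] }')

printf '%s\n' "$report"
```

Because the array is passed by reference, the total accumulated inside add is still visible in the END action, while i leaves no trace outside the function.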
A statement is one of the following:

    expression
    if (condition) statement [else statement]
    while (condition) statement
    do statement while (condition)
    for (expression1; condition; expression2) statement

The for statement is equivalent to:

    expression1
    while (condition) {
        statement
        expression2
    }

The for statement can also have the form

    for (i in array) statement

The statement is executed once for each element in array; on each repetition, the variable i will contain the name of a subscript of array, running through all the subscripts in an arbitrary order. If array is multi-dimensional (has multiple subscripts), i will be expressed as a single string with the SUBSEP character separating the subscripts.

The following simple statements are supported:

break
    Exits a for or a while loop immediately.

continue
    Stops the current iteration of a for or while loop and begins the next iteration (if there is one).

next
    Terminates any processing for the current input record and immediately starts processing the next input record. Processing for the next record will begin with the first appropriate rule.

exit[ (expr) ]
    Immediately goes to the END action if it exists; if there is no END action, or if nawk is already executing the END action, the program terminates. The exit status of the program is set to the numeric value of expr. If (expr) is omitted, the exit status is 0.

return [expr]
    Returns from the execution of a function. If an expr is specified, the value of the expression is returned as the result of the function. Otherwise, the function result is undefined.

delete array[i]
    Deletes element i from the given array.

print expr, expr, ...
    Described below.

printf fmt, expr, expr, ...
    Described below.

Output

The print and printf statements write to the standard output. Output can be redirected to a file or pipe as described below.

If >expr is added to a print or printf statement, the string value of expr is taken to be a file name, and output is written to that file.
Similarly, if >>expr is added, output will be appended to the current contents of the file. The distinction between `>' and `>>' is only important for the first print to the file expr. Subsequent outputs to an already open file will append to what is there already.

In order to eliminate ambiguities, statements such as

    print a > b c

are syntactically illegal. Parentheses must be used to resolve the ambiguity.

If |expr is added to a print or printf statement, the string value of expr is taken to be an executable command. The command is executed with the output from the statement piped as input into the command.

As noted earlier, only a limited number of files and pipes can be open at any time. To avoid going over the limit, you should use the close function to close files and pipes when they are no longer needed.

The print statement prints its arguments with only simple formatting. If it has no arguments, the current input record is printed in its entirety. The output record separator ORS is added to the end of the output produced by each print statement; when arguments in the print statement are separated by commas, the corresponding output values will be separated by the output field separator OFS. ORS and OFS are built-in variables whose values can be changed by assigning them strings. The default output record separator is a new-line and the default output field separator is a space.

The format of numbers output by print is given by the string OFMT. By default, the value is `%.6g'; this can be changed by assigning OFMT a different string value.

The printf statement formats its arguments using the fmt argument. Formatting is the same as for the built-in function sprintf. Unlike print, printf does not add output separators automatically. This gives the program more precise control of the output.

Restrictions

The longest input record is restricted to 20,000 bytes and the maximum number of fields supported is 4000.
The length of the string produced by sprintf is limited to 1024 bytes.

The ord function may not be recognized by other versions of awk. The toupper and tolower functions and the ENVIRON array variable are found in the Bell Labs version of awk; this version is a superset of `New awk' as described in The AWK Programming Language by Aho, Weinberger, and Kernighan.

The shell that is used by the functions getline, print, printf, and system, and the return value of the system function, is described in system(3).

Examples

The following example outputs the contents of the file input1 with line numbers prepended to each line:

    nawk '{print NR ":" $0}' input1

The following is an example using var=value on the command line:

    nawk '{print NR SEP $0}' SEP=":" input1

The program script can also be read from a file as in the command line:

    nawk -f addline.nawk input1

This example produces the same output as the previous example when the file addline.nawk contains

    {print NR ":" $0}

The following program appends all input lines starting with `January' to the file jan (which can already exist or not), and all lines starting with `February' or `March' to the file febmar:

    /^January/ {print >> "jan"}
    /^February|^March/ {print >> "febmar"}

This program prints the total and average for the last column of each input line:

    {s += $NF}
    END {print "sum is", s, "average is", s/NR}

The following program interchanges the first and second fields of input lines:

    {
        tmp = $1
        $1 = $2
        $2 = tmp
        print
    }

The following example inserts line numbers so that output lines are left-aligned:

    {printf "%-6d: %s\n", NR, $0}

This example prints input records in reverse order (assuming sufficient memory):

    {
        a[NR] = $0    # index using record number
    }
    END {
        for (i = NR; i>0; --i)
            print a[i]
    }

The next program determines the number of lines starting with the same first field:

    {
        ++a[$1]    # array indexed using the first field
    }
    END {
        # note output will be in undefined order
        for (i in a)
            print a[i], "lines start with", i
    }

The following program can be used to determine the number of lines in
each input file:

    {
        ++a[FILENAME]
    }
    END {
        for (file in a)
            if (a[file] == 1)
                print file, "has 1 line"
            else
                print file, "has", a[file], "lines"
    }

This program illustrates how a two dimensional array can be used in nawk. Assume the first field contains a product number, the second field contains a month number, and the third field contains a quantity (bought, sold, or whatever). The program generates a table of products versus month.

    BEGIN {NUMPROD = 5}
    {
        array[$1,$2] += $3
    }
    END {
        print " Jan Feb March April May " \
              "June July Aug Sept Oct Nov Dec"
        for (prod = 1; prod <= NUMPROD; prod++) {
            printf "%-7s", "prod#" prod
            for (month = 1; month <= 12; month++){
                printf " %5d", array[prod,month]
            }
            printf "\n"
        }
    }

As this program reads in each line of input, it reports whether the line matches a pre-determined value:

    function randint() {
        return (int((rand()+1)*10))
    }
    BEGIN {
        prize[randint(),randint()] = "$100";
        prize[randint(),randint()] = "$10";
        prize[1,1] = "the booby prize"
    }
    {
        if (($1,$2) in prize)
            printf "You have won %s!\n", prize[$1,$2]
    }

This example prints lines whose first and last fields are the same, reversing the order of the fields:

    $1==$NF {
        for (i = NF; i > 0; --i)
            printf "%s", $i (i>1 ? OFS : ORS)
    }

The following program prints the input files from the command line. The infiles function first empties the array passed to it, and then fills the array. Notice that the extra parameter i of infiles is a local variable.

    function infiles(f, i) {
        for (i in f)
            delete f[i]
        for (i = 1; i < ARGC; i++)
            if (index(ARGV[i],"=") == 0)
                f[i] = ARGV[i]
    }
    BEGIN {
        infiles(a)
        for (i in a)
            print a[i]
        exit
    }

This example is the standard recursive factorial function:

    function fact(num) {
        if (num <= 1)
            return 1
        else
            return num * fact(num - 1)
    }
    { print $0 " factorial is " fact($0) }

The last program illustrates the use of getline with a pipe. Here, getline sets the current record from the output of the wc command. The program prints the number of words in each input file.
    # string is an extra parameter used as a local variable
    function words(file, string) {
        string = "wc " file
        string | getline    # wc output becomes $0; $2 is the word count
        close(string)
        return ($2)
    }
    BEGIN {
        for (i=1; i<ARGC; i++) {
            fn = ARGV[i]
            printf "There are %d words in %s.\n", words(fn), fn
        }
    }

See Also

ed(1), grep(1), sed(1), ex(1), system(3), ascii(7)
"Awk - A Pattern Scanning and Processing Language", ULTRIX Supplementary Documents, Vol. II: Programmer

nawk(1)