egrep(1) [osf1 man page]

grep(1) 						      General Commands Manual							   grep(1)

NAME
grep, egrep, fgrep - Searches a file for patterns

SYNOPSIS
grep [-E | -F] [-c | -l | -q] [-bhinsvwxy] [-pparagraph_separator] -e pattern_list [-e pattern_list]... [-f pattern_file]... [file...]

grep [-E | -F] [-c | -l | -q] [-bhinsvwxy] [-pparagraph_separator] [-e pattern_list]... -f pattern_file [-f pattern_file]... [file...]

grep [-E | -F] [-c | -l | -q] [-bhinsvwxy] [-pparagraph_separator] pattern_list [file...]

The commands grep -E and grep -F are equivalent to the obsolescent commands egrep and fgrep, respectively.

The grep command searches the specified files (standard input by default) for lines containing characters that match the specified patterns, and then writes matching lines to standard output.

STANDARDS
Interfaces documented on this reference page conform to industry standards as follows:

grep:  XCU5.0
egrep: XCU5.0
fgrep: XCU5.0

Refer to the standards(5) reference page for more information about industry standards and associated tags.

OPTIONS
Although most options can be combined, some combinations result in one option overriding another. For example, if you specify -n and -l, the output includes file names only (as specified by -l) and thus does not include line numbers (as specified by -n).

-E  Treats patterns as extended regular expressions and is equivalent to the obsolescent egrep command.

-F  Treats patterns as fixed strings and is equivalent to the obsolescent fgrep command.

-b  [Tru64 UNIX] Precedes each line by the block number on which it was found. Use this option to help find disk block numbers by context.

-c  Displays only a count of matching lines.

-e pattern_list
    Used to specify one or more patterns to match. If more than one pattern is specified in pattern_list, they must be separated by newline characters. The -e option is useful for specifying a pattern that begins with a - (dash).

-f pattern_file
    Specifies a file that contains patterns to match, one per line.

-h  [Tru64 UNIX] Suppresses reporting of file names when multiple files are processed. That is, it prevents the name of the file containing the matching line from being appended to that line.

-i  Ignores the case of letters in pattern matching; that is, uppercase and lowercase in the input are considered to be identical.

-l  Lists only the name of each file containing matched lines. Each file name is listed only once; file names are separated by newline characters. The grep command returns (standard input) (or the local equivalent) in place of a file name if -l is specified with standard input.

-n  Precedes each line with its relative line number in the file.

-pparagraph_separator
    [Tru64 UNIX] Displays the entire paragraph containing matched lines. Paragraphs are delimited by paragraph separators, paragraph_separator, which are patterns in the same form as the search pattern. Lines containing the paragraph separators are used only as separators; they are never included in the output. The default paragraph separator is a blank line.

-q  Suppresses all output except error messages. This is useful for checking status.

-s  Suppresses error messages arising from non-existent or unreadable files. Other error messages are still displayed.

-v  Displays all lines except those that match the specified pattern. Useful for filtering unwanted lines out of a file.

-w  [Tru64 UNIX] Matches only if the expression is found as a separate word in the text. A word is any string of alphanumeric characters (letters, numerals, and underscores) delimited by nonalphanumeric characters (punctuation or white space) or by the beginning or end of the line. See ex(1).

-x  Displays a line only if the pattern matches the entire line.

-y  [Tru64 UNIX] Same as the -i option.

OPERANDS
pattern_list
    Specify one or more patterns to be used during the search for input. This operand is treated as if it were specified as -e pattern_list.

file
    A path name of a file to be searched for the patterns. If no file operands are specified, the standard input is used.

DESCRIPTION
By default, the grep command treats a pattern as a basic regular expression (BRE). With the -E option, the pattern is treated as an extended regular expression (ERE). With the -F option, the pattern is considered a fixed string. See the following discussion of regular expressions.

In the output of the grep command, a matched line is preceded with the name of the file in which it was found if you specify more than one file (except when the -h option is specified).

[Tru64 UNIX] You are strongly encouraged to single quote patterns to protect them from unwanted shell substitutions. In some cases, such as in multiline pattern lists and subexpressions, quoting is essential. When using the C shell interactively, you must enter a backslash before terminating a line in a multiline pattern.

[Tru64 UNIX] Running grep on a file that is not a text file (for example, a binary file) produces unpredictable results and is discouraged.

NOTES
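The three pattern types described under DESCRIPTION can give different results for the same pattern text. The following is a minimal sketch, not taken from the original page: it assumes a POSIX-conforming grep and the mktemp utility, and the file name sample.txt and its contents are hypothetical.

```shell
# Hypothetical scratch file; name and contents are made up for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' 'aab' 'a+b' 'a.b' 'ab' > "$tmpdir/sample.txt"

# BRE (default): + is an ordinary character, so only the line a+b matches.
bre_plus=$(grep -c 'a+b' "$tmpdir/sample.txt")

# ERE (-E): + means "one or more", so aab and ab match, but a+b does not.
ere_plus=$(grep -E -c 'a+b' "$tmpdir/sample.txt")

# BRE (default): the period matches any character (aab, a+b, a.b).
bre_dot=$(grep -c 'a.b' "$tmpdir/sample.txt")

# Fixed string (-F): the period is literal, so only the line a.b matches.
fixed_dot=$(grep -F -c 'a.b' "$tmpdir/sample.txt")

echo "BRE a+b: $bre_plus  ERE a+b: $ere_plus  BRE a.b: $bre_dot  -F a.b: $fixed_dot"
rm -rf "$tmpdir"
```

Writing grep -E and grep -F, as in this sketch, avoids depending on the obsolescent egrep and fgrep commands.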
The egrep and fgrep utilities are marked LEGACY in XCU Issue 5.

REGULAR EXPRESSIONS
Regular expressions (REs) provide a powerful way to specify patterns to search for in text files (or in the standard input). This section explains the rules for constructing such patterns.

On Tru64 UNIX there are two standard types of REs, and thus two sets of rules for building patterns. The two types of regular expression that can be built by using these rules are termed basic regular expressions (BREs) and extended regular expressions (EREs). There is much in common between BREs and EREs, but there are important differences as well.

A variety of commands and utilities use one or the other type of RE, or both. Thus the rules described below are applicable in many contexts. Nonetheless, the grep command is used illustratively here.

The term regular expression, or RE, is used when there is no need to distinguish between BREs and EREs. The terms pattern and regular expression can be used interchangeably. The term match is used to describe a string in a file (or standard input) that is successfully specified by a pattern or RE. A pattern or an RE may also be referred to as a string. The matched string might also be termed a substring or a sequence (of characters).

Simple REs match a single character. More complex REs are built from other REs as explained in the rules below. REs are defined recursively; for example, if you concatenate two REs, the resultant string is an RE.

Regular Expression Concepts

The concept of a character is generalized to the concept of a collating element. For many purposes, especially in English-speaking locales, the term collating element may be considered synonymous with character. Collating elements are relevant to bracket expressions, and are discussed in the following sections.

A collating element is the smallest unit used to determine how to order characters. Collating elements are necessary for languages that treat some strings as individual collating elements.
For example, in Spanish, the strings ch and ll each are collating symbols (that is, the Spanish primary sort order is a, b, c, ch, d,...,k, l, ll, m,...).

As an example, suppose we have a file test that contains these three lines:

    ab
    acbcbc
    12356

The command

    grep 'b' test

results in this output:

    ab
    acbcbc

because the RE b, the pattern, matches the letter b in the first and second lines of the file, and there is no b in the third line. The RE c would match just the second line. The RE bc, built by concatenating the prior two REs, would match just the second line. There are two instances of bc in the second line, so the pattern matches the line.

However, in using some of the rules that build REs, it is important to understand exactly what substrings are matched by a pattern. Those rules are given in the following sections, but for illustration, consider the RE c.*b. This pattern means match a string beginning with c, ending with b, and with any number of characters between, including none. Thus this pattern matches lines containing cb, cxb, and canythingb, for example.

The search for a match starts at the beginning of a string and stops when the first sequence matching the pattern is found scanning from left to right. If there is more than one possible leftmost match, the longest match is used. For example, in the file test above, the pattern c.*b matches the second through third characters of the second line, and also the second through the fifth characters. The latter, being the longer, is the actual match. However, a longer substring that is not the leftmost match is not a match.

A null pattern will match any character, so the command

    grep '' test

matches all three lines.

A multicharacter collating element is considered a single character in the rules below that describe how to form a bracket expression, which matches a single character.
However, when considering what the longest sequence is in a match involving a multicharacter collating element, the element counts not as one character but as the number of characters it matches.

Pattern matching can be done in a case-insensitive manner. Case-insensitive processing permits matching of multicharacter collating elements as well as characters. For example, in

    grep -i '[[.Ch.]]' file

the RE [[.Ch.]] would match ch, Ch, cH, or CH. The notation is explained below.

Some utilities that use regular expressions, including grep, process a file line by line. A line ends with a newline character. In general (but not with grep) the newline character is regarded as an ordinary character and both a period and a nonmatching list can match one. (See discussion below.) Some utilities, including grep, do not allow newline characters in a pattern to be matched.

Basic Regular Expressions

Basic regular expressions (BREs) are built by concatenating simpler BREs. BREs can be classified as those that can match a single character in the search string, and those that can match multiple characters.

The following BREs match a single character (or collating element):

An ordinary character, a special character preceded by a backslash, or a period (.), matches a single character.

A bracket expression matches a single character or a single collating element.

These terms are defined in the following sections.

BRE Ordinary Characters

Any character, except for those listed in the section "BRE Special Characters," below, is an ordinary character and is a BRE that matches itself. Except for the following, do not quote ordinary characters with a backslash (\):

The characters (, ), { and }. The use of these characters quoted with backslashes is explained in the sections on subexpressions and interval expressions under the heading "BREs Matching Multiple Characters," following.

The digits 1 to 9 inclusive.
The use of these numerals quoted with backslashes is explained in the section on back-reference expressions under the heading "BREs Matching Multiple Characters," below.

You cannot use a backslash to quote a character inside a bracket expression; inside a bracket expression a backslash is an ordinary character.

These characters, (, ), {, }, and 1 - 9 are considered "ordinary characters" (see next section) because they do not have to be quoted with a backslash to match themselves as do "special characters."

BRE Special Characters

Some characters have special meaning when used in a BRE in some contexts, defined next. Outside such contexts, or in the context but quoted with a preceding backslash, these characters have no special meaning, and each is a BRE that matches itself. The BRE special characters and contexts are:

The period, left bracket, and backslash are special except when used in a bracket expression (discussed below). A pattern containing a [ that is not preceded by a backslash and is not part of a bracket expression is not valid.

The asterisk is special except when used in a bracket expression, as the first character of a complete pattern (after an initial ^, if any), or as the first character of a subexpression (after an initial ^, if any).

The circumflex is special when used as an anchor or as the first character of a bracket expression. These concepts are explained below.

The dollar sign is special when used as an anchor.

Periods in BREs

A period (.), when used outside a bracket expression, is a BRE that matches any character.

BRE Bracket Expression

A non-null string enclosed in [ ] (brackets) is called a bracket expression. It is a BRE that matches any single character (or collating element) in the enclosed string.
For example, using the sample file test described above, the command

    grep '[a3][c5]' test

outputs the second and third lines, acbcbc and 12356, because the two contiguous bracket expressions in the pattern match the substrings ac and 35 in those lines.

A bracket expression is either a matching list expression or a nonmatching list expression. It consists of one or more collating elements, collating symbols, equivalence classes, character classes or range expressions.

The right bracket (]) loses its special meaning and represents itself in a bracket expression if it occurs first in the list (after an initial circumflex (^), if any). Otherwise, it terminates the bracket expression, unless it appears in a collating symbol (such as [.].] ) or is the ending right bracket for a collating symbol, equivalence class, or character class.

The special characters (period, asterisk, left bracket and backslash) lose their special meanings within a bracket expression.

The character sequences [., [=, and [: (left bracket followed by a period, equal sign, or colon) are special inside a bracket expression and are used to delimit collating symbols, equivalence class expressions and character class expressions. These symbols must be followed by a valid expression and the matching terminating sequence .], =], or :], as defined next.

The rules follow for creating and using matching and nonmatching list expressions, collating symbol, equivalence class expression, character class expression, and range expression, in bracket expressions.

A matching list expression, such as [a3] in the example above, specifies a list that matches any character or collating element in the list. The first character in the list cannot be a circumflex. [a3] matches either the character a or the character 3.

A nonmatching list expression begins with a circumflex (^), and specifies a list that matches any character or collating element except for the expressions in the list after the leading circumflex.
For example, [^abc] is a BRE that matches any character or collating element except the characters a, b or c. If the circumflex does not appear immediately following the left bracket, it loses its special meaning.

A collating symbol is a collating element enclosed within bracket-period ([. .]) delimiters. The concept is introduced above under the heading "Regular Expression Concepts." Multicharacter collating elements are represented as collating symbols to distinguish them from the individual characters in the collating symbol. For example, when using Spanish collation rules, [[.ch.]] is treated as a BRE matching the sequence ch, while [ch] is treated as a BRE matching c or h. In addition, [a-[.ch.]] matches a, b, c, and ch. (See range expressions, below.) Collating symbols are valid only inside bracket expressions.

An equivalence class expression specifies a set of collating elements that all sort to the same primary location. An equivalence class is enclosed in bracket-equal ([= =]) delimiters. An equivalence class generally is designed to deal with primary-secondary sorting; that is, for languages like French that define groups of characters as sorting to the same primary location, and then have a tie-breaking, secondary sort. For example, if x, y, and z are collating elements that belong to the same equivalence class, then the bracket expressions [[=x=]a], [[=y=]a], and [[=z=]a] are equivalent to [xyza]. (Here we use x, y, and z as variables representing characters in the same equivalence class; in a typical example, x might be the collating element e, and y and z the characters e with an acute accent and e with a grave accent.) If the collating element within [= =] delimiters does not belong to an equivalence class, the equivalence class expression is treated as a collating symbol, that is, the delimiters are ignored.

A character class expression enclosed in bracket-colon ([: :]) delimiters matches any of the set of characters in the named class.
Members of each of the sets are determined by the current setting of the LC_CTYPE environment variable. The supported classes are: alpha, upper, lower, digit, alnum, xdigit, space, print, punct, graph, and cntrl. Here is an example of how to specify one of these classes:

    [[:lower:]]

This matches any single lowercase character for the current locale.

A range expression represents the set of collating elements that fall between two elements in the current collation sequence, inclusively. It is expressed as starting and ending points separated by a hyphen (-). For example, the BRE 1[a-d]2, which includes the bracket expression [a-d], containing the range expression a-d, represents a pattern that will match any of these strings: 1a2, 1b2, 1c2, and 1d2.

Range expressions should not be used in portable applications because their behavior depends on collating sequences.

A construction such as [a-d-g] is invalid.

The hyphen character loses its special meaning in a bracket expression if it occurs first (after an initial ^, if any) or last, or as an ending range point in a range expression. For example, the expressions [-df] and [df-] are equivalent and match any of the characters d, f, or -. The expressions [^-df] and [^df-] are equivalent and match any characters except d, f and -; the expression [&--] matches any character between & and -, inclusive; the expression [--;] matches any of the characters between - and ;, inclusive; and the expression [A--] is invalid, because A follows - in the collation sequence.

A hyphen or right bracket may be represented as collating symbols, [.-.] or [.].], anywhere in a bracket expression. Otherwise, if both - and ] are required in a bracket expression, the right bracket must be first (after an optional initial ^) and the hyphen last.

BREs Matching Multiple Characters

The rules above describe how to construct a BRE that matches a single character.
In some of the examples above, patterns that match multiple characters were given based on the intuitive concept of concatenation. This, and the other rules used to build BREs which match multiple characters from BREs matching single characters, follow.

The concatenation of BREs matches the concatenation of the strings matched by each component of the BRE.

A subexpression can be defined within a BRE by enclosing it between the character pairs \( and \). Such a subexpression matches whatever it would have matched without the \( and \). Up to nine subexpressions are saved into numbered holding spaces. Counting from left to right on the line, the first pattern saved is placed in the first holding space, the second pattern is placed in the second holding space, and so on.

The character sequence \n, called a back-reference expression, matches the nth saved pattern, which is in the nth holding space. (The value of n is a digit, 1-9.) Thus, the pattern:

    \(a\)\(b\)c\2\1

matches the string abcba. You can nest patterns to be saved in holding spaces. Whether the enclosed patterns are nested or are in a series, \n refers to the nth occurrence, counting from the left, of the delimiting characters \). In utilities that have replacement as well as search patterns, you can use \n expressions in the replacement strings as well as in the search patterns. A back-reference expression is invalid if fewer than n subexpressions precede the \n. Finally, any number of subexpressions are allowed in a search pattern even though the number of back-reference expressions is limited to nine.

If a BRE x matches a single character, or is a subexpression or a back-reference, then the pattern x* (x followed by an asterisk) matches zero or more occurrences of the character that the BRE x matches.
For example, this pattern:

    ab*cd

matches each of these strings:

    acd
    abcd
    abbcd
    abbbcd

but not this string:

    abd

A BRE that matches a single character, or that is a subexpression or a back-reference, followed by an interval expression of the format \{i\}, \{i,\} or \{i,j\}, matches what repeated consecutive occurrences of the BRE would match. Such a BRE followed by:

\{i\}    matches exactly i occurrences of the character matched by the BRE
\{i,\}   matches at least i occurrences of the character matched by the BRE
\{i,j\}  matches any number of occurrences of the character matched by the BRE from i to j, inclusive

The values of i and j must be integers in the range 0 <= i <= j <= 255. Whenever a choice exists, the pattern matches as many occurrences as possible. Note that if i is 0 (zero), the interval expression is equivalent to the null BRE.

BRE Expression Anchoring--Restricting What Patterns Match

A pattern (an entire BRE) can be restricted to match from the beginning of a line, restricted to match up to the end of the line, or restricted to match the entire line. This is done by anchoring the search pattern.

A ^ (circumflex) at the beginning of an expression or subexpression causes the pattern to match only a string that begins in the first character position on a line. For example, the pattern ^bc matches bc in the line bcdef but doesn't match bc in abcdef. The subexpression \(^bc\) also matches bcdef.

A $ (dollar sign) at the end of a pattern causes that pattern to match only if the last matched character is the last character (not including the newline character) on a line.

The construction ^pattern$ restricts the pattern to matching only an entire line. For example, the BRE ^abcd$ matches lines containing the string abcd, where a is the first character on the line and d the last.

BRE Precedence

The order of precedence, from high to low, is as shown in the following table:

    collation-related bracket symbols   [= =]  [: :]  [. .]
    escaped characters                  \<special character>
    bracket expressions                 [ ]
    subexpressions/back-references      \( \)  \n
    single-character duplication        *  \{i,j\}
    concatenation
    anchoring                           ^  $

Extended Regular Expressions

Like BREs, extended regular expressions (EREs) are built by concatenating simpler EREs. EREs can be classified as those that can match a single character, and those that can match multiple characters.

An ERE ordinary character, an ERE special character preceded by a backslash, or a period matches a single character. A bracket expression matches a single character or a single collating element. An ERE matching a single character enclosed in parentheses (a group) matches the same strings as the ERE without parentheses.

ERE Ordinary Characters

Any character, except for special characters listed below, is an ordinary character and is an ERE that matches itself.

ERE Special Characters

Some characters have special meaning when used in an ERE in some contexts, defined next. Outside such contexts, or in the context but quoted with a preceding backslash, these characters have no special meaning, and each is an ERE that matches itself. The ERE special characters and contexts are:

The period, left bracket, backslash and left parenthesis are special except when used in a bracket expression. Outside a bracket expression, do not use a left parenthesis, (, unless it is quoted with a backslash, \(.

The right parenthesis is special when matched with a preceding left parenthesis, outside a bracket expression. To search for the string (), use the quoted form \(\).

The asterisk, plus sign, question mark, and left brace are special except when used in a bracket expression. Outside of a bracket expression, it is invalid to use any of them as the first character in an ERE, or immediately following a vertical line, a circumflex, or a left parenthesis. It is invalid to use a left brace that is not part of an interval expression. (Of course, quoting with a backslash removes such invalidity.)
The vertical line is special except when used in a bracket expression. It is invalid to use a vertical line first or last in an ERE, or immediately following another vertical line or a left parenthesis, or immediately preceding a right parenthesis.

The circumflex is special when used as an anchor or as the first character of a bracket expression.

The dollar sign is special when used as an anchor.

Periods in EREs

A period (.), when used outside a bracket expression, is an ERE that matches any character.

ERE Bracket Expression

The rules for ERE bracket expressions are the same as for the BRE bracket expressions discussed above.

EREs Matching Multiple Characters

The rules above describe how to construct an ERE that matches a single character. The rules used to build EREs which match multiple characters from EREs matching single characters follow.

A concatenation of EREs matches the concatenation of the strings matched by each component of the ERE. A concatenation of EREs enclosed in parentheses matches whatever the concatenation without the parentheses matches. For example, both EREs ab and (ab) match the second and third characters of the string cabcdabc.

An ERE matching a single character or an ERE enclosed in parentheses followed by the special character plus sign (+) matches what one or more consecutive occurrences of the ERE would match. For example, the ERE (ab)a+ matches the second to sixth characters in the string cabaaabc and c(ab)+ matches the first to seventh characters in the string cabababc.

An ERE matching a single character or an ERE enclosed in parentheses followed by the special character asterisk (*) matches what zero or more consecutive occurrences of the ERE would match. For example, the ERE b*c matches the first character in the string cabbbcde, and the ERE c*de matches the second to sixth characters in the string dcccdec. The EREs [cd]+ and [cd][cd]* are equivalent and [cd]* and [cd][cd] are equivalent when matching the string cd.
An ERE matching a single character or an ERE enclosed in parentheses followed by the special character question mark (?) matches what zero or one consecutive occurrence of the ERE would match. For example, the ERE c?d matches the third character in the string abdbcccde.

An ERE matching a single character or an ERE enclosed in parentheses followed by an interval expression of the format {i}, {i,}, or {i,j}, matches what repeated consecutive occurrences of the ERE would match. The rules for matching are the same as for BRE interval expressions (discussed above) except for the notational difference. For example, the ERE d{3} matches characters eight through 10 in the string abcbcbcddddde and the ERE (bc){2,} matches characters two to seven.

ERE Alternation

If x and y are EREs, then x|y is an ERE that matches any string that is matched by either x or y. For example, the ERE ((cd)|e)b matches the string cdb and the string eb. Single characters, or expressions matching single characters, separated by the vertical bar and enclosed in parentheses, match a single character.

ERE Expression Anchoring

ERE anchoring is the same as BRE anchoring, discussed above.

ERE Precedence

The order of precedence, from high to low, is as shown in the following table:

    collation-related bracket symbols   [= =]  [: :]  [. .]
    escaped characters                  \<special character>
    bracket expression                  [ ]
    grouping                            ( )
    single-character duplication        *  +  ?  {i,j}
    concatenation
    anchoring                           ^  $
    alternation                         |

For example, the pattern ab|cd is the same as (ab)|(cd) and is not equivalent to a(b|c)d.

EXIT STATUS
The exit values of the grep command are:

0   A match was found.
1   No match was found.
>1  A syntax error was found or a file was inaccessible, even if matches were found.

EXAMPLES
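The exit values listed under EXIT STATUS can be tested from a shell script by combining the -q and -s options. The following is a minimal sketch rather than part of the original page: it assumes a POSIX-conforming grep and the mktemp utility, and the names words.txt and no-such-file are hypothetical.

```shell
# Hypothetical scratch file; name and contents are made up for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' > "$tmpdir/words.txt"

# -q suppresses normal output, so only the exit value is of interest.
found=0
grep -q 'alpha' "$tmpdir/words.txt" || found=$?      # a match exists

missing=0
grep -q 'gamma' "$tmpdir/words.txt" || missing=$?    # no line matches

# -s suppresses the error message for the nonexistent file.
error=0
grep -q -s 'alpha' "$tmpdir/no-such-file" || error=$?

echo "found=$found missing=$missing error=$error"
rm -rf "$tmpdir"
```

Here found is 0 and missing is 1; error is some value greater than 1, which is why a script should compare it with -gt 1 rather than with a fixed number.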
To search several C-language source files for the pattern strcpy, enter:

    grep 'strcpy' *.c

This searches for the string strcpy in all files in the current directory with names ending in .c.

To count the number of lines that match a pattern, enter:

    grep -F -c '{' pgm.c
    grep -F -c '}' pgm.c

This displays the number of lines in pgm.c that contain left and right braces.

If you do not put more than one { or } on a line in your C programs, and if the braces are properly balanced, then the two numbers displayed will be the same. If the numbers are not the same, then you can display the lines that contain braces with the command:

    grep -n -E '\{|\}' pgm.c

To display all lines in a file that begin with an ASCII letter, enter:

    grep '^[a-zA-Z]' pgm.s

Note that because grep -F searches only for fixed strings and does not use regular expressions such as bracket expressions or anchoring, the following command causes grep to search only for the literal string ^[a-zA-Z] in pgm.s:

    grep -F '^[a-zA-Z]' pgm.s

To display all lines that contain ASCII letters in parentheses or digits in parentheses (with spaces optionally preceding and following the letters or digits), but not letter-digit combinations in parentheses, enter:

    grep -E '\( *([a-zA-Z]*|[0-9]*) *\)' my.txt

This command displays lines in my.txt such as ( 783902) or (y), but not (alpha19c).

Note that with grep -E, \( and \) match parentheses in the text and ( and ) are special characters that group parts of the pattern. With grep without the -E option, the reverse is true; use ( and ) to match parentheses and \( and \) to group characters.

To display all lines that do not match a pattern, enter:

    grep -v '^#'

This displays all lines that do not begin with a # (number sign).

To display the names of files that contain a pattern, enter:

    grep -F -l 'rose' *.list

This searches the files in the current directory that end with .list and displays the names of those files that contain at least one line containing the string rose.
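Several of the rules described under REGULAR EXPRESSIONS (BRE back-references, BRE interval expressions, and ERE alternation precedence) can be tried in the same way. The following is a minimal sketch that is not part of the original page: it assumes a POSIX-conforming grep and the mktemp utility, and the file name re.txt and its contents are hypothetical.

```shell
# Hypothetical scratch file; name and contents are made up for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' 'abcba' 'abbcd' 'abd' 'acd' 'cd' 'xyz' > "$tmpdir/re.txt"

# BRE back-references: \2 and \1 repeat what \(b\) and \(a\) matched,
# so only the line abcba is selected.
backref=$(grep '\(a\)\(b\)c\2\1' "$tmpdir/re.txt")

# BRE interval expression: exactly two b characters between a and c,
# so only the line abbcd is selected.
interval=$(grep 'ab\{2\}c' "$tmpdir/re.txt")

# ERE alternation has the lowest precedence: ab|cd means (ab)|(cd),
# which selects every line containing ab or cd.
alt_loose=$(grep -E -c 'ab|cd' "$tmpdir/re.txt")

# Grouping changes the meaning: a(b|c)d selects only abd and acd.
alt_group=$(grep -E -c 'a(b|c)d' "$tmpdir/re.txt")

echo "backref=$backref interval=$interval ab|cd=$alt_loose a(b|c)d=$alt_group"
rm -rf "$tmpdir"
```

All patterns are single quoted, as recommended under DESCRIPTION, so that the shell does not interpret the backslashes, braces, or vertical bars.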
To display all lines that contain uppercase characters, enter:

    grep '[[:upper:]]' pgm.s

To display all lines that begin with a range of characters that includes a multicharacter collating symbol, enter:

    grep '^[a-[.ch.]]' pgm.s

With your locale set to a Spanish locale, this command matches all lines that begin with a, b, c, or ch.

ENVIRONMENT VARIABLES
The following environment variables affect the execution of grep, egrep, and fgrep:

LANG
    Provides a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the default locale is used. If any of the internationalization variables contain an invalid setting, the utility behaves as if none of the variables had been defined.

LC_ALL
    If set to any string value, overrides the values of all the other internationalization variables.

LC_CTYPE
    Determines the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multibyte characters in arguments and input files) and the behavior of character classes within regular expressions.

LC_MESSAGES
    Determines the locale for the format and contents of diagnostic messages written to standard error.

NLSPATH
    Determines the location of message catalogues for the processing of LC_MESSAGES.

SEE ALSO
Commands: ed(1), ex(1), ksh(1), sed(1), Bourne shell sh(1b), POSIX shell sh(1p)

Standards: standards(5)

grep(1)