Full Discussion: Combining multiple greps
Post 303031197 by RudiC on Saturday 23rd of February 2019 09:31:16 AM
In your post, you're not "grepping filenames", but grepping text that is the result of an ls command, containing file names.


As you pointed out, you need to differentiate between "grepping" and "globbing", which are not the same even though it may sometimes seem so. Both are colloquialisms; the exact terms would be "regex matching" and "pattern matching", respectively.

"Globbing" deals with patterns and is done by the shell, mostly when dealing with directory contents. And, in one exceptional case, some recent shells can deal with regexes: in "conditional expressions". man bash:
Quote:
A ... binary operator, =~, is available ... the string to the right of the operator is considered an extended regular expression and matched accordingly
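A minimal bash sketch of that distinction, assuming a bash recent enough to have =~ inside [[ ... ]]. The function names is_number_regex and is_log_glob are made up for illustration; the point is only that =~ takes an extended regular expression, while == with an unquoted right-hand side takes a shell pattern:

```shell
#!/usr/bin/env bash
# Hypothetical helpers, only for illustration.

# Regex matching: =~ treats the right-hand side as an ERE.
is_number_regex() {
    local re='^[0-9]+$'      # ERE: one or more digits, anchored at both ends
    [[ $1 =~ $re ]]
}

# Pattern matching: == with an unquoted right-hand side is a shell pattern.
is_log_glob() {
    [[ $1 == *.log ]]        # pattern: any string ending in ".log"
}

is_number_regex 12345 && echo "12345: regex match"
is_number_regex 12a45 || echo "12a45: no regex match"
is_log_glob app.log   && echo "app.log: pattern match"
is_log_glob app.txt   || echo "app.txt: no pattern match"
```

Keeping the regex in a variable avoids most quoting surprises, because quoting any part of the right-hand side of =~ makes that part match literally.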
"Grepping" deals with regexes, basic and extended, abbr. BREs and EREs. They have many subtleties, and it pays off to spend some time reading the man page.

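To illustrate one of those subtleties, here is a small sketch comparing basic and extended regexes in grep. The sample data and the function names are invented for the example, and it assumes a POSIX-style grep that accepts -E:

```shell
# Three sample lines used by all helpers below (made-up data).
sample() { printf '%s\n' 'ab' 'aab' 'a+b'; }

# BRE: "+" is an ordinary character, so only the literal text "a+b" matches.
bre_plus() { sample | grep 'a+b'; }

# ERE: "+" means "one or more of the preceding item", so "ab" and "aab" match.
ere_plus() { sample | grep -E 'a+b'; }

# Interval expressions: the braces are backslash-escaped in a BRE, bare in an ERE.
bre_interval() { sample | grep 'a\{2\}b'; }
ere_interval() { sample | grep -E 'a{2}b'; }

echo "BRE a+b:";      bre_plus
echo "ERE a+b:";      ere_plus
echo "BRE a\{2\}b:";  bre_interval
echo "ERE a{2}b:";    ere_interval
```

The same characters, a+b, select different lines depending on which flavour grep is told to use.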
Patterns and regexes in principle have different syntaxes. There are some overlaps, e.g. the [...] bracket expression meaning "Match any one of the enclosed characters", but also "faux amis" (false friends) like the * , which matches any string in a pattern but means "zero or more of the preceding item" in a regex. It's always annoying to keep those differences in mind when dealing with either, and I have to test and experiment every single time I switch from one to the other.
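The false friend can be shown side by side. This is only a sketch: glob_match and regex_match are hypothetical helper names, with case standing in for pattern matching and grep for regex matching:

```shell
# Pattern semantics via "case": the unquoted $2 is used as a shell pattern.
glob_match() {   # usage: glob_match STRING PATTERN
    case $1 in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Regex semantics via grep: -x requires the whole line to match, -q is quiet.
regex_match() {  # usage: regex_match STRING REGEX
    printf '%s\n' "$1" | grep -qx -- "$2"
}

glob_match  abc 'a*'  && echo "pattern a*  matches abc"
regex_match abc 'a*'  || echo "regex   a*  does not match abc (only a, aa, ...)"
regex_match abc 'a.*' && echo "regex   a.* matches abc"
glob_match  abc 'a.*' || echo "pattern a.* does not match abc (it needs a literal dot)"
```

The rough regex equivalent of the pattern * is .* , and the rough equivalent of the pattern ? is a single period.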

Last edited by RudiC; 02-23-2019 at 11:45 AM..

regexp(5)							File Formats Manual							 regexp(5)

NAME
regexp - regular expression and pattern matching notation definitions

DESCRIPTION
A regular expression (RE) is a mechanism supported by many utilities for locating and manipulating patterns in text. Pattern matching notation is used by shells and other utilities for file name expansion. This manual entry defines two forms of regular expressions, basic regular expressions and extended regular expressions, and the one form of pattern matching notation.

BASIC REGULAR EXPRESSIONS
Basic regular expression (RE) notation and construction rules apply to utilities defined as using basic REs. Any exceptions to the following rules are noted in the descriptions of the specific utilities that use REs.

REs Matching a Single Character
The following REs match a single character or a single collating element:

An ordinary character is an RE that matches itself. An ordinary character is any character in the supported character set except newline and the regular expression special characters listed in Special Characters below. An ordinary character preceded by a backslash (\) is treated as the ordinary character itself, except when the character is (, ), {, or }, or one of the digits 1 through 9 (see REs Matching Multiple Characters). Matching is based on the bit pattern used for encoding the character, not on the graphic representation of the character.

Special Characters
A regular expression special character preceded by a backslash is a regular expression that matches the special character itself. When not preceded by a backslash, such characters have special meaning in the specification of REs. Regular expression special characters and the contexts in which they have special meaning are:

. [ \       The period, left square bracket, and backslash are special except when used in a bracket expression (see RE Bracket Expression).
*           The asterisk is special except when used in a bracket expression, as the first character of a regular expression, or as the first character following the character pair \( (see REs Matching Multiple Characters).
^           The circumflex is special when used as the first character of an entire RE (see Expression Anchoring) or as the first character of a bracket expression.
$           The dollar sign is special when used as the last character of an entire RE (see Expression Anchoring).
delimiter   Any character used to bound (i.e., delimit) an entire RE is special for that RE.

A period (.), when used outside of a bracket expression, is an RE that matches any printable or nonprintable character except newline.
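The context rules for basic RE special characters can be tried out with grep. This is a rough sketch that assumes a POSIX-style grep whose default syntax is basic REs; count_bre is a hypothetical helper that counts the input lines a given RE matches:

```shell
# count_bre REGEX: count the stdin lines matched by a basic RE.
count_bre() { grep -c -- "$1"; }

# A backslash makes the period match only itself ...
printf '%s\n' 'a.c' 'abc' | count_bre 'a\.c'

# ... while a bare period matches any single character.
printf '%s\n' 'a.c' 'abc' | count_bre 'a.c'

# A circumflex that is not the first character of the RE is not an anchor.
printf '%s\n' 'a^b' 'ab' | count_bre 'a^b'

# A dollar sign that is not the last character of the RE is not an anchor.
printf '%s\n' 'a$b' 'ab' | count_bre 'a$b'
```

The single quotes matter here: they keep the shell from interpreting the backslash and the dollar sign before grep sees them.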
RE Bracket Expression A bracket expression enclosed in square brackets is an RE that matches a single collating element contained in the nonempty set of collat- ing elements represented by the bracket expression. The following rules apply to bracket expressions: A bracket expression is either a or a and consists of one or more expressions in any order. Expressions can be: collating elements, collating symbols, noncollating characters, equivalence classes, range expressions, or character classes. The right bracket loses its special meaning and represents itself in a bracket expression if it occurs first in the list (after an initial if any). Otherwise, it terminates the bracket expression (unless it is the ending right bracket for a valid collating symbol, equivalence class, or character class, or it is the collating element within a collating symbol or equivalence class expression). The special characters (period, asterisk, left bracket, and backslash) lose their special meaning within a bracket expression. The character sequences: (left-bracket followed by a period, equal-sign or colon) are special inside a bracket expression and are used to delimit collating symbols, equivalence class expressions and character class expressions. These symbols must be fol- lowed by a valid expression and the matching terminating or A matching list expression specifies a list that matches any one of the characters represented in the list. The first character in the list cannot be the circumflex. For example, is an RE that matches any of or A expression begins with a circumflex and specifies a list that matches any character or collating element except newline and the characters represented in the list. For example, is an RE that matches any character except newline or or The circumflex has this special meaning when it occurs first in the list, immediately following the left square bracket. 
A is a sequence of one or more characters that represents a single element in the collating sequence as identified via the most current setting of the locale variable (see setlocale(3C)). A is a collating element enclosed within bracket-period delimiters. Multicharacter collating elements must be repre- sented as collating symbols to distinguish them from single-character collating elements. For example, if the string is a valid collating element, then is treated as an element matching the same string of characters, while is treated as a simple list of the characters and If the string within the bracket-period delimiters is not a valid collating element in the current collating sequence definition, the symbol is treated as an invalid expression. A is a character that is ignored for collating purposes. By definition, such characters cannot participate in equiva- lence classes or range expressions. An expression represents the set of collating elements belonging to an equivalence class. It is expressed by enclosing any one of the collating elements in the equivalence class within bracket-equal delimiters. For example, if and belong to the same equivalence class, then and are each equivalent to A represents the set of collating elements that fall between two elements in the current collation sequence as defined via the most current setting of the locale variable (see setlocale(3C)). It is expressed as the starting point and the ending point separated by a hyphen The starting range point and the ending range point must be a collating element, collating symbol, or equivalence class expression. An equivalence class expression used as an end point of a range expression is interpreted such that all collating elements within the equivalence class are included in the range. 
For example, if the collating order is and and the characters and belong to the same equivalence class, then the expression is treated as Both starting and ending range points must be valid collating elements, collating symbols, or equivalence class expres- sions, and the ending range point must collate equal to or higher than the starting range point; otherwise the expres- sion is invalid. For example, with the above collating order and assuming that is a noncollating character, then both the expressions and are invalid. An ending range point can also be the starting range point in a subsequent range expression. Each such range expres- sion is evaluated separately. For example, the bracket expression is treated as The hyphen character is treated as itself if it occurs first (after an initial if any) or last in the list, or as the rightmost symbol in a range expression. As examples, the expressions and are equivalent and match any of the charac- ters or the expressions and are equivalent and match any characters except newline, or the expression matches any of the characters in the defined collating sequence between and inclusive; the expression matches any of the characters in the defined collating sequence between and inclusive; and the expression is invalid, assuming precedes in the collating sequence. If a bracket expression must specify both and the must be placed first (after the if any) and the last within the bracket expression. A character class expression represents the set of characters belonging to a character class, as defined via the most current setting of the locale variable It is expressed as a character class name enclosed within bracket-colon delimiters. 
Standard character class expressions supported in all locales are: letters upper-case letters lower-case letters decimal digits hexadecimal digits letters or decimal digits characters producing white-space in displayed text printing characters punctuation characters characters with a visible representation control characters blank characters For example, if the locale variable is set to the expression is equivalent to Similarly the expression is same as REs Matching Multiple Characters The following rules may be used to construct REs matching multiple characters from REs matching a single character: RERE The concatenation of REs is an RE that matches the first encountered concatenation of the strings matched by each com- ponent of the RE. For example, the RE matches the second and third characters of the string An RE matching a single character followed by an asterisk is an RE that matches zero or more occurrences of the RE preceding the asterisk. The first encountered string that permits a match is chosen, and the matched string will encompass the maximum number of characters permitted by the RE. For example, in the string both the RE and the RE are matched by the substring in the second through fifth positions. An asterisk as the first character of an RE loses this special meaning and is treated as itself. A subexpression can be defined within an RE by enclosing it between the character pairs and Such a subexpression matches whatever it would have matched without the and Subexpressions can be arbitrarily nested. An asterisk immediately following the loses its special meaning and is treated as itself. An asterisk immediately following the is treated as an invalid character. 
The expression matches the same string of characters as was matched by a subexpression enclosed between and preceding the The charac- ter n must be a digit from through specifying the n-th subexpression (the one that begins with the n-th and ends with the corresponding paired For example, the expression matches a line consisting of two adjacent appearances of the same string. If the is followed by an asterisk, it matches zero or more occurrences of the subexpression referred to. For example, the expression matches the string An RE matching a single character followed by or is an RE that matches repeated occurrences of the RE. The values of m and n must be decimal integers in the range 0 through 255, with m specifying the exact or minimum number of occurrences and n specifying the maximum number of occur- rences. matches exactly m occurrences of the preceding RE, matches at least m occurrences, and matches any number of occurrences between m and n, inclusive. The first encountered string that matches the expression is chosen; it will contain as many occurrences of the RE as possible. For example, in the string the RE is matched by characters two through four, the RE is matched by characters two through eight, and the RE is matched by characters four through nine. Expression Anchoring An RE can be limited to matching strings that begin or end a line (i.e., anchored) according to the following rules: o A circumflex as the first character of an RE anchors the expression to the beginning of a line; only strings starting at the first character of a line are matched by the RE. For example, the RE matches the string in the line but not the same string in the line o A dollar sign as the last character of an RE anchors the expression to the end of a line; only strings ending at the last character of a line are matched by the RE. 
For example, the RE matches the string in the line but not the same string in the line o An RE anchored by both and matches only strings that are lines. For example, the RE matches only lines consisting of the string The use of duplication characters (+,*) following anchors is illegal. EXTENDED REGULAR EXPRESSIONS
The extended regular expression (ERE) notation and construction rules apply to utilities defined as using extended REs. Any exceptions to the following rules are noted in the descriptions of the specific utilities using EREs.

EREs Matching a Single Character
The following EREs match a single character or a single collating element:

An ordinary character is an ERE that matches itself. An ordinary character is any character in the supported character set except newline and the regular expression special characters listed in Special Characters below. An ordinary character preceded by a backslash (\) is treated as the ordinary character itself. Matching is based on the bit pattern used for encoding the character, not on the graphic representation of the character.

Special Characters
A regular expression special character preceded by a backslash is a regular expression that matches the special character itself. When not preceded by a backslash, such characters have special meaning in the specification of EREs. The extended regular expression special characters and the contexts in which they have their special meaning are:

. [ \ ( ) * + ? $ |   The period, left square bracket, backslash, left parenthesis, right parenthesis, asterisk, plus sign, question mark, dollar sign, and vertical bar are special except when used in a bracket expression (see ERE Bracket Expression).
^                     The circumflex is special except when used in a bracket expression in a non-leading position.
delimiter             Any character used to bound (i.e., delimit) an entire ERE is special for that ERE.

A period (.), when used outside of a bracket expression, is an ERE that matches any printable or nonprintable character except newline.

ERE Bracket Expression
The syntax and rules for ERE bracket expressions are the same as for RE bracket expressions found above.
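As a rough illustration of the rule that ERE special characters lose their meaning inside a bracket expression or after a backslash, the following sketch uses grep -E. The helper name count_ere and the sample lines are assumptions made for the example:

```shell
# count_ere REGEX: count the stdin lines matched by an extended RE.
count_ere() { grep -Ec -- "$1"; }

# Outside brackets, "+" repeats the preceding "a": only "aab" matches.
printf '%s\n' 'a+b' 'aab' | count_ere 'a+b'

# Inside brackets, "+" stands for itself: only "a+b" matches.
printf '%s\n' 'a+b' 'aab' | count_ere 'a[+]b'

# A backslash has the same effect as the brackets.
printf '%s\n' 'a+b' 'aab' | count_ere 'a\+b'

# A circumflex in a non-leading bracket position is an ordinary list member.
printf '%s\n' 'x^' 'xy' 'xz' | count_ere 'x[y^]'
```

Putting a single character in brackets is a common way to quote it in a regex without having to think about how many backslashes survive the shell.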
EREs Matching Multiple Characters The following rules may be used to construct EREs matching multiple characters from EREs matching a single character: EREERE A concatenation of EREs matches the first encountered concatenation of the strings matched by each component of the ERE. Such a concatenation of EREs enclosed in parentheses matches whatever the concatenation without the parentheses matches. For example, both the ERE and the ERE matches the second and third characters of the string The longest over- all string is matched. The special character plus when following an ERE matching a single character, or a concatenation of EREs enclosed in parenthesis, is an ERE that matches one or more occurrences of the ERE preceding the plus sign. The string matched will contain as many occur- rences as possible. For example, the ERE matches the fourth through seventh characters in the string The special character asterisk when following an ERE matching a single character, or a concatenation of EREs enclosed in parenthesis, is an ERE that matches zero or more occurrences of the ERE preceding the asterisk. For example, the ERE matches the first character in the string If there is any choice, the longest left-most string that permits a match is chosen. For example, the ERE matches the third through seventh characters in the string The special character question mark when following an ERE matching a single character, or a concatenation of EREs enclosed in parenthesis, is an ERE that matches zero or one occurrences of the ERE preceding the question mark. The string matched will contain as many occur- rences as possible. For example, the ERE matches the second character in the string interval expression that functions the same way as basic regular expression syntax, Alternation Two EREs separated by the special character vertical bar matches a string that is matched by either ERE. 
For example, the ERE matches the string and the string A vertical bar '|' may not appear as follows: may not appear first or last in an ERE. may not appear immediately following a vertical bar. may not appear after a left parenthesis. may not appear immediately preceding a right parenthesis. Precedence The order of precedence is as follows, from high to low: square brackets asterisk, plus sign, question mark anchoring concatenation alternation For example, the ERE is interpreted as "match either or It does not mean "match followed by or followed in turn by (because concatenation has a higher order of precedence than alternation). Expression Anchoring An ERE can be limited to matching strings that begin or end a line (i.e., anchored) according to the following rules: o A circumflex matches the beginning of a line (anchors the expression to the beginning of a line). For example, the ERE matches the string in the line but not the same string in the line o A dollar sign matches the end of a line (anchors the expression to the end of a line). For example, the ERE matches the string in the line but not the same string in the line o An ERE anchored by both and matches only strings that are lines. For example, the ERE matches only lines consisting of the string Only empty lines match the ERE The use of duplication characters (+,*) following anchors is illegal. PATTERN MATCHING NOTATION
The following rules apply to pattern matching notation except as noted in the descriptions of the specific utilities using pattern matching.

Patterns Matching a Single Character
The following patterns match a single character or a single collating element:

An ordinary character is a pattern that matches itself. An ordinary character is any character in the supported character set except newline and the pattern matching special characters listed in Special Characters below. Matching is based on the bit pattern used for encoding the character, not on the graphic representation of the character.

Special Characters
A pattern matching special character preceded by a backslash is a pattern that matches the special character itself. When not preceded by a backslash, such characters have special meaning in the specification of patterns. The pattern matching special characters and the contexts in which they have their special meaning are:

? * [   The question mark, asterisk, and left square bracket are special except when used in a bracket expression (see Pattern Bracket Expression).

A question mark (?), when used outside of a bracket expression, is a pattern that matches any printable or nonprintable character except newline.

Pattern Bracket Expression
The syntax and rules for pattern bracket expressions are the same as for RE bracket expressions found above with the following exceptions:

The exclamation point character (!) replaces the circumflex character (^) in its role in a non-matching list in the regular expression notation.

The backslash is used as an escape character within bracket expressions.

Patterns Matching Multiple Characters
The following rules may be used to construct patterns matching multiple characters from patterns matching a single character:

*   The asterisk is a pattern that matches any string, including the null string.
RERE The concatenation of patterns matching a single character is a valid pattern that matches the concatenation of the single characters or collating elements matched by each of the concatenated patterns. For example, the pattern matches the string and The concatenation of one or more patterns matching a single character with one or more asterisks is a valid pattern. In such patterns, each asterisk matches a string of zero or more characters, up to the first character that matches the character following the asterisk in the pattern. For example, the pattern matches the strings and but not the string When an asterisk is the first or last character in a pattern, it matches zero or more characters that precede or follow the characters matched by the remainder of the pattern. For example, the pattern matches the strings and the pattern matches the strings and Rule Qualification for Patterns Used for Filename Expansion The rules described above for pattern matching are qualified by the following rules when the pattern matching notation is used for filename expansion by sh(1), csh(1), ksh(1), and make(1). If a filename (including the component of a pathname that follows the slash character) begins with a period the period must be explicitly matched by using a period as the first character of the pattern; it cannot be matched by either the asterisk special character, the question mark special character, or a bracket expression. This rule does not apply to make(1). The slash character in a pathname must be explicitly matched by using a slash in the pattern; it cannot be matched by either the asterisk special character, the question mark special character, or a bracket expression. For make(1) only the part of the pathname following the last slash character can be matched by a special character. That is, all special characters preceding the last slash character lose their special meaning. 
Specified patterns are matched against existing filenames and pathnames, as appropriate. If the pattern matches any existing filenames or pathnames, the pattern is replaced with those filenames and pathnames, sorted according to the collating sequence in effect. If the pattern does not match any existing filenames or pathnames, the pattern string is left unchanged.

If the pattern begins with a tilde (~) character, all of the ordinary characters preceding the first slash (or all characters if there is no slash) are treated as a possible login name. If the login name is null (i.e., the pattern contains only the tilde or the tilde is immediately followed by a slash), the tilde is replaced by a pathname of the process's home directory, followed by a slash. Otherwise, the combination of tilde and login name are replaced by a pathname of the home directory associated with the login name, followed by a slash. If the system cannot identify the login name, the result is implementation-defined. This rule does not apply to sh(1) or make(1).

If the pattern contains a $ character, variable substitution can take place. Environmental variables can be embedded within patterns as $name or ${name}. Braces are used to guarantee that characters following name are not interpreted as belonging to name. Substitution occurs in the order specified only once; that is, the resulting string is not examined again for new names that occurred because of the substitution.

Rule Qualification for Patterns Used in the case Command
The rules described above for pattern matching are qualified by the following rule when the pattern matching notation is used in the case command of sh(1) and ksh(1). Multiple alternative patterns in a single clause can be specified by separating individual patterns with the vertical bar character (|); strings matching any of the patterns separated this way will cause the corresponding command list to be selected.

SEE ALSO
ksh(1), sh(1), fnmatch(3C), glob(3C), regcomp(3C), setlocale(3C), environ(5).

STANDARDS CONFORMANCE
regexp(5)
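The filename-expansion and case rules from the manual entry above can be observed with a short script. This is a sketch under a few assumptions: a POSIX-style sh or bash with default globbing options, mktemp available, and the made-up names demo_glob and kind:

```shell
# demo_glob: create a scratch directory, expand three patterns in it, clean up.
demo_glob() {
    demo_dir=$(mktemp -d) || return 1
    touch "$demo_dir/.hidden" "$demo_dir/visible"
    (
        cd "$demo_dir" || exit 1
        printf 'star: %s\n' *      # a leading period is not matched by "*"
        printf 'dot: %s\n' .h*     # the period is matched explicitly
        printf 'none: %s\n' zz*    # no match: the pattern is left unchanged
    )
    rm -rf "$demo_dir"
}

# kind NAME: in "case", a vertical bar separates alternative patterns.
kind() {
    case $1 in
        *.c|*.h) echo source ;;
        *)       echo other ;;
    esac
}

demo_glob
kind main.c
kind notes.txt
```

Shell options such as bash's dotglob or nullglob change the first and third results, which is one more reason to test patterns in the shell that will actually run them.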