...Why are there so many different types of regular expression?
That shouldn't be so surprising. They are just different methods of arriving at the same solution, just as there are different routes from point A to point B.
The regex in the first awk solution used a character class with the repetition quantifier "+".
Code:
/^[ ]+/
There's a blank space and a tab character in there which means that this regex searches for one or more occurrences, from the beginning, of either a space or tab character (or any combination of both). The sub function substitutes those with a zero-length string.
I think "\t" for the actual tab character should work; at least it does for gawk:
Code:
gawk '{sub(/^[ \t]+/,""); print}' file
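To see the effect on a small input, here is a minimal sketch that wraps the same sub() call in a shell function. The function name strip_leading and the sample lines are made up for illustration, and it assumes an awk that understands "\t" inside a bracket expression (POSIX awk specifies it; gawk, mawk and busybox awk should all qualify).

```shell
# Strip leading spaces and tabs from every line read on stdin.
# strip_leading is a hypothetical helper name, not part of any tool.
strip_leading() {
    # ^[ \t]+ : one or more spaces/tabs anchored at the start of the line
    awk '{ sub(/^[ \t]+/, ""); print }'
}

# Sample input: a line with spaces then a tab, a line with only a tab,
# and a line with no leading whitespace at all.
printf '  \tfoo bar\n\tbaz\nqux\n' | strip_leading
```

Only the leading run is removed; the space inside "foo bar" is left alone because sub() replaces just the first match and the regex is anchored with "^".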
The "+" repetition quantifier belongs to Extended Regular Expressions (EREs). GNU sed accepts it (written "\+" in its default mode, or as a plain "+" with the -r or -E option), but the sed binaries on most Unix systems do not. So the sed solution used a different type of regex:
Code:
/^ \{1,\}/
The brace repetition operator (an interval expression) gives finer control over repetition. "+" means one or more, with no upper limit. {m,n} means at least m and at most n repetitions, and {m,} means at least m repetitions with no upper limit. In a Basic Regular Expression (BRE) the braces are written with backslashes, as in \{1,\} above. So \{1,\} is equivalent to "+", which is why the sed solution works more or less the same way. (It doesn't handle tab characters, though.)
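As a rough sketch of the difference, the two functions below apply the interval form first to a plain space and then to the POSIX class [[:blank:]] (space or tab). The function names and sample lines are invented for the example, and it assumes a sed that supports POSIX character classes in bracket expressions.

```shell
# \{1,\} is the BRE spelling of "one or more".
# Both function names are hypothetical, chosen only for this example.
strip_leading_spaces() {
    # Removes a leading run of spaces only; a leading tab is left alone.
    sed 's/^ \{1,\}//'
}

strip_leading_blanks() {
    # [[:blank:]] matches a space or a tab, so mixed runs are removed too.
    sed 's/^[[:blank:]]\{1,\}//'
}

printf '   spaces\n\t tab first\n' | strip_leading_spaces
printf '   spaces\n\t tab first\n' | strip_leading_blanks
```

The first call leaves the second line untouched because it starts with a tab, not a space; the second call strips both lines.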
Quote:
...Perl regular expression is not similar to sed regular expression.
Perl is a different beast altogether. All the above concepts, BREs as well as EREs, are for POSIX regexes. Perl, on the other hand, started off by implementing Henry Spencer's regular expression library. Its regex syntax is richer, more consistent and more extensive than that of POSIX-compliant regexes.
LEARN ABOUT OSF1
regcmp
regcmp(3) Library Functions Manual regcmp(3)
NAME
regcmp, regex - Compile and execute regular expression
LIBRARY
Standard C Library (libc.so, libc.a)
SYNOPSIS
#include <libgen.h>
char *regcmp( const char *string1, ... /*, (char *)0 */);
char *regex( const char *re, const char *subject, ... );
STANDARDS
Interfaces documented on this reference page conform to industry standards as follows:
regcmp(), regex(): XPG4-UNIX
Refer to the standards(5) reference page for more information about industry standards and associated tags.
PARAMETERS
string1   Points to the string that is to be matched or converted.
re        Points to a compiled regular expression string.
subject   Points to the string that is to be matched against re.
DESCRIPTION
The regcmp() function compiles a regular expression consisting of the concatenated arguments and returns a pointer to the compiled form.
The end of arguments is indicated by a null pointer. The malloc() function is used to create space for the compiled form. It is the
responsibility of the process to free unneeded space so allocated. A null pointer returned from regcmp() indicates an invalid argument.
The regex() function executes a compiled pattern against the subject string. Additional arguments of type char * must be passed to receive
matched subexpressions back. A global character pointer, __loc1, points to the first matched character in the subject string.
The regcmp() and regex() functions support the simple regular expressions which are defined in the grep(1) reference page, but the syntax
and semantics are slightly different. The following are the valid symbols and their associated meanings:
[ ] * . ^   The left and right bracket, asterisk, period, and circumflex symbols retain their meanings as defined in the grep(1)
            reference page.
$           A dollar sign matches the end of the string; \n matches a new line.
-           Used within brackets, the hyphen signifies an ASCII character range. For example, [a-z] is equivalent to [abcd...xyz].
            The - (hyphen) can represent itself only if used as the first or last character. For example, the character class
            expression []-] matches the characters ] (right bracket) and - (hyphen).
+           A regular expression followed by a + (plus sign) means one or more times. For example, [0-9]+ is equivalent to
            [0-9][0-9]*.
{m} {m,} {m,u}
            Integer values enclosed in {} braces indicate the number of times the preceding regular expression can be applied.
            The value m is the minimum number and u is a number, less than 256, which is the maximum. The syntax {m} indicates the
            exact number of times the regular expression can be applied. The syntax {m,} is analogous to {m,infinity}. The +
            (plus sign) and * (asterisk) operations are equivalent to {1,} and {0,}, respectively.
( ... )$n   The value of the enclosed regular expression is returned. The value is stored in the (n+1)th argument following the
            subject argument. A maximum of ten enclosed regular expressions are allowed. The regex() function makes its
            assignments unconditionally.
( ... )     Parentheses are used for grouping. An operator, such as *, +, or {}, can work on a single character or a regular
            expression enclosed in parentheses. For example, (a*(cb+)*)$0.
Since all of the symbols defined above are special characters, they must be escaped to be used as themselves.
NOTES
The regcmp() and regex() interfaces are scheduled to be withdrawn from a future version of the X/Open CAE Specification.
These interfaces are obsolete; they are guaranteed to function properly only in the C/POSIX locale and so should be avoided. Use the POSIX
regcomp() interface instead of regcmp() and regex().
RETURN VALUES
Upon successful completion, the regcmp() function returns a pointer to the compiled regular expression. Otherwise, a null pointer is
returned and errno may be set to indicate the error.
Upon successful completion, the regex() function returns a pointer to the next unmatched character in the subject string. Otherwise, a
null pointer is returned.
RELATED INFORMATION
Commands: grep(1)
Functions: malloc(3), regcomp(3)
Standards: standards(5)