How to remove spaces using awk,sed,perl? Post 302446984 by durden_tyler on Friday 20th of August 2010 10:46:58 AM
Quote:
Originally Posted by cola
...Why are there so many different types of regular expression?
That shouldn't be so surprising. They are just different ways of arriving at the same result, just as there are different routes from point A to point B.

The regex in the first awk solution used a character class with the repetition quantifier "+".

Code:
/^[ 	]+/

There's a blank space and a tab character inside the brackets, so this regex matches one or more occurrences, anchored at the beginning of the line, of either a space or a tab (in any combination). The sub function replaces that match with a zero-length string.
I think "\t" in place of the literal tab character should work too; at least it does for gawk:

Code:
gawk '{sub(/^[ \t]+/,""); print}' file
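To try this out without a data file, here is a minimal sketch that feeds a few test lines through the same sub() call. The function name strip_leading and the sample text are made up for illustration, and plain awk is used on the assumption that your awk understands "\t" inside a regex the way gawk does.

```shell
# Hypothetical helper: strip leading spaces and tabs from every input line.
# Assumes the local awk accepts \t inside a bracket expression (gawk does).
strip_leading() {
    awk '{ sub(/^[ \t]+/, ""); print }'
}

# Sample input: a space/tab mix, an unindented line, and a tab-indented line.
# Only the leading whitespace goes away; spaces inside the line are kept.
printf ' \t  indented line\nno indent\n\tkeep  inner  spaces\n' | strip_leading
```

Because the regex is anchored with "^", sub() can only ever touch the start of the line, so the double spaces in the last sample line survive.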

The "+" repetition quantifier belongs to Extended Regular Expressions (EREs). GNU sed accepts it (written as "\+", or as a plain "+" with the -r option), but the sed binaries on most other Unix systems do not. So the sed solution used a different kind of regex, a Basic Regular Expression (BRE):

Code:
/^ \{1,\}/

The brace repetition operator (an "interval expression") gives finer control over repetition. "+" means one or more, with no upper limit on "more", whereas {m,n} means at least m and at most n repetitions. Further, {m,} means at least m repetitions with no upper limit. So {1,} is equivalent to "+", which is why the sed solution works more or less the same way. In a BRE the braces have to be backslash-escaped, hence \{1,\}. (That sed regex doesn't take care of tab characters, though.)
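As a rough sketch of how the interval syntax behaves in sed, the two helpers below use only POSIX BRE constructs, so they should not depend on GNU extensions. The function names and sample lines are invented for the example, and using [[:blank:]] (space or tab) to cover the tab case is my addition, not part of the original sed solution.

```shell
# Hypothetical helper: \{1,\} is the BRE spelling of "one or more".
# [[:blank:]] matches a space or a tab, so tabs are handled as well.
strip_leading_sed() {
    sed 's/^[[:blank:]]\{1,\}//'
}

# Hypothetical helper: \{2,3\} removes at least 2 and at most 3 leading spaces.
strip_two_to_three() {
    sed 's/^ \{2,3\}//'
}

printf '   three spaces\n\t one tab and a space\nuntouched\n' | strip_leading_sed

# Five leading spaces go in; the interval can consume at most three of them.
printf '     x\n' | strip_two_to_three
```

The second helper is only there to show the upper bound at work: with five leading spaces, two of them are left in front of the "x".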

Quote:
...Perl regular expression is not similar to sed regular expression.
Perl is a different beast altogether. All the above concepts, BREs as well as EREs, are for POSIX regexes. Perl, on the other hand, started off by implementing Henry Spencer's regular expression library. Its regex syntax is richer, more consistent and more extensive than that of POSIX-compliant regexes.

HTH,
tyler_durden
 
