Understanding sed

03-01-2013


Hi,

Can someone explain how "sed" manages to delete the second field here?

Any explanation of how the code below works would be appreciated.

Code:

sed 's/^\([^:]*\):[^:]:/\1::/'  /etc/passwd

sed 's/[^:]*:/:/2'  /etc/passwd

panyam


03-01-2013


Hi, the first one deletes the second field only if it consists of exactly one character. It replaces the first field, a colon, a single non-colon character and another colon with the first field (captured with \( .. \) and recalled with \1) followed by two colons.

The second replaces any number of non-colons followed by a colon with a single colon; the number 2 means that it does this to the second occurrence on a line.
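A minimal sketch of the difference between the two commands, using the sample line that appears later in this thread plus a hypothetical variant (`two`) whose second field is two characters long. The variable names are only for illustration.

```shell
# Sample line from the thread, and an assumed variant with a longer 2nd field.
one='abcd:x:panyam:panyam:Panyam:512'
two='abcd:xy:panyam:panyam:Panyam:512'

# First command: matches only when the second field is exactly one character.
first_one=$(printf '%s\n' "$one" | sed 's/^\([^:]*\):[^:]:/\1::/')
first_two=$(printf '%s\n' "$two" | sed 's/^\([^:]*\):[^:]:/\1::/')

# Second command: replaces the 2nd occurrence of "non-colons plus a colon".
second_one=$(printf '%s\n' "$one" | sed 's/[^:]*:/:/2')
second_two=$(printf '%s\n' "$two" | sed 's/[^:]*:/:/2')

printf '%s\n' "$first_one" "$first_two" "$second_one" "$second_two"
```

The second output line should still contain `xy`, because the first command finds nothing to match there, while the second command empties the field in both cases.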

Scrutinizer


03-01-2013


Hi Scrutinizer,

Thanks for the reply. I understood your second explanation.

Your first answer is still slightly confusing to me.

Let's say the sample input is:

Code:

abcd:x:panyam:panyam:Panyam:512

echo "abcd:x:panyam:panyam:Panyam:512" | sed 's/^\([^:]*\):[^:]:/\1::/' gives me:

abcd::panyam:panyam:Panyam:512

\([^:]*\) matches: abcd
[^:] matches: x

\1 prints only "abcd".

Now, how does the rest of the line end up in the output unchanged? I mean

Code:

"panyam:panyam:Panyam:512"

Is it because sed prints the parts of the line that the pattern did not match as they are?

Is my understanding correct?

panyam


03-01-2013


Hi Panyam, yes, the rest of the line remains unaltered. It is not part of the match, so the substitution does not touch it and it gets printed as it is.
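One way to see exactly which part of the line the pattern consumes is to put `&` (the whole matched text) in the replacement and wrap it in brackets. This is only an illustrative sketch; the brackets are an arbitrary marker and not part of the original command.

```shell
line='abcd:x:panyam:panyam:Panyam:512'

# & in the replacement stands for everything the pattern matched.
marked=$(printf '%s\n' "$line" | sed 's/^\([^:]*\):[^:]:/[&]/')

printf '%s\n' "$marked"
```

Only `abcd:x:` ends up inside the brackets; everything after it was never matched, so sed copies it to the output untouched.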

Scrutinizer


03-01-2013


An attempt to further explain the regular expression, with an ulterior motive of setting up for a question.

Given:

Code:

sed 's/^\([^:]*\):[^:]:/\1::/'

Search for a pattern in the string matching:

Code:

^    = start of the line
\(   = start of first remembered pattern
[^:] = any character that is not a colon
*    = zero or more of the preceding character class
       (characters that are not colons)
\)   = end of first remembered pattern
:    = followed by a colon
[^:] = followed by exactly one character that is not a colon
:    = followed by a colon

<describes the first 2 fields, along with their separators>

Replace with:

Code:

\1   = the first remembered pattern (the first field)
::   = followed by 2 literal colons

In other words, replace the first 2 colon separated fields
with the first field and 2 colons (deletes the 2nd field,
provided the 2nd field is a single character).

Question: if this sed command was in a script, could it be commented like I did above in the code? Can a sed regex be multi-line with comments?

One could also do:

Code:

s/:[^:]*:/::/
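As a quick check of this shorter form, here is a sketch run against a hypothetical line whose second field (`secret`) is longer than one character, which the original command would not match.

```shell
# Assumed sample line; the second field is deliberately more than one character.
line='abcd:secret:panyam:panyam:Panyam:512'

# The first ":...:" pair on the line surrounds the second field.
result=$(printf '%s\n' "$line" | sed 's/:[^:]*:/::/')

printf '%s\n' "$result"
```

Because `[^:]*` allows any length, this variant does not depend on the second field being a single character.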

gary_w


03-01-2013


Quote:

Originally Posted by gary_w

Question: if this sed command was in a script, could it be commented like I did above in the code? Can a sed regex be multi-line with comments?

Nope. You can have comments in a sed script, but not within a regular expression. What you are asking is possible with perl if you use the /x regular expression modifier.
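To illustrate the part that does work, here is a minimal sketch of a multi-line sed script with `#` comment lines placed above the command rather than inside the regular expression. The sample line is the one used earlier in the thread.

```shell
line='abcd:x:panyam:panyam:Panyam:512'

# Comments sit on their own lines in the sed script, never inside the regex.
result=$(printf '%s\n' "$line" | sed '
# Capture the first field, then match a colon, one non-colon character
# and another colon. Put back the first field followed by two colons.
s/^\([^:]*\):[^:]:/\1::/
')

printf '%s\n' "$result"
```

The same lines could also be saved in a file and run with `sed -f`.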

Regards,
Alister

---------- Post updated at 10:54 PM ---------- Previous update was at 10:49 PM ----------

Quote:

Originally Posted by gary_w

One could also do:

Code:

s/:[^:]*:/::/

s/:[^:]*/:/ would work just as well (unless it's necessary to prevent the last field in a line without a trailing colon from matching), or even s/[^:]*//2.
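The caveat about the last field can be sketched with a hypothetical two-field line that has no trailing colon:

```shell
# Assumed input: the second field is also the last one, with no colon after it.
short='abcd:xyz'

# Requires a closing colon, so nothing matches and the line is left alone.
with_colon=$(printf '%s\n' "$short" | sed 's/:[^:]*:/::/')

# No closing colon required, so the last field is emptied.
without_colon=$(printf '%s\n' "$short" | sed 's/:[^:]*/:/')

printf '%s\n' "$with_colon" "$without_colon"
```

Which behavior is wanted depends on whether a line with only two fields should be modified at all.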

Regards,
Alister

alister


03-02-2013


Quote:

Originally Posted by alister

[..] or even s/[^:]*//2

That is what I would think too, but this does not work like that everywhere. It works with GNU sed, with sed on AIX 7 and with the regular sed on Solaris, but not with /usr/xpg4/bin/sed on Solaris, nor with sed on HP-UX, OS X and some other UNIX flavors.

In those cases where it does not work, the desired effect was obtained when s/[^:]*//3 was used instead (and for the 3rd field s/[^:]*//5 and so on).

How can this be? I think this may have to do with how the respective regex engines interpret a zero-length match after a previous match. The first match of

echo aaa:bbb:ccc:ddd:eee:fff | sed 's/[^:]*//'

renders

Code:

:bbb:ccc:ddd:eee:fff

On this every engine agrees. After the first match, the engine is positioned after the previous match and before the first colon. But what then constitutes the next match? For GNU sed and the others mentioned above, this apparently means the next run of non-colon characters after the first colon. The other engines apparently interpret zero repetitions of non-colons before the colon as the next match, which is an empty string, and which I guess could be labeled a "strict" interpretation of [^:]*.

Anyway, it seems safest to include one colon in the match, like in the OP's second example, or to insist on a pattern of 1 or more non-colons, i.e. sed 's/[^:][^:]*//2' or sed 's/[^:]\{1,\}//2'
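A small sketch of the two safer forms, run on the sample line from this post. Since neither pattern can match an empty string, the question of how an engine counts zero-length matches should not arise; the `[^:]*` variant is left out here because its result differs between implementations.

```shell
line='aaa:bbb:ccc:ddd:eee:fff'

# One non-colon followed by zero or more non-colons: at least one character.
plain=$(printf '%s\n' "$line" | sed 's/[^:][^:]*//2')

# The same idea written with a BRE interval expression.
interval=$(printf '%s\n' "$line" | sed 's/[^:]\{1,\}//2')

printf '%s\n' "$plain" "$interval"
```

Both commands should remove `bbb` and leave the surrounding colons in place.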

Regards,

S.

Last edited by Scrutinizer; 03-02-2013 at 05:44 AM..

Scrutinizer
