Quote:
Originally Posted by
glev2005
why does sed 's/.* //' show the last word in a line
and
sed 's/ .*//' show the first word in a line? How is that blank space before or after the ".*" being interpreted in the regex?
I would think the first example would delete the first word and the next example would delete the second word because you have .* to match any number of characters and then a space substituted with nothing. Shouldn't that remove the first matched word on the line that is terminated with whitespace?
---------- Post updated at 07:07 PM ---------- Previous update was at 06:49 PM ----------
I think I get it... if you do /.* // that will grab the first thing on the line if it has no space before it, and replace it with whitespace, and then print the rest of the line,
but if you do / .*// it will ignore what is before the first space, and begin substitution after the first space removing the rest of the line by substituting white space. Correct?
Well, no, not completely correct.
It is helpful to use an interactive regex tool. Try the one at http://gskinner.com/RegExr/ or download one for your OS.
With respect to the two regex patterns that you asked about, look at the individual elements and you will see why you get what you get:
' ' - a space - matches that ASCII character. Not a tab, not a new line, only a space. ASCII character 32.
'.' - is a RegEx wildcard. It matches any single character except a newline when working line by line, as sed does. It is important to understand that '.' matches a single character. [a-z] is a character class (a range); like '.', it matches a single character, but only from the more limited set 'abcdefghijklmnopqrstuvwxyz'.
'*' - is a RegEx quantifier, i.e., 'how many' of the preceding item. Here, * means zero or more, taking as many as will match.
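To see those three pieces on their own, here is a small sketch you can paste into a shell. The sample strings and variable names are made up for illustration; each command replaces only the first match on the line.

```shell
# A literal space matches only a space, so the tab is left alone.
space_demo=$(printf 'a\tb c\n' | sed 's/ /_/')

# '.' matches any single character, here the first one on the line.
dot_demo=$(printf 'abc\n' | sed 's/./X/')

# '*' repeats the preceding item zero or more times, as many as will match.
star_demo=$(printf 'aaab\n' | sed 's/a*/X/')

printf '%s\n' "$space_demo" "$dot_demo" "$star_demo"
# second line printed: Xbc
# third line printed:  Xb
```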
Those are the individual things, now look at how they combine:
/ .*/ means 'match one and only one space, then match every character up to the end of the line.' Next, look at the replacement. You have 's/ .*//'. That means 'leave the line alone until you reach the first space, then match that space and everything after it, and replace the match with nothing.' Only the first word is left. If you add a space at the beginning of your line, the first word would be deleted as well.
/.* / means 'match any character except newline (the '.'), as many times as you possibly can (the '*'), and then match a space.' Because the pattern ends in a space, the match has to end on a space; a line with no space in it does not match at all.
s/.* // means 'match any character, as many as you can, including skipping over spaces (because it is greedy), until you come to the last space on the line, so that the whole pattern still matches, and replace with nothing.' Hence it matches everything up to and including the space before the last word on the line. If you add a space after the final word, that word is deleted too.
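A quick way to check this at the prompt, as a rough sketch (first_word is just a helper name I made up, and the input strings are arbitrary):

```shell
# Delete the first space and everything after it.
first_word() {
  printf '%s\n' "$1" | sed 's/ .*//'
}

first_word 'one two three'    # prints: one
first_word ' one two three'   # prints an empty line: the match starts at the leading space
first_word 'single'           # prints: single (no space, so nothing matches)
```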
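The greedy case can be tried the same way. Again, this is only a sketch with a made-up helper name (last_word) and arbitrary sample lines:

```shell
# Delete everything up to and including the last space on the line.
last_word() {
  printf '%s\n' "$1" | sed 's/.* //'
}

last_word 'one two three'    # prints: three
last_word 'one two three '   # prints an empty line: the last space is now after 'three'
last_word 'single'           # prints: single (the pattern needs a space to match)
```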
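If you changed the pattern to s/.*? // in a regex flavor that supports lazy quantifiers (Perl or PCRE, for example), then it is "ungreedy", meaning it stops at the shortest match, where s/.* // is "greedy": it will take as many characters as possible while still letting the whole pattern match. s/.*? // will therefore only delete the first word and the following space. If you have a space at the beginning of the line, it will stop at that space because '*' means 'zero or more.' Note that sed itself has no lazy quantifiers; in sed, s/[^ ]* // gives the same result by matching a run of non-space characters followed by a space.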
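Here is a sketch of the shortest-match behavior. The sed version uses a negated character class, [^ ]*, in place of a lazy quantifier; drop_first_word is a hypothetical helper name, and the Perl line only runs if perl happens to be installed.

```shell
# Delete a run of non-space characters plus the space that follows it.
drop_first_word() {
  printf '%s\n' "$1" | sed 's/[^ ]* //'
}

drop_first_word 'one two three'    # prints: two three
drop_first_word ' one two three'   # prints: one two three (only the leading space goes)

# Perl understands the lazy form directly.
if command -v perl >/dev/null 2>&1; then
  printf 'one two three\n' | perl -pe 's/.*? //'   # prints: two three
fi
```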
Play with the interactive form and it will hit you like lightning at a certain point...