Can you please explain this letter by letter, word by word, so that next time I can do it on my own?
Of course, but we would really appreciate it if you could obey the forum rules and post code (any code, data and output) in CODE-tags or - if they appear in running text, like commands - in ICODE-tags. For instance write the command ls --full-time -rt |sed 's/\(.*\)\(\..*\+.*\) *\(.*\)$/\1 \3/g' like this.
Back to your question:
First, the basic command: sed 's/<something>/\1 \3/g'
We replace something (in fact every instance of something, because of the "g" at the end) by \1 \3. \1 and \3 are so-called "back-references". They work like variables: you search for something in the search part (the "<something>") and whatever you have found is put into the variable. The "first" and the "third" such found things will be put into the result, effectively deleting the second.
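To see back-references in isolation, here is a minimal sketch. The input line and the simple [a-z]* patterns are made up for illustration and are not part of the original command; only the \1 \3 replacement is the same idea.

```shell
# Hypothetical input: three words stand in for three captured fields.
line='alpha beta gamma'

# Each \( ... \) pair captures one word. The replacement writes back
# groups 1 and 3 only, so whatever group 2 matched is dropped.
result=$(printf '%s\n' "$line" | sed 's/\([a-z]*\) \([a-z]*\) \([a-z]*\)/\1 \3/')

printf '%s\n' "$result"   # prints: alpha gamma
```

The second group still has to be captured (or at least matched), otherwise the text it covers would stay in the line untouched.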
Now, let's have a look at the "something" which the input line is broken up into:
Whatever is between "\(" and "\)" is put into such a back-reference. In the command quoted above we see three such pairs, with a few characters between and after them.
Let us first deal with the things outside the bracket pairs: "  *" is a space, followed by zero or more further spaces. The asterisk means "zero or more of the character before it" (strictly "of the regex before it", but in this case the regex is only a single character), hence "one or more of this character" is expressed by writing the character once, then the same character again with the asterisk.
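The "one or more" idiom can be tried on its own. This is only a sketch with an invented input line; the comma is used as replacement simply to make the matched runs visible.

```shell
# Hypothetical input with runs of spaces of different lengths.
line='a b   c      d'

# "  *" = one space, then zero or more further spaces,
# so every run of spaces (length 1 or more) becomes a single comma.
result=$(printf '%s\n' "$line" | sed 's/  */,/g')

printf '%s\n' "$result"   # prints: a,b,c,d
```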
The $ means "end of line" and is a way of "anchoring" a regular expression. If you search for a group of characters they could appear anywhere in a line. If you want to specifically search for a word appearing at the beginning or the end of a line these anchors (there is ^ for "beginning of line" and $ for "end of line") are the means to express that.
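A small sketch of what the anchors do, using a made-up line in which the same word appears at both ends:

```shell
# Hypothetical input: "foo" occurs at the beginning and at the end.
line='foo bar foo'

# ^ ties the match to the beginning of the line.
at_start=$(printf '%s\n' "$line" | sed 's/^foo/X/')

# $ ties the match to the end of the line.
at_end=$(printf '%s\n' "$line" | sed 's/foo$/X/')

# Without an anchor (and with g) every occurrence is replaced.
everywhere=$(printf '%s\n' "$line" | sed 's/foo/X/g')

printf '%s\n' "$at_start"     # prints: X bar foo
printf '%s\n' "$at_end"       # prints: foo bar X
printf '%s\n' "$everywhere"   # prints: X bar X
```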
To sum up so far, the search expression means: \(??\)\(??\)<one or more spaces>\(??\)<end of line>
For the "??" parts:
The dot (.) means "any character". In conjunction with the asterisk, which means "any number of what precedes me", it means "any number of any character": the first bracket pair pretty much matches everything, in any length.
If this were the whole regex, it would match the complete line. But because it isn't, the second bracket pair in fact limits it: \(\..*\+.*\)
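This matches a literal dot character. Because the dot has a special meaning to sed, you need to "escape" it (precede it with a backslash) if you want to match only a real literal dot: "." = "any character", "\." = "a literal dot character". The "\+" is meant analogously, as a literal "+" character. (One caveat: in sed's basic regular expressions an unescaped "+" is already literal, and GNU sed reads "\+" as "one or more of what precedes", so a plain "+" is the safer way to write a literal plus.) Hence the intended meaning of the regexp inside the bracket pair is: a literal dot, followed by anything, followed by a literal "+", followed by anything.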
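The difference between the bare dot and the escaped dot is easy to check. A minimal sketch with an invented string:

```shell
# Hypothetical input containing literal dots.
name='a.b.c'

# Unescaped dot: matches any character, so the very first character is replaced.
any_char=$(printf '%s\n' "$name" | sed 's/./-/')

# Escaped dot: matches only a real dot, so the first "." is replaced.
literal_dot=$(printf '%s\n' "$name" | sed 's/\./-/')

printf '%s\n' "$any_char"      # prints: -.b.c
printf '%s\n' "$literal_dot"   # prints: a-b.c
```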
You should now be able to decipher the rest and put together what it means in context. One thing you need to know, though: regexps are always "greedy", meaning that if there are several ways to match something, the longest possible match is always used. For example, here is some input and a regexp. The matched part is marked bold:
Notice that "aB" would also have been a valid match for a.*B, but the longest possible match is the one I marked. Therefore the first regexp part, \(.*\), will skip over the first literal dot (after the filemode field: drwxrwxrwx.) and only go for the second one.
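Greediness can be made visible with sed itself. In this sketch both input lines are invented (the second is a shortened stand-in for an ls --full-time line, not real output), and the "&" in the replacement stands for the whole matched text.

```shell
# Hypothetical input with several possible matches for a.*B.
line='xaBcaBcaBx'

# The match starts at the first "a" and runs to the LAST "B", not the nearest one.
marked=$(printf '%s\n' "$line" | sed 's/a.*B/[&]/')
printf '%s\n' "$marked"   # prints: x[aBcaBcaB]x

# Same effect with dots: \(.*\) swallows the first dot and
# stops only at the last one that is still followed by "anything".
listing='drwxr-xr-x. 17:03:11.000000000 +0200'
kept=$(printf '%s\n' "$listing" | sed 's/\(.*\)\..*/\1/')
printf '%s\n' "$kept"     # prints: drwxr-xr-x. 17:03:11
```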
@drl: I think you could forego the "g" at the end, because you anchor the regexp at the end-of-line anyway.
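A quick way to convince yourself, as a sketch with a made-up line and a much simpler anchored pattern than the one in the thread: a pattern that has to end at end-of-line can match only once per line, so the result is the same with and without "g".

```shell
# Hypothetical input: two numbers, only the last one touches the end of the line.
line='report 2011 v42'

with_g=$(printf '%s\n' "$line" | sed 's/[0-9][0-9]*$/N/g')
without_g=$(printf '%s\n' "$line" | sed 's/[0-9][0-9]*$/N/')

printf '%s\n' "$with_g"      # prints: report 2011 vN
printf '%s\n' "$without_g"   # prints: report 2011 vN
```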
I hope this helps.
bakunin