Quote:
Originally Posted by
sudon't
That was helpful. If I've understood correctly, bipinajith's pattern is telling awk to treat the data as a single column array
$0 means 'the entire unmodified line', pure and simple. You can change what 'line' means to awk (by setting the record separator, RS), but by default it splits its input on newlines like everything else.
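If you want to see that for yourself, here's a rough sketch. The sample text and the ";" separator are just things I made up for illustration; the variable RS is how awk is told what separates one "line" (record) from the next.

```shell
# Default behaviour: one record per newline-terminated line, and $0 is that whole line.
by_line=$(printf 'one two\nthree four\n' | awk '{ print NR ": " $0 }')
echo "$by_line"

# Hypothetical input that uses ";" instead of newlines between records.
# Setting RS in a BEGIN block changes what awk treats as a "line".
by_semicolon=$(printf 'one two;three four' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
echo "$by_semicolon"
```

Both commands should number the same two records, even though the second input contains no newlines at all.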
Quote:
assign keys to each value, (hash table?)
You can think of it as a hash table if you like. It could actually be built from a tree or list, but that doesn't really matter -- the point is, you can do a["qwerty"]=5; print a["qwerty"] and get 5 out.
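To try that array example on its own, something like this should do. It runs entirely in a BEGIN block, so awk needs no input; the second key, "asdf", is only there to show that values can be strings as well.

```shell
result=$(awk 'BEGIN {
    a["qwerty"] = 5        # store a number under a string key
    a["asdf"] = "hello"    # illustrative second key holding a string
    print a["qwerty"]
    print a["asdf"]
}')
echo "$result"
```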
Quote:
It's still hard for me to parse the pattern, though. It seems awk is doing an awful lot with very little.
awk is like grep or sed, in that it has a built-in loop which runs code on every line individually. But it's like perl or shell in that it has no single hardcoded purpose: what it does to each line is up to the code you give it.
awk statements look like conditional { code block }. Whenever the conditional is true for the current line, awk runs the { code block }. If you leave off the { code block }, awk assumes { print }, which prints the entire unmodified line.
So, awk '1' acts like cat, because 1 is always true.
awk '/regex/' acts like grep "regex", because /regex/ is true whenever the current line matches the regular expression.
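Here's a small sketch of those cases side by side. The three fruit names and the /an/ pattern are made-up sample data, nothing more.

```shell
input=$(printf 'apple\nbanana\ncherry\n')

# Conditional plus an explicit code block: the block runs only on matching lines.
with_block=$(printf '%s\n' "$input" | awk '/an/ { print "match: " $0 }')
echo "$with_block"

# Conditional alone: awk falls back to { print }.
always_true=$(printf '%s\n' "$input" | awk '1')     # every line passes, like cat
regex_only=$(printf '%s\n' "$input" | awk '/an/')   # only matching lines, like grep "an"
echo "$always_true"
echo "$regex_only"
```

Only "banana" contains "an", so it is the only line the regex versions let through, while awk '1' passes all three lines along untouched.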
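Now imagine what happens for every line for awk 'A[$0]++'
The first time a line is seen, A has no entry for it yet, so its value is "", a blank string (which also counts as 0). awk considers this false and does not print the line, but the ++ still bumps the stored value to 1 afterwards. The next time the same line shows up, the value is a nonzero number, which awk considers true, causing it to print the line. The net effect is that only repeats of an already-seen line get printed.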
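A worked run may make this easier to follow. The six input lines below are invented sample data; the second command is the same expression with an explicit block, so you can watch the value the conditional sees on each line.

```shell
# Trace: print each line next to the value A[$0]++ evaluates to for it.
trace=$(printf 'a\nb\na\nc\nb\na\n' | awk '{ printf "%s %d\n", $0, A[$0]++ }')
echo "$trace"

# The pattern itself: a line is printed only when that value is nonzero.
dupes=$(printf 'a\nb\na\nc\nb\na\n' | awk 'A[$0]++')
echo "$dupes"
```

In the trace, every line paired with 0 is a first sighting and gets skipped by the bare pattern; the lines paired with 1 or 2 are the ones that end up in the second output.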