You can just use this awk script instead of your code; there is no need for wc, uniq or sort any more.
Here is the code, commented with explanations:
Code:
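# define the field separator: field 1 is the site name,
# field 2 starts with the IP address
BEGIN { FS = ":" }

# loop through the file: for each record, add one to hits[site.name];
# this counts the total number of hits per site
{
    hits[$1]++
    # referencing ip_hits["site.name:ip_address"] creates that index;
    # a repeated site/IP pair reuses the same index, so there is exactly
    # one index per unique hit.
    # substr() keeps at most 15 characters of field 2 (the longest dotted
    # IPv4 address), so a shorter IP keeps whatever follows it, e.g. a space
    ip_hits[$1 ":" substr($2, 1, 15)]
}

# once the whole file is processed, print the array contents by looping
# through the associative indexes with for (i in array)
END {
    # strip ":ip_address" from each ip_hits index and count how many
    # *different* IPs are left for each site name
    for (i in ip_hits) { sub(/:.+/, "", i); unique[i]++ }
    # print the header
    pr_format = "%-20s %5s %s\n"
    printf pr_format, "Page:", "Hits", "Unique Hits"
    # loop through array hits and look up the matching count in unique
    # (for-in order is unspecified, so the rows come out in no fixed order)
    for (i in hits) printf pr_format, i, hits[i], unique[i]
}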
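If you would rather count unique hits while reading the file instead of stripping the ip_hits indexes in the END block, the common awk idiom `!seen[key]++` does it in a single pass. This is a minimal sketch of that alternative, not the script above; `page_hits` is a hypothetical wrapper name, and it assumes the same input layout (site name in field 1, IP address at the start of field 2):

Code:
```shell
#!/bin/sh
# page_hits (hypothetical name): same report as the script above, but counts
# unique hits in one pass instead of post-processing ip_hits in END.
# Assumes field 1 is the site name and field 2 starts with the IP address.
page_hits() {
    awk '
    BEGIN { FS = ":" }
    {
        hits[$1]++
        # seen[key]++ evaluates to 0 (false) the first time a site/IP pair
        # appears, so unique[$1] is incremented once per distinct pair
        if (!seen[$1 ":" substr($2, 1, 15)]++) unique[$1]++
    }
    END {
        pr_format = "%-20s %5s %s\n"
        printf pr_format, "Page:", "Hits", "Unique Hits"
        for (i in hits) printf pr_format, i, hits[i], unique[i]
    }' "$@"
}
```

Usage: `page_hits your-input-file`. The trade-off is one extra array (seen) in exchange for dropping the sub() loop at the end.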
To help you understand, here is a dump of the arrays used above:
- hits (counts the total number of hits per site)
Code:
[hits/cds.hits] => 10
[hits/contact.hits] => 4
[hits/books.hits] => 4
- ip_hits (no values here, just indexes)
Code:
[hits/cds.hits:182.210.215.110] =>
[hits/books.hits:143.217.64.204 ] =>
[hits/cds.hits:150.205.160.134] =>
[hits/cds.hits:221.253.17.33 ] =>
[hits/books.hits:81.140.86.170 ] =>
[hits/cds.hits:67.231.144.166 ] =>
[hits/contact.hits:15.111.224.138 ] =>
[hits/contact.hits:234.33.121.120 ] =>
[hits/cds.hits:193.140.224.143] =>
[hits/cds.hits:199.226.220.114] =>
[hits/cds.hits:94.83.157.230 ] =>
[hits/books.hits:62.145.39.14 ] =>
[hits/contact.hits:199.179.222.39 ] =>
[hits/cds.hits:148.214.45.187 ] =>
- unique (counts the ip_hits indexes per site, i.e. the number of unique IPs)
Code:
[hits/cds.hits] => 8
[hits/contact.hits] => 3
[hits/books.hits] => 3
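The post does not say how this dump was produced. If you want to inspect the arrays on your own data, one option is a throwaway END block that prints every index and value in the same `[index] => value` layout. A sketch, with `dump_arrays` as a hypothetical function name and the same input-layout assumptions as the script:

Code:
```shell
#!/bin/sh
# dump_arrays (hypothetical name): builds the same three arrays as the
# script and prints them as "[index] => value" for inspection.
dump_arrays() {
    awk '
    BEGIN { FS = ":" }
    {
        hits[$1]++
        ip_hits[$1 ":" substr($2, 1, 15)]
    }
    END {
        # copy each index into site before stripping ":ip_address"
        for (i in ip_hits) { site = i; sub(/:.+/, "", site); unique[site]++ }
        print "- hits"
        for (i in hits) printf "[%s] => %s\n", i, hits[i]
        print "- ip_hits"
        for (i in ip_hits) printf "[%s] =>\n", i
        print "- unique"
        for (i in unique) printf "[%s] => %s\n", i, unique[i]
    }' "$@"
}
```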
To use the code:
Code:
$ awk -f script-name your-input-file