The UNIX and Linux Forums  

  #1 (permalink)  
Old 05-16-2008
Registered User
 

Join Date: May 2008
Posts: 53
using uniq and awk??

I have a file populated with entries like this:

Code:
hits/books.hits:143.217.64.204       Thu Sep 21 22:24:57 GMT 2006
hits/books.hits:62.145.39.14         Fri Sep 22 00:38:32 GMT 2006
hits/books.hits:81.140.86.170        Fri Sep 22 08:45:26 GMT 2006
hits/books.hits:81.140.86.170        Fri Sep 22 09:13:57 GMT 2006
hits/cds.hits:67.231.144.166         Mon Sep 04 23:57:22 GMT 2006
hits/cds.hits:182.210.215.110        Tue Sep 05 22:53:53 GMT 2006
hits/cds.hits:94.83.157.230          Wed Sep 06 22:13:28 GMT 2006
hits/cds.hits:148.214.45.187         Wed Sep 06 23:25:22 GMT 2006
hits/cds.hits:221.253.17.33          Sat Sep 09 00:58:14 GMT 2006
hits/cds.hits:182.210.215.110        Sun Sep 10 05:29:28 GMT 2006
hits/cds.hits:193.140.224.143        Sun Sep 10 16:35:11 GMT 2006
hits/cds.hits:182.210.215.110        Fri Sep 15 21:08:31 GMT 2006
hits/cds.hits:199.226.220.114        Sat Sep 16 13:38:18 GMT 2006
hits/cds.hits:150.205.160.134        Mon Sep 18 19:22:45 GMT 2006
hits/contact.hits:15.111.224.138     Sat Sep 09 12:07:26 GMT 2006
hits/contact.hits:199.179.222.39     Sat Sep 09 22:30:15 GMT 2006
hits/contact.hits:199.179.222.39     Sun Sep 10 12:14:13 GMT 2006
hits/contact.hits:234.33.121.120     Sun Sep 10 22:19:39 GMT 2006
I have code that summarizes this information:

Code:
echo "Page:\t \t Hits \t Unique Hits \t "

awk -F: '{print $1}' HITS | uniq
wc -l < HITS
awk -F: '{print $2}' HITS | sort -u | wc -l
My results for this code are:

Code:
Page:                            Hits                   Unique Hits
hits/books.hits
hits/cds.hits
hits/contact.hits
18
14
18 is the total number of hits for the entire file.
14 is the number of unique IPs in column two across the entire file.

However, I want the output to look like this, so that each page name lists its own hits and unique IPs:

Code:
Page:                      Hits                       Unique Hits
hits/books.hits             4                               3
hits/cds.hits               10                              8
hits/contact.hits           4                               3
Any suggestions/solutions?

Last edited by amatuer_lee_3; 05-16-2008 at 05:57 PM.
  #2 (permalink)  
Old 05-16-2008
Registered User
 

Join Date: Oct 2006
Location: Belgium
Posts: 171
awk can do it all:

Code:
BEGIN {FS=":"}
{
        hits[$1]++
        ip_hits[$1 ":" substr($2, 1, 15)]
}
END {
        for (i in ip_hits){ sub(/:.+/, "", i); unique[i]++ }
        printf "%-20s %5s %s\n", "Page:", "Hits", "Unique Hits"
        for (i in hits) printf "%-20s %5s %5s\n", i, hits[i], unique[i]
}
  #3 (permalink)  
Old 05-16-2008
Registered User
 

Join Date: May 2008
Posts: 53
Could you explain this a little more for me, please? I like to understand the code I'm using. Thanks.

And how can I incorporate it into my code? Do I just put it in as it is?
  #4 (permalink)  
Old 05-16-2008
Registered User
 

Join Date: Oct 2006
Location: Belgium
Posts: 171
You can just use it in place of your code. There is no need for wc, uniq or sort any more.

Here is the code with explanatory comments:
Code:
# define the field separator
BEGIN{FS=":"}

# loop through the file: for each record, add one to
# hits[site.name]; this counts the total number of hits per site
{
        hits[$1]++

        # create an array indexed by "site.name:ip_address";
        # each site.name:ip_address pair gives exactly one index (unique hits)
        ip_hits[$1 ":" substr($2, 1, 15)]
}

# once the whole file is processed, print the array contents by looping
# through the associative indexes with for (i in array)
END {
        # count, for each site, the number of *different* indexes in array ip_hits
        for (i in ip_hits){ sub(/:.+/, "", i); unique[i]++}

        # print title
        pr_format="%-20s %5s %s\n"
        printf pr_format, "Page:", "Hits", "Unique Hits"

        # loop through array hits and array unique
        for (i in hits) printf pr_format, i, hits[i], unique[i]
}
To help you understand: here is a printout of the different arrays used above
  • hits (counts total number of hits per site)
    Code:
    [hits/cds.hits] => 10
    [hits/contact.hits] => 4
    [hits/books.hits] => 4
  • ip_hits (no value here. Just indexes)
    Code:
    [hits/cds.hits:182.210.215.110] => 
    [hits/books.hits:143.217.64.204 ] => 
    [hits/cds.hits:150.205.160.134] => 
    [hits/cds.hits:221.253.17.33  ] => 
    [hits/books.hits:81.140.86.170  ] => 
    [hits/cds.hits:67.231.144.166 ] => 
    [hits/contact.hits:15.111.224.138 ] => 
    [hits/contact.hits:234.33.121.120 ] => 
    [hits/cds.hits:193.140.224.143] => 
    [hits/cds.hits:199.226.220.114] => 
    [hits/cds.hits:94.83.157.230  ] => 
    [hits/books.hits:62.145.39.14   ] => 
    [hits/contact.hits:199.179.222.39 ] => 
    [hits/cds.hits:148.214.45.187 ] =>
  • unique (counts the number of elements in ip_hits for each site)
    Code:
    [hits/cds.hits] => 8
    [hits/contact.hits] => 3
    [hits/books.hits] => 3
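
A printout like the ones above can be produced by looping over each array in the END block. The following is only a sketch: dump_arrays is a made-up helper name, and the two input lines are copied from the sample file in the first post.

```shell
# dump_arrays is a hypothetical helper: it reads hit lines on stdin and
# prints every index and value of the two arrays built while reading.
dump_arrays() {
        awk -F: '
        { hits[$1]++; ip_hits[$1 ":" substr($2, 1, 15)] }
        END {
                for (k in hits)    printf "hits    [%s] => %s\n", k, hits[k]
                for (k in ip_hits) printf "ip_hits [%s] => %s\n", k, ip_hits[k]
        }'
}

# Two hits on the same page from the same IP address.
printf '%s\n' \
  'hits/books.hits:81.140.86.170        Fri Sep 22 08:45:26 GMT 2006' \
  'hits/books.hits:81.140.86.170        Fri Sep 22 09:13:57 GMT 2006' |
dump_arrays
```

Both lines land on the same ip_hits index, so hits reports 2 for the page while ip_hits holds a single entry with an empty value.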

To use the code:
Code:
$ awk -f script-name your-input-file
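
For a complete run from start to finish, here is a rough sketch. The file name hits.awk and the scratch directory are assumptions made for the example; HITS is the input file name from the first post, filled here with four of its lines. The program is the one above with the title line left out so the rows can be sorted, and with the index copied into a separate variable before it is trimmed.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
demo_dir=$(mktemp -d)

# Sample input: a few lines in the same format as the file in the question.
cat > "$demo_dir/HITS" <<'EOF'
hits/books.hits:81.140.86.170        Fri Sep 22 08:45:26 GMT 2006
hits/books.hits:81.140.86.170        Fri Sep 22 09:13:57 GMT 2006
hits/books.hits:62.145.39.14         Fri Sep 22 00:38:32 GMT 2006
hits/cds.hits:182.210.215.110        Tue Sep 05 22:53:53 GMT 2006
EOF

# The awk program, saved under the hypothetical name hits.awk.
cat > "$demo_dir/hits.awk" <<'EOF'
BEGIN { FS = ":" }
{
        hits[$1]++
        ip_hits[$1 ":" substr($2, 1, 15)]
}
END {
        # Strip ":ip_address" from a copy of each index, leaving the page name.
        for (key in ip_hits) {
                page = key
                sub(/:.*/, "", page)
                unique[page]++
        }
        for (page in hits)
                printf "%-20s %5s %5s\n", page, hits[page], unique[page]
}
EOF

# for (page in hits) visits pages in no particular order, so sort the rows.
awk -f "$demo_dir/hits.awk" "$demo_dir/HITS" | sort
```

With this input, hits/books.hits should show 3 hits and 2 unique hits, and hits/cds.hits 1 and 1.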
  #5 (permalink)  
Old 05-17-2008
radoulov
addict
 

Join Date: Jan 2007
Location: Milan, Italy/Varna, Bulgaria
Posts: 1,389
Another one
(use nawk or /usr/xpg4/bin/awk on Solaris):

Code:
awk -F'[: ]' 'END {
fmt = "%-20s\t%s\t%s\n"
printf fmt, "Page:", "Hits", "Unique Hits"
for (p in h)
  printf fmt, p, h[p], u[p]
}
!_[$1,$2]++ { u[$1]++ }
{ h[$1]++ }' file
  #6 (permalink)  
Old 05-17-2008
Registered User
 

Join Date: May 2008
Posts: 53
Code:
awk -F'[: ]' 'END {
fmt = "%-20s\t%s\t%s\n"
printf fmt, "Page:", "Hits", "Unique Hits"
for (p in h)
  printf fmt, p, h[p], u[p]
}
!_[$1,$2]++ { u[$1]++ }
{ h[$1]++ }' file
Again, could you explain this for me, please?
  #7 (permalink)  
Old 05-17-2008
era
Herder of Useless Cats
 

Join Date: Mar 2008
Location: /there/is/only/bin/sh
Posts: 2,501
h[x] counts the number of occurrences of x in field $1.

u[x] counts the number of occurrences of x in field $1, discarding any duplicates where the same combination of $1 and $2 has been seen before.
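
The same logic may be easier to follow with longer names. This is a sketch only, not a replacement for the one-liner: the names count_hits, seen, hits and unique are invented here (standing in for _, h and u), and the header line is left out so the rows can be sorted.

```shell
# count_hits is a hypothetical wrapper: it reads hit lines on stdin and
# prints page, total hits and unique hits, separated by tabs.
count_hits() {
        awk -F'[: ]' '
        # seen[page, ip] is 0 the first time a page/IP pair appears, so the
        # negated post-increment is true only for that first appearance.
        !seen[$1, $2]++ { unique[$1]++ }
        # Every line counts as one hit for its page.
        { hits[$1]++ }
        END {
                for (page in hits)
                        printf "%s\t%d\t%d\n", page, hits[page], unique[page]
        }'
}

printf '%s\n' \
  'hits/books.hits:81.140.86.170        Fri Sep 22 08:45:26 GMT 2006' \
  'hits/books.hits:81.140.86.170        Fri Sep 22 09:13:57 GMT 2006' \
  'hits/books.hits:62.145.39.14         Fri Sep 22 00:38:32 GMT 2006' \
  'hits/cds.hits:182.210.215.110        Tue Sep 05 22:53:53 GMT 2006' |
count_hits | sort
```

Because the field separator is a single colon or a single space, $2 is the bare IP address, so no substr is needed to cut off the padding and the date.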
All times are GMT -7. The time now is 12:05 PM.