using uniq and awk??


 
# 1  
Old 05-16-2008
using uniq and awk??

I have a file populated like this:

Code:
hits/books.hits:143.217.64.204       Thu Sep 21 22:24:57 GMT 2006
hits/books.hits:62.145.39.14         Fri Sep 22 00:38:32 GMT 2006
hits/books.hits:81.140.86.170        Fri Sep 22 08:45:26 GMT 2006
hits/books.hits:81.140.86.170        Fri Sep 22 09:13:57 GMT 2006
hits/cds.hits:67.231.144.166         Mon Sep 04 23:57:22 GMT 2006
hits/cds.hits:182.210.215.110        Tue Sep 05 22:53:53 GMT 2006
hits/cds.hits:94.83.157.230          Wed Sep 06 22:13:28 GMT 2006
hits/cds.hits:148.214.45.187         Wed Sep 06 23:25:22 GMT 2006
hits/cds.hits:221.253.17.33          Sat Sep 09 00:58:14 GMT 2006
hits/cds.hits:182.210.215.110        Sun Sep 10 05:29:28 GMT 2006
hits/cds.hits:193.140.224.143        Sun Sep 10 16:35:11 GMT 2006
hits/cds.hits:182.210.215.110        Fri Sep 15 21:08:31 GMT 2006
hits/cds.hits:199.226.220.114        Sat Sep 16 13:38:18 GMT 2006
hits/cds.hits:150.205.160.134        Mon Sep 18 19:22:45 GMT 2006
hits/contact.hits:15.111.224.138     Sat Sep 09 12:07:26 GMT 2006
hits/contact.hits:199.179.222.39     Sat Sep 09 22:30:15 GMT 2006
hits/contact.hits:199.179.222.39     Sun Sep 10 12:14:13 GMT 2006
hits/contact.hits:234.33.121.120     Sun Sep 10 22:19:39 GMT 2006

I have code that sorts this information:

Code:
echo "Page:\t \t Hits \t Unique Hits \t "

awk -F: '{print $1}' HITS | uniq
wc -l < HITS
awk -F: '{print $2}' HITS | sort -u | wc -l

My results for this code are:

Code:
Page:                            Hits                   Unique Hits
hits/books.hits
hits/cds.hits
hits/contact.hits
18
14

18 is the total number of hits for the entire file.
14 is the number of unique IPs in column two across the entire file.

However, I want the output to look like this, so that each individual page lists its own hits and unique IPs:

Code:
Page:                      Hits                       Unique Hits
hits/books.hits             4                               3
hits/cds.hits               10                              8
hits/contact.hits           4                               3

Any suggestions/solutions?

# 2  
Old 05-16-2008
awk can do it all:

Code:
BEGIN {FS=":"}
{
        hits[$1]++
        ip_hits[$1 ":" substr($2, 1, 15)]
}
END {
        for (i in ip_hits){ sub(/:.+/, "", i); unique[i]++ }
        printf "%-20s %5s %s\n", "Page:", "Hits", "Unique Hits"
        for (i in hits) printf "%-20s %5s %5s\n", i, hits[i], unique[i]
}
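
Saved to a file and run with awk -f, the sample data in the first post should give output along these lines (count.awk and HITS are only placeholder file names here, and for (i in hits) iterates in no guaranteed order, so the rows may come out in a different order):

Code:
$ # count.awk holds the program above; HITS is the data file
$ awk -f count.awk HITS
Page:                 Hits Unique Hits
hits/books.hits          4     3
hits/cds.hits           10     8
hits/contact.hits        4     3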

# 3  
Old 05-16-2008
Could you explain this a little better for me please? I like to understand the code I am using. Thanks.

And how can I incorporate it into my code? Do I just put it in as it is?
# 4  
Old 05-17-2008
You can just use it instead of your code. No need for wc, uniq or sort any more.

Here is the code, commented with explanations:
Code:
 # define the field separator
BEGIN{FS=":"}

# loop through the file: for each record, add one to
# the array hits[site.name]; this counts the total number of hits per page
{
        hits[$1]++

        # create one array element with the index "site.name:ip_address";
        # this way there is only one index per site.name:ip_address pair (unique hits)
        ip_hits[$1 ":" substr($2, 1, 15)]
}

# once the whole file is processed, we print the array contents by looping through
# associative indexes for (i in array)....
END {
        # this line counts the number of *different* elements in array ip_hits
        for (i in ip_hits){ sub(/:.+/, "", i); unique[i]++}

        # print title
        pr_format="%-20s %5s %s\n"
        printf pr_format, "Page:", "Hits", "Unique Hits"

        # loop through array hits and array unique
        for (i in hits) printf pr_format, i, hits[i], unique[i]
}

To help you understand, here is a printout of the different arrays used above:
  • hits (counts total number of hits per site)
    Code:
    [hits/cds.hits] => 10
    [hits/contact.hits] => 4
    [hits/books.hits] => 4

  • ip_hits (no values here, just the indexes)
    Code:
    [hits/cds.hits:182.210.215.110] => 
    [hits/books.hits:143.217.64.204 ] => 
    [hits/cds.hits:150.205.160.134] => 
    [hits/cds.hits:221.253.17.33  ] => 
    [hits/books.hits:81.140.86.170  ] => 
    [hits/cds.hits:67.231.144.166 ] => 
    [hits/contact.hits:15.111.224.138 ] => 
    [hits/contact.hits:234.33.121.120 ] => 
    [hits/cds.hits:193.140.224.143] => 
    [hits/cds.hits:199.226.220.114] => 
    [hits/cds.hits:94.83.157.230  ] => 
    [hits/books.hits:62.145.39.14   ] => 
    [hits/contact.hits:199.179.222.39 ] => 
    [hits/cds.hits:148.214.45.187 ] =>

  • unique (counts the number of ip_hits entries per page, i.e. the unique IPs)
    Code:
    [hits/cds.hits] => 8
    [hits/contact.hits] => 3
    [hits/books.hits] => 3


To use the code:
Code:
$ awk -f script-name your-input-file
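
If you would rather keep everything in one shell script instead of a separate awk file, the same program can be embedded inline. A minimal sketch, assuming your data file is still called HITS as in your original script:

Code:
#!/bin/sh
# per-page hit report: same awk program as above, just inlined
awk 'BEGIN { FS = ":" }
{
        hits[$1]++                              # total hits per page
        ip_hits[$1 ":" substr($2, 1, 15)]       # one index per page/IP pair
}
END {
        for (i in ip_hits) { sub(/:.+/, "", i); unique[i]++ }
        printf "%-20s %5s %s\n", "Page:", "Hits", "Unique Hits"
        for (i in hits) printf "%-20s %5s %5s\n", i, hits[i], unique[i]
}' HITS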

# 5  
Old 05-17-2008
Another one (use nawk or /usr/xpg4/bin/awk on Solaris):

Code:
awk -F'[: ]' 'END {
fmt = "%-20s\t%s\t%s\n"
printf fmt, "Page:", "Hits", "Unique Hits"
for (p in h)
  printf fmt, p, h[p], u[p]
}
!_[$1,$2]++ { u[$1]++ }
{ h[$1]++ }' file

# 6  
Old 05-17-2008
Code:
awk -F'[: ]' 'END {
fmt = "%-20s\t%s\t%s\n"
printf fmt, "Page:", "Hits", "Unique Hits"
for (p in h)
  printf fmt, p, h[p], u[p]
}
!_[$1,$2]++ { u[$1]++ }
{ h[$1]++ }' file

Again, could you explain this for me please?
# 7  
Old 05-17-2008
h[x] counts the number of occurrences of x in field $1.

u[x] counts the number of occurrences of x in field $1, discarding any duplicates where the same combination of $1 and $2 has been seen before.
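
For what it's worth, here is the same one-liner again with comments added (purely an annotated restatement of the code above; with -F'[: ]', $1 is the page name and $2 is the IP address):

Code:
awk -F'[: ]' 'END {                  # runs once the whole file has been read
  fmt = "%-20s\t%s\t%s\n"
  printf fmt, "Page:", "Hits", "Unique Hits"
  for (p in h)                       # one report line per page seen
    printf fmt, p, h[p], u[p]
}
!_[$1,$2]++ { u[$1]++ }              # first time this page/IP pair appears: one more unique hit
{ h[$1]++ }                          # every input line: one more hit for this page
' file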