using uniq and awk??


 
# 1  
Old 05-16-2008
using uniq and awk??

I have a file populated like this:

Code:
hits/books.hits:143.217.64.204       Thu Sep 21 22:24:57 GMT 2006
hits/books.hits:62.145.39.14         Fri Sep 22 00:38:32 GMT 2006
hits/books.hits:81.140.86.170        Fri Sep 22 08:45:26 GMT 2006
hits/books.hits:81.140.86.170        Fri Sep 22 09:13:57 GMT 2006
hits/cds.hits:67.231.144.166         Mon Sep 04 23:57:22 GMT 2006
hits/cds.hits:182.210.215.110        Tue Sep 05 22:53:53 GMT 2006
hits/cds.hits:94.83.157.230          Wed Sep 06 22:13:28 GMT 2006
hits/cds.hits:148.214.45.187         Wed Sep 06 23:25:22 GMT 2006
hits/cds.hits:221.253.17.33          Sat Sep 09 00:58:14 GMT 2006
hits/cds.hits:182.210.215.110        Sun Sep 10 05:29:28 GMT 2006
hits/cds.hits:193.140.224.143        Sun Sep 10 16:35:11 GMT 2006
hits/cds.hits:182.210.215.110        Fri Sep 15 21:08:31 GMT 2006
hits/cds.hits:199.226.220.114        Sat Sep 16 13:38:18 GMT 2006
hits/cds.hits:150.205.160.134        Mon Sep 18 19:22:45 GMT 2006
hits/contact.hits:15.111.224.138     Sat Sep 09 12:07:26 GMT 2006
hits/contact.hits:199.179.222.39     Sat Sep 09 22:30:15 GMT 2006
hits/contact.hits:199.179.222.39     Sun Sep 10 12:14:13 GMT 2006
hits/contact.hits:234.33.121.120     Sun Sep 10 22:19:39 GMT 2006

I have code that sorts this information:

Code:
echo "Page:\t \t Hits \t Unique Hits \t "

awk -F: '{print $1}' HITS | uniq
wc -l < HITS
awk -F: '{print $2}' HITS | sort -u | wc -l

My results for this code are:

Code:
Page:                            Hits                   Unique Hits
hits/books.hits
hits/cds.hits
hits/contact.hits
18
14

18 is the total number of hits for the entire file.
14 is the number of unique IPs in column two across the entire file.

However, I want the output to look like this, so that each individual page lists its own hits and unique IPs:

Code:
Page:                      Hits                       Unique Hits
hits/books.hits             4                               3
hits/cds.hits               10                              8
hits/contact.hits           4                               3

Any suggestions/solutions?

# 2  
Old 05-16-2008
awk can do it all:

Code:
BEGIN {FS=":"}
{
        hits[$1]++
        ip_hits[$1 ":" substr($2, 1, 15)]
}
END {
        for (i in ip_hits){ sub(/:.+/, "", i); unique[i]++ }
        printf "%-20s %5s %s\n", "Page:", "Hits", "Unique Hits"
        for (i in hits) printf "%-20s %5s %5s\n", i, hits[i], unique[i]
}
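
Saved to a file and run with awk -f, the sample data in the first post should give output along these lines (count.awk and HITS are only placeholder file names here, and for (i in hits) iterates in no guaranteed order, so the rows may come out in a different order):

Code:
$ # count.awk holds the program above; HITS is the data file
$ awk -f count.awk HITS
Page:                 Hits Unique Hits
hits/books.hits          4     3
hits/cds.hits           10     8
hits/contact.hits        4     3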

# 3  
Old 05-16-2008
Could you explain this a little better for me please? I like to understand the code I am using. Thanks.

And how can I incorporate it into my code? Do I just put it in as it is?
# 4  
Old 05-17-2008
You can just use it instead of your code. No need for wc, uniq or sort any more.

Here is the code, commented with explanations:
Code:
 # define the field separator
BEGIN{FS=":"}

# loop through the file: for each record, add one to
# the array hits[site.name]; this counts the total number of hits per page
{
        hits[$1]++

        # create one array element with the index "site.name:ip_address";
        # this way there is only one index per site.name:ip_address pair (unique hits)
        ip_hits[$1 ":" substr($2, 1, 15)]
}

# once the whole file is processed, we print the array contents by looping through
# associative indexes for (i in array)....
END {
        # this line counts the number of *different* elements in array ip_hits
        for (i in ip_hits){ sub(/:.+/, "", i); unique[i]++}

        # print title
        pr_format="%-20s %5s %s\n"
        printf pr_format, "Page:", "Hits", "Unique Hits"

        # loop through array hits and array unique
        for (i in hits) printf pr_format, i, hits[i], unique[i]
}

To help you understand, here is a printout of the different arrays used above:
  • hits (counts total number of hits per site)
    Code:
    [hits/cds.hits] => 10
    [hits/contact.hits] => 4
    [hits/books.hits] => 4

  • ip_hits (no values here, just the indexes)
    Code:
    [hits/cds.hits:182.210.215.110] => 
    [hits/books.hits:143.217.64.204 ] => 
    [hits/cds.hits:150.205.160.134] => 
    [hits/cds.hits:221.253.17.33  ] => 
    [hits/books.hits:81.140.86.170  ] => 
    [hits/cds.hits:67.231.144.166 ] => 
    [hits/contact.hits:15.111.224.138 ] => 
    [hits/contact.hits:234.33.121.120 ] => 
    [hits/cds.hits:193.140.224.143] => 
    [hits/cds.hits:199.226.220.114] => 
    [hits/cds.hits:94.83.157.230  ] => 
    [hits/books.hits:62.145.39.14   ] => 
    [hits/contact.hits:199.179.222.39 ] => 
    [hits/cds.hits:148.214.45.187 ] =>

  • unique (counts the number of ip_hits entries per page, i.e. the unique IPs)
    Code:
    [hits/cds.hits] => 8
    [hits/contact.hits] => 3
    [hits/books.hits] => 3


To use the code:
Code:
$ awk -f script-name your-input-file
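
If you would rather keep everything in one shell script instead of a separate awk file, the same program can be embedded inline. A minimal sketch, assuming your data file is still called HITS as in your original script:

Code:
#!/bin/sh
# per-page hit report: same awk program as above, just inlined
awk 'BEGIN { FS = ":" }
{
        hits[$1]++                              # total hits per page
        ip_hits[$1 ":" substr($2, 1, 15)]       # one index per page/IP pair
}
END {
        for (i in ip_hits) { sub(/:.+/, "", i); unique[i]++ }
        printf "%-20s %5s %s\n", "Page:", "Hits", "Unique Hits"
        for (i in hits) printf "%-20s %5s %5s\n", i, hits[i], unique[i]
}' HITS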

# 5  
Old 05-17-2008
Another one (use nawk or /usr/xpg4/bin/awk on Solaris):

Code:
awk -F'[: ]' 'END {
fmt = "%-20s\t%s\t%s\n"
printf fmt, "Page:", "Hits", "Unique Hits"
for (p in h)
  printf fmt, p, h[p], u[p]
}
!_[$1,$2]++ { u[$1]++ }
{ h[$1]++ }' file

# 6  
Old 05-17-2008
Code:
awk -F'[: ]' 'END {
fmt = "%-20s\t%s\t%s\n"
printf fmt, "Page:", "Hits", "Unique Hits"
for (p in h)
  printf fmt, p, h[p], u[p]
}
!_[$1,$2]++ { u[$1]++ }
{ h[$1]++ }' file

Again, could you explain this for me please?
# 7  
Old 05-17-2008
h[x] counts the number of occurrences of x in field $1.

u[x] counts the number of occurrences of x in field $1, discarding any duplicates where the same combination of $1 and $2 has been seen before.
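
For what it's worth, here is the same one-liner again with comments added (purely an annotated restatement of the code above; with -F'[: ]', $1 is the page name and $2 is the IP address):

Code:
awk -F'[: ]' 'END {                  # runs once the whole file has been read
  fmt = "%-20s\t%s\t%s\n"
  printf fmt, "Page:", "Hits", "Unique Hits"
  for (p in h)                       # one report line per page seen
    printf fmt, p, h[p], u[p]
}
!_[$1,$2]++ { u[$1]++ }              # first time this page/IP pair appears: one more unique hit
{ h[$1]++ }                          # every input line: one more hit for this page
' file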