The UNIX and Linux Forums  

Old 05-16-2008
Annihilannic
Did you know it's spelt 'amateur'?

Some comments regarding your existing code:

Code:
awk '{print $1}' HITS

# you need to specify the column separator, because awk splits fields on
# white space (spaces/tabs) by default, e.g.

awk -F: '{print $1}' HITS
Code:
awk '{print $2}' HITS  | sort -n | wc -l

# no need for awk and sort, as neither changes the number of lines of data; just:

wc -l < HITS
Code:
 awk '{print $2}' HITS | uniq | wc -l

# uniq only removes adjacent duplicate lines, so on unsorted data it misses repeats; better to use:

awk -F: '{print $2}' HITS | sort -u | wc -l
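To see the difference for yourself, here is a small sketch. It assumes the HITS records look like "file:ip", one per line; the file names and addresses below are invented purely for illustration, and the data is written to a temporary file rather than your real HITS.

```shell
# Hypothetical sample data: one "file:ip" record per line, deliberately unsorted.
tmp=$(mktemp)
printf '%s\n' \
    'index.html:10.0.0.1' \
    'about.html:10.0.0.2' \
    'index.html:10.0.0.1' \
    'index.html:10.0.0.3' > "$tmp"

# uniq only collapses adjacent duplicates, so the repeated 10.0.0.1
# (lines 1 and 3) is still counted twice.
# tr strips the padding that some wc implementations put before the count.
adjacent_only=$(awk -F: '{print $2}' "$tmp" | uniq | wc -l | tr -d ' ')

# sort -u sorts first, so every duplicate is removed.
truly_unique=$(awk -F: '{print $2}' "$tmp" | sort -u | wc -l | tr -d ' ')

echo "uniq: $adjacent_only, sort -u: $truly_unique"
rm -f "$tmp"
```

With this sample the plain uniq pipeline reports 4 and the sort -u pipeline reports 3, because the two 10.0.0.1 lines are not next to each other.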
Personally I would use one awk script to generate all of the results, something like:

Code:
sort -t : -k 1,1 -k 2,2 HITS | awk -F: '
        # assign values to variables for readability, count a hit
        { file=$1; ip=$2; hits[file]++ }
        # initialise prevfile when reading the first line
        NR==1 { prevfile=file }
        # if it is a new file, reset the previous IP
        file != prevfile { previp="" }
        # if the ip is different to the previous IP, count a unique hit
        ip != previp { uniquehits[file]++ }
        # save previous ip and file name for future reference
        { previp=ip; prevfile=file }
        # output the results
        END { for (file in hits) { print file,hits[file],uniquehits[file] } }
'
This won't output in exactly the format you wanted. You can use printf for that, but I'll leave that part as an exercise for you!
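If you get stuck, one possible shape for the printf version is sketched below. The output layout (a padded file name followed by hits= and unique=) is only a guess at what you might want, and the sample records are invented and assume the same "file:ip" layout. The trailing sort is there because awk's "for (file in hits)" returns keys in no particular order.

```shell
# Hypothetical sample data in the assumed "file:ip" layout.
hits_file=$(mktemp)
printf '%s\n' \
    'index.html:10.0.0.1' \
    'about.html:10.0.0.2' \
    'index.html:10.0.0.1' \
    'index.html:10.0.0.3' > "$hits_file"

report=$(sort -t : -k 1,1 -k 2,2 "$hits_file" | awk -F: '
        # same counting logic as the script above
        { file=$1; ip=$2; hits[file]++ }
        NR==1 { prevfile=file }
        file != prevfile { previp="" }
        ip != previp { uniquehits[file]++ }
        { previp=ip; prevfile=file }
        END {
                # made-up layout: name left-aligned in 12 columns, then both counts
                for (file in hits)
                        printf "%-12s hits=%d unique=%d\n", file, hits[file], uniquehits[file]
        }
' | sort)

printf '%s\n' "$report"
rm -f "$hits_file"
```

Adjust the width in %-12s to suit your longest file name, and change the format string to whatever layout you actually need.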