How to select only the most frequent instances of a variable string in a file?


 
# 1  
Old 09-23-2009

I've got a web access log, and I want to use grep (or awk, or perl, or whatever will work!) to pull out the IP addresses that appear most often. The file looks something like this:

Quote:
1.1.1.1 home.do
1.1.1.1 home.do/category1
1.1.1.1 home.do/category2
1.1.1.1 home.do/category3
2.2.2.2 home.do
3.3.3.3 home.do
3.3.3.3 home.do/file
4.4.4.4 home.do
4.4.4.4 home.do/category1
4.4.4.4 home.do/category2
4.4.4.4 home.do/category3
I'd like to run a sort or grep (or whatever) that selects only the lines from the IPs that had the most hits, which in this example would be the 1.1.1.1 and 4.4.4.4 entries.

So I need something that groups the lines by their leading IP address, counts the lines for each IP, and then outputs only the most frequent ones. The matching IP string will obviously change each time it's run, depending on who is hitting the web server. Is this possible?
# 2  
Old 09-23-2009
Code:
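#!/bin/sh

# Count the hits for each unique IP and print the busiest one.
for ip in `awk '{print $1}' access_log | sort -u`
do
  # Compare the first field exactly; "grep -c $ip" would treat the dots as
  # wildcards and would also match the IP anywhere in the line.
  ip_count=`awk -v ip="$ip" '$1 == ip {n++} END {print n+0}' access_log`
  echo "$ip $ip_count"
done | sort -rn -k2 | head -n 1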

# 3  
Old 09-23-2009
Thanks for the quick suggestion, but I can't seem to get that to work. I replaced the access_log file name with my file name, but when I run it, it just hangs with no output. I tried moving the "echo $ip" higher in the script, right after the awk (and before the 'do'), but it still wouldn't print that variable either.

And I can already get the file to sort by IP, since the ip-address is the leading entry in every newline ('sort -n' works).

So now I just need it to scan the entire log, count the entries that start with the same IP address, and print the lines for, let's say, the top 5 IPs that appear most often in the file (the 5 highest "hitters" of the web server). Can you provide any further help or advice? Please?
# 4  
Old 09-23-2009
How big is your access file? The script above reads the whole input file once for every unique IP address, so for a very large file with many addresses it can take a long time.

This script reads the file only once:

Code:
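#!/bin/ksh
# Needs ksh93 for associative arrays (typeset -A).
typeset -A ACCESS
# Count one hit per line, keyed on the first field (the IP address).
while read -r ipaddr dummy; do
  (( ACCESS[$ipaddr]++ ))
done < access_log
# Print "IP count" pairs, highest count first, and keep the top 10.
for ip in "${!ACCESS[@]}"; do
  echo "$ip ${ACCESS[$ip]}"
done | sort -rn -k2 | head -10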

or the equivalent awk:
Code:
awk '{access[$1]++} END { for ( i in access ) print i " " access[i] }' access_log |sort -rn -k2|head -10

Note that the sort options "-rn -k2" stand for "reverse numerical sort on the second field". The syntax may vary per Unix platform; use "man sort" to find the appropriate options. The number passed to head determines how many IP addresses are listed.
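
To make the sort options concrete, here is a small sketch on made-up "IP count" pairs (the values are placeholders, not real log data). It uses the more explicit key form "-k2,2nr", which limits the key to the second field, plus "-k1,1" as a tie-breaker so that IPs with equal counts come out in a predictable order:

```shell
# Hypothetical "IP count" pairs, shaped like the output of the awk line above.
counts='2.2.2.2 1
1.1.1.1 4
3.3.3.3 2
4.4.4.4 4'

# -k2,2nr : sort on field 2 only, numerically, largest first.
# -k1,1   : break ties on the IP text so the order is reproducible.
top2=$(printf '%s\n' "$counts" | sort -k2,2nr -k1,1 | head -n 2)
printf '%s\n' "$top2"
```

With this sample, the two IPs that tie on 4 hits are the ones that survive the head.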

Last edited by Scrutinizer; 09-24-2009 at 07:24 PM..
# 5  
Old 09-23-2009
Code:
awk '{print $1}' urfile |sort |uniq -c |sort -n
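
The pipeline above lists every IP with its count in ascending order, so the busiest addresses end up at the bottom. A possible variant, sketched here against the sample data from the first post (written to a temporary file that stands in for the real log), reverses the sort and keeps only the top entries:

```shell
# Sample log from the first post, written to a temporary placeholder file.
log=$(mktemp)
cat > "$log" <<'EOF'
1.1.1.1 home.do
1.1.1.1 home.do/category1
1.1.1.1 home.do/category2
1.1.1.1 home.do/category3
2.2.2.2 home.do
3.3.3.3 home.do
3.3.3.3 home.do/file
4.4.4.4 home.do
4.4.4.4 home.do/category1
4.4.4.4 home.do/category2
4.4.4.4 home.do/category3
EOF

# uniq -c prefixes each IP with its count; sort -rn puts the largest first.
# The final awk only strips the padding that uniq -c adds before the count.
top=$(awk '{print $1}' "$log" | sort | uniq -c | sort -rn | head -n 2 |
      awk '{print $1, $2}')
printf '%s\n' "$top"
rm -f "$log"
```

Change the number given to head to list more or fewer addresses.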


Last edited by rdcwayx; 09-28-2009 at 12:01 AM..
# 6  
Old 09-24-2009
RDCWAYX....?? Shouldn't there be a closing single-quote somewhere in that awk line??

---------- Post updated at 10:12 AM ---------- Previous update was at 09:42 AM ----------

SCRUTINIZER...?? Your 'awk' line worked well for obtaining the Top-10 "heavy hitter" IP's and listing them out (with counts). Thanks for that.

But instead of just the IP and its number of instances in the log file, I need to return/save the entire log line for each and every hit. So if 1.1.1.1 has 10 entries in the file and 2.2.2.2 has 8 entries, instead of output that looks like this:

1.1.1.1 10
2.2.2.2 8

I instead need output that looks like this:

1.1.1.1 - [23/Sep/2009:14:18:41 -0700] "GET /home.do"
1.1.1.1 - [23/Sep/2009:14:18:51 -0700] "GET /home.do/category1"
1.1.1.1 - [23/Sep/2009:14:18:55 -0700] "GET /home.do/category2"
2.2.2.2 - [23/Sep/2009:14:19:31 -0700] "GET /home.do"
2.2.2.2 - [23/Sep/2009:14:19:33 -0700] "GET /home.do/file1"

Etc., etc. That is, the entire line from the original file, including the date/time stamp, URL, and so on, and not just the IP and a summary count.

Can your awk line be easily modified to save all that info....??
# 7  
Old 09-28-2009
Quote:
Originally Posted by kevinmccallum
RDCWAYX....?? Shouldn't there be a closing single-quote somewhere in that awk line??
updated.
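
For the full-line output asked for in post #6, one possible approach (a sketch only, not something posted in this thread) is to make two passes: first count the hits per IP and keep the busiest ones, then print every original line whose first field is one of those IPs. The sample lines are modeled on post #6; the 3.3.3.3 entry, the temporary file, and n=2 are made up for the example. Note that "sort -s" (stable sort) is a GNU/BSD extension, so this assumes a sort that supports it.

```shell
# Placeholder log in the format from post #6 (the 3.3.3.3 line is invented).
log=$(mktemp)
cat > "$log" <<'EOF'
1.1.1.1 - [23/Sep/2009:14:18:41 -0700] "GET /home.do"
3.3.3.3 - [23/Sep/2009:14:18:45 -0700] "GET /home.do"
1.1.1.1 - [23/Sep/2009:14:18:51 -0700] "GET /home.do/category1"
1.1.1.1 - [23/Sep/2009:14:18:55 -0700] "GET /home.do/category2"
2.2.2.2 - [23/Sep/2009:14:19:31 -0700] "GET /home.do"
2.2.2.2 - [23/Sep/2009:14:19:33 -0700] "GET /home.do/file1"
EOF

n=2   # how many of the busiest IPs to keep (example value)

# Pass 1: count hits per IP, sort by count (ties broken by IP), keep n IPs.
top_ips=$(awk '{hits[$1]++} END {for (ip in hits) print ip, hits[ip]}' "$log" |
          sort -k2,2nr -k1,1 | head -n "$n" | awk '{print $1}')

# Pass 2: read the top IPs from stdin first and remember their rank, then
# prefix each matching log line with that rank, sort stably on it so lines
# stay in file order within each IP, and strip the rank again.
result=$(printf '%s\n' "$top_ips" |
         awk 'NR == FNR {rank[$1] = NR; next}
              $1 in rank {print rank[$1], $0}' - "$log" |
         sort -s -k1,1n | cut -d' ' -f2-)
printf '%s\n' "$result"
rm -f "$log"
```

With the sample above this prints the three 1.1.1.1 lines followed by the two 2.2.2.2 lines and drops 3.3.3.3. If several IPs tie at the cutoff, head keeps only the first n after the tie-break on the IP text.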