Parse a Web Server Access Log


 
# 1  
Old 03-24-2010

1. The problem statement, all variables and given/known data:

Write a parser for a web server access log that will provide the statistics outlined below. Remember to format your output in a neat form. You may complete this assignment with one Awk script or a shell script using a combination of Awk scripts.

Obtain the file located at http://users.csc.tntech.edu/~elbrown/access_log.bz2. For full credit, you must not save this data file to disk. You must process the file by reading directly from the URL above using bash commands.

Please submit this problem's script(s) and output combined as a separate zip file. (15 points)

Your script should address each of the following items:

1. List the top 10 web sites from which requests came (non-404 status, external addresses looking in).
2. List the top 10 local web pages requested (non-404 status).
3. List the top 10 web browsers used to access the site. It is not necessary to get fancy and parse out all of the browser string. Simply print out the information that is there. Display the percentage of all browser types that each line represents.
4. List the number of 404 errors that were reported in the log.
5. List the number of 500 errors that were reported in the log.
6. Add any other important information that you deem appropriate.
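For reference, items 1-3 all share the same tally pattern: extract one field, count occurrences, sort by count, keep the top 10. A hedged sketch of that pipeline (field positions assume the Apache combined log format, and the sample lines are invented for illustration):

```shell
# Sketch of the "top 10" tally pattern behind items 1-3. Field numbers assume
# the Apache combined log format: $1 = client host, $7 = request path,
# $9 = status code. The sample lines below are made up so this runs standalone.
printf '%s\n' \
  'a.example - - [24/Mar/2010:10:00:00 -0500] "GET /x HTTP/1.1" 200 10' \
  'b.example - - [24/Mar/2010:10:00:01 -0500] "GET /x HTTP/1.1" 200 10' \
  'b.example - - [24/Mar/2010:10:00:02 -0500] "GET /y HTTP/1.1" 200 10' \
  'b.example - - [24/Mar/2010:10:00:03 -0500] "GET /z HTTP/1.1" 404 0' \
| awk '$9 != 404 { print $1 }' \
| sort | uniq -c | sort -rn | head -10
# b.example is listed first (2 non-404 requests), then a.example (1).
```

Swapping `$1` for `$7` gives the top requested pages; the percentages in item 3 can be computed in an awk END block from the total line count.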


2. Relevant commands, code, scripts, algorithms:

Awk will be used.

3. The attempts at a solution (include all code and scripts):

I don't have a problem at all with parts 1 - 6. I understand how to use awk. The problem I'm having is how to parse a .bz2 file without downloading and decompressing it to disk. I don't even know where to begin accessing the file without decompressing it.

4. Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):

Tennessee Technological University, Cookeville, TN, USA, Eric Brown, CSC 2500 Unix Laboratory

# 2  
Old 03-24-2010
Look into 'man wget'.
# 3  
Old 03-24-2010
I tried man wget and it said no manual entry for wget. Just running wget said command not found.

I'm using bash on Mac OS 10.6.2 with the latest version of the Apple Developer Tools installed.
# 4  
Old 03-24-2010
Quote:
Originally Posted by codyhazelwood
I tried man wget and it said no manual entry for wget. Just running wget said command not found.

I'm using bash on Mac OS 10.6.2 with the latest version of the Apple Developer Tools installed.
I'm not really familiar with what Mac OS provides, but what you need is a utility (wget, curl, lftp, lynx, etc.) that can download a file via HTTP.
Maybe others will have better ideas.
# 5  
Old 03-25-2010
You use a script/program with a regex parser.

Why are you using awk?

If I had to do this, I would use PHP or Perl.
# 6  
Old 03-31-2010
The assignment says we have to use awk. I don't know anything about PHP or Perl.
# 7  
Old 04-01-2010
To download a file without saving it, you've got two options:
  1. Use a utility like wget or curl. If you don't have it, install it.
  2. Use the networking ability built into bash itself (its /dev/tcp redirection).
Once you've got that, just pipe it into bzip2 with the appropriate switches to decompress to the console.
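A minimal sketch of that pipeline. So it runs offline, the first stage here is an inline stand-in sample; for the actual assignment that stage would be replaced by `curl -s http://users.csc.tntech.edu/~elbrown/access_log.bz2` (curl ships with Mac OS X even when wget does not):

```shell
# Sketch: stream a bzip2-compressed log through bunzip2 into awk without ever
# writing the decompressed data to disk. The printf | bzip2 stage is a tiny
# made-up sample standing in for:
#   curl -s http://users.csc.tntech.edu/~elbrown/access_log.bz2
# With the combined log format, the status code is field $9.
printf '%s\n' \
  '1.2.3.4 - - [24/Mar/2010:10:00:00 -0500] "GET /index.html HTTP/1.1" 200 512' \
  '5.6.7.8 - - [24/Mar/2010:10:00:01 -0500] "GET /gone.html HTTP/1.1" 404 0' \
  '5.6.7.8 - - [24/Mar/2010:10:00:02 -0500] "GET /also-gone HTTP/1.1" 404 0' \
| bzip2 -c \
| bunzip2 -c \
| awk '$9 == 404 { n++ } END { print "404 errors:", n+0 }'
# prints: 404 errors: 2
```

The key point is that bzip2/bunzip2 with `-c` read stdin and write stdout, so nothing decompressed ever touches the filesystem.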