03-24-2010
Parse a Web Server Access Log
1. The problem statement, all variables and given/known data:
Write a parser for a web server access log that will provide the statistics outlined below. Remember to format your output neatly. You may complete this assignment with one Awk script or with a shell script that combines several Awk scripts.
Obtain the file located at http://users.csc.tntech.edu/~elbrown/access_log.bz2. For full credit, you must not save this data file to disk. You must process the file by reading directly from the URL above using bash commands.
Please submit this problem's script(s) and output combined as a separate zip file. (15 points)
Your script should address each of the following items:
1. List the top 10 web sites from which requests came (non-404 status, external addresses looking in).
2. List the top 10 local web pages requested (non-404 status).
3. List the top 10 web browsers used to access the site. It is not necessary to get fancy and parse out all of the browser string. Simply print out the information that is there. Display the percentage of all browser types that each line represents.
4. List the number of 404 errors that were reported in the log.
5. List the number of 500 errors that were reported in the log.
6. Add any other important information that you deem appropriate.
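Items 1 to 5 each reduce to a short awk pass followed by `sort -rn | head`. Below is a minimal sketch, assuming the log uses Apache's "combined" format (`host ident user [date tz] "request" status bytes "referer" "user-agent"`) and that each function reads the already decompressed log on stdin. The function names and the default "local address" regex are hypothetical choices, not part of the assignment. Item 1 is read here as the client host field (`$1`). If "web sites" is meant to be the referer, count the 4th `"`-delimited field instead.

```shell
#!/usr/bin/env bash
# Sketch: assumes Apache "combined" log lines, so with default awk splitting
#   $1 = client host, $7 = requested path, $9 = status code,
# and with -F'"' the 6th field is the user-agent string.
# Every function reads the decompressed log on stdin.

# top_hosts [N] [LOCAL_RE]: N busiest client hosts among non-404 requests,
# skipping hosts matched by LOCAL_RE (hypothetical default: loopback and
# two private ranges, standing in for "local" addresses).
top_hosts() {
    local limit="${1:-10}"
    local local_re="${2:-^(127[.]|10[.]|192[.]168[.])}"
    awk -v re="$local_re" '$9 != 404 && $1 !~ re { n[$1]++ }
        END { for (h in n) print n[h], h }' | sort -rn | head -n "$limit"
}

# top_pages [N]: N most requested local paths among non-404 requests.
top_pages() {
    awk '$9 != 404 { n[$7]++ } END { for (p in n) print n[p], p }' |
        sort -rn | head -n "${1:-10}"
}

# top_browsers [N]: N most common user-agent strings, each with its share
# of all requests that carry a user-agent field.
top_browsers() {
    awk -F'"' 'NF >= 6 { n[$6]++; total++ }
        END { for (b in n) printf "%d %.2f%% %s\n", n[b], 100 * n[b] / total, b }' |
        sort -rn | head -n "${1:-10}"
}

# count_status CODE: number of requests answered with status CODE (e.g. 404, 500).
count_status() {
    awk -v code="$1" '$9 == code { c++ } END { print c + 0 }'
}
```

Because every function only reads stdin, each one can sit at the end of whatever pipeline produces the decompressed log, for example `... | top_browsers 10` or `... | count_status 500`.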
2. Relevant commands, code, scripts, algorithms:
Awk will be used.
3. The attempts at a solution (include all code and scripts):
I have no trouble with items 1 to 6; I understand how to use awk. My problem is how to parse a .bz2 file without downloading it and decompressing it first. I don't even know how to begin accessing the file without decompressing it.
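awk cannot read bzip2-compressed data directly, so the file still has to be decompressed. The assignment only rules out saving it to disk. The usual approach is a pipe: the downloader writes the compressed bytes to stdout, `bzip2 -dc` decompresses its stdin to stdout, and awk reads the result. The data exists only in the pipe. Below is a minimal sketch, assuming `curl` and `bzip2` are installed. `stream_log` and `LOG_URL` are hypothetical names.

```shell
#!/usr/bin/env bash
# Location given in the assignment.
LOG_URL='http://users.csc.tntech.edu/~elbrown/access_log.bz2'

# stream_log [URL]: print the decompressed log on stdout.
#   curl -fsS  writes the response body to stdout (no -o, so nothing is saved)
#              and fails instead of passing on an HTTP error page;
#   bzip2 -dc  decompresses its stdin and writes the result to stdout.
stream_log() {
    curl -fsS "${1:-$LOG_URL}" | bzip2 -dc
}

# Usage: pipe the stream straight into awk, e.g. count 404 responses
# (assumes the status code is the 9th whitespace-separated field):
#   stream_log | awk '$9 == 404 { c++ } END { print c + 0 }'
```

`wget -qO- URL` and `bzcat` are drop-in replacements for the two stages. Each `stream_log | ...` pipeline downloads the file again. Computing all the statistics in a single awk pass over one stream, which the assignment explicitly allows, avoids repeated downloads.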
4. Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):
Tennessee Technological University, Cookeville, TN, USA, Eric Brown, CSC 2500 Unix Laboratory