Filtering similar lines in a big list
UNIX for Dummies Questions & Answers, post 302403706 by Andrew9191 on Sunday 14th of March 2010 01:27:54 AM

I received this question for homework:
Quote:
The School of IT webmaster has noticed that visitors sometimes can't find web pages because they use incorrect capitalisation in the URL.
How often does this happen?

You can tell when a URL is incorrect because the HTTP response code (the second-last field in access_log.txt) has the value 404, which means "not found".
A successful request has the response code 200.

You must print requests which only differ by their capitalisation where one was successful (code 200) and the other was not (code 404).
For example, if the two lines appear in access_log.txt:
65.214.44.112 -... "GET /~user1/HELLO/world HTTP/1.0" 404 -
...
68.142.251.190 ... "GET /~user1/hello/WORLD HTTP/1.0" 200 10

then your program should print out each URL converted to lowercase, in alphabetical order:
/~user1/hello/world

You can assume that no requested files appeared or disappeared during the period covered by access_log.txt.
We have to write our program in a .sh file, with "#!/bin/bash" as the first line. The access log is in a file that looks like this (it's nearly 10,000 lines long):

Code:
65.214.44.112 - - [01/Feb/2008:00:06:44 +1100] "GET /~user0/cgg/msg08400.html HTTP/1.0" 304 -
203.109.124.175 - - [01/Feb/2008:00:15:17 +1100] "GET /~user39/ss_logo.jpg HTTP/1.1" 304 -
68.142.249.191 - - [01/Feb/2008:00:18:42 +1100] "GET /~user11/resnik/fsbt_viewreview?PERSON=Naz&BT_ID=9 HTTP/1.0" 200 7729
81.156.30.154 - - [01/Feb/2008:00:26:12 +1100] "POST /~user0/cgi-bin/hail_stone_coker.cgi HTTP/1.1" 200 185
65.214.44.112 - - [01/Feb/2008:00:31:11 +1100] "GET /~user0/cgg/msg00211.html HTTP/1.0" 304 -
68.142.249.127 - - [01/Feb/2008:00:31:34 +1100] "GET /~user44/examples/chapter5/example5.py HTTP/1.0" 200 334
145.64.134.244 - - [01/Feb/2008:00:32:18 +1100] "GET /~user20/ppgraph/ppg_create.2bp HTTP/1.0" 200 1070

How should I approach this question? I know how to use 'cut' to extract the field needed for the answer, and I used 'egrep' to filter out the lines that contain neither 200 nor 404. What should I do next? I'm completely lost when it comes to matching requests that have the same URL (ignoring capitalisation) but different response codes.

Thanks for the help guys.
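
One possible approach, offered as a rough sketch rather than the expected solution: let awk remember, per lowercased URL, whether it was ever seen with code 200 and whether it was ever seen with code 404, then print the URLs that have both and sort them. This assumes every request line has the form "METHOD URL PROTOCOL", so that the URL is whitespace field 7 and the status code is the second-last field. It also leans on the assignment's statement that no files appeared or disappeared, which means a 200 and a 404 for the same lowercased URL must come from different capitalisations. The function name find_case_mismatches and the sample log lines are invented for illustration.

```shell
#!/bin/bash
# Sketch: print lowercased URLs that returned 200 under one capitalisation
# and 404 under another. Assumes the URL is whitespace field 7 and the
# status code is the second-last field of each log line.

find_case_mismatches() {
    awk '
        # Remember each lowercased URL that succeeded or was not found.
        $(NF-1) == 200 { ok[tolower($7)] = 1 }
        $(NF-1) == 404 { missing[tolower($7)] = 1 }
        END {
            # A URL in both sets was requested with differing capitalisation.
            for (url in ok)
                if (url in missing)
                    print url
        }
    ' "$1" | sort
}

# Demo on a small, made-up log (these lines are not from access_log.txt).
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.0.0.1 - - [01/Feb/2008:00:00:01 +1100] "GET /~demo/Index.HTML HTTP/1.0" 404 -
10.0.0.2 - - [01/Feb/2008:00:00:02 +1100] "GET /~demo/index.html HTTP/1.0" 200 10
10.0.0.3 - - [01/Feb/2008:00:00:03 +1100] "GET /~demo/other.html HTTP/1.0" 200 20
10.0.0.4 - - [01/Feb/2008:00:00:04 +1100] "GET /~demo/gone.html HTTP/1.0" 404 -
EOF

result=$(find_case_mismatches "$sample")
echo "$result"    # prints /~demo/index.html
rm -f "$sample"
```

For the real data, replace the demo part with a call such as find_case_mismatches access_log.txt. Comparing the status field directly avoids a pitfall of a plain egrep for 200 or 404, which could also match those digits in a byte count or inside a URL.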
 
