Filtering similar lines in a big list Post: 302403706

Posted in UNIX for Dummies Questions & Answers by Andrew9191 on Sunday 14th of March 2010 01:27:54 AM

03-14-2010

Registered User

Filtering similar lines in a big list

I received this question for homework:

Quote:

The School of IT webmaster has noticed that visitors sometimes can't find web pages because they use incorrect capitalisation in the URL.
How often does this happen?

You can tell when a URL is incorrect because the HTTP response code (the second-last field in access_log.txt) has the value 404, which means not found.
A successful request has the response code 200.

You must print requests which only differ by their capitalisation where one was successful (code 200) and the other was not (code 404).
For example, if the two lines appear in access_log.txt:
65.214.44.112 -... "GET /~user1/HELLO/world HTTP/1.0" 404 -
...
68.142.251.190 ... "GET /~user1/hello/WORLD HTTP/1.0" 200 10
then your program should print out each URL converted to lowercase, in alphabetical order:
/~user1/hello/world

You can assume that no requested files appeared or disappeared during the period covered by access_log.txt.

We have to write our program in a .sh file, with "#!/bin/bash" as the first line. The access log is in a file that looks like this (it's nearly 10,000 lines long):

Code:

65.214.44.112 - - [01/Feb/2008:00:06:44 +1100] "GET /~user0/cgg/msg08400.html HTTP/1.0" 304 -
203.109.124.175 - - [01/Feb/2008:00:15:17 +1100] "GET /~user39/ss_logo.jpg HTTP/1.1" 304 -
68.142.249.191 - - [01/Feb/2008:00:18:42 +1100] "GET /~user11/resnik/fsbt_viewreview?PERSON=Naz&BT_ID=9 HTTP/1.0" 200 7729
81.156.30.154 - - [01/Feb/2008:00:26:12 +1100] "POST /~user0/cgi-bin/hail_stone_coker.cgi HTTP/1.1" 200 185
65.214.44.112 - - [01/Feb/2008:00:31:11 +1100] "GET /~user0/cgg/msg00211.html HTTP/1.0" 304 -
68.142.249.127 - - [01/Feb/2008:00:31:34 +1100] "GET /~user44/examples/chapter5/example5.py HTTP/1.0" 200 334
145.64.134.244 - - [01/Feb/2008:00:32:18 +1100] "GET /~user20/ppgraph/ppg_create.2bp HTTP/1.0" 200 1070
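
For reference, here is a minimal sketch of pulling the two interesting fields out of lines in this format. It assumes the fields are whitespace-separated, that the URL is always field 7, and that the status code is the second-last field, as in the sample above. The function name extract_fields is made up for illustration.

```shell
#!/bin/bash
# Sketch: reduce each log line to "<status> <url>".
# Assumption: whitespace-separated fields, URL in field 7, status code in
# the second-last field, as in the sample lines above.
extract_fields() {
    # NF >= 7 skips blank or truncated lines
    awk 'NF >= 7 { print $(NF-1), $7 }' "$@"
}

# Demo on two of the sample lines; with no file argument awk reads stdin.
extract_fields <<'EOF'
68.142.249.127 - - [01/Feb/2008:00:31:34 +1100] "GET /~user44/examples/chapter5/example5.py HTTP/1.0" 200 334
65.214.44.112 - - [01/Feb/2008:00:31:11 +1100] "GET /~user0/cgg/msg00211.html HTTP/1.0" 304 -
EOF
```

Counting from the end for the status code avoids relying on the position of the trailing size field, which is sometimes "-".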

How should I approach this question? I know how to use 'cut' to get the section required for the answer, and I used 'egrep' to filter out the lines that contain neither 200 nor 404. What should I do next? I'm completely lost when it comes to matching requests that have the same URL but different response codes.

Thanks for the help guys.

Andrew9191
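
One possible way to do the matching step, offered as a rough sketch rather than the intended solution: lowercase every URL, remember which lowercased URLs were seen with a 200 and which with a 404, and print the ones that appear in both sets. The function name find_case_mismatches and the optional file argument are made up for illustration, and the field positions (URL in field 7, status second-last) are assumed from the sample log.

```shell
#!/bin/bash
# Sketch: list lowercased URLs that were requested with both a 200 and a
# 404 response, in alphabetical order.
# Assumptions: URL is field 7, status is the second-last field.
find_case_mismatches() {
    awk '
        NF < 7 { next }                        # skip blank or malformed lines
        { url = tolower($7) }                  # case-insensitive key
        $(NF-1) == 200 { ok[url] = 1 }         # some spelling succeeded
        $(NF-1) == 404 { missing[url] = 1 }    # some spelling failed
        END {
            for (url in ok)
                if (url in missing)
                    print url
        }
    ' "$@" | sort
}

# Hypothetical usage: log path as first argument, default access_log.txt.
log=${1:-access_log.txt}
if [ -r "$log" ]; then
    find_case_mismatches "$log"
fi
```

This does not check that the two spellings actually differ. It leans on the assignment's statement that no files appeared or disappeared, so the same exact URL should never return both 200 and 404.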
