A Novel Traffic Analysis for Identifying Search Fields in the Long Tail of Web Sites

02-22-2010

HPL-2010-27 A Novel Traffic Analysis for Identifying Search Fields in the Long Tail of Web Sites - Forman, George; Kirshenbaum, Evan; Rajaram, Shyamsundar
Keyword(s): web data mining, clickstream analysis, machine learning classification, active learning
Abstract: Using a clickstream sample of 2 billion URLs from many thousand volunteer Web users, we wish to analyze typical usage of keyword searches across the Web. In order to do this, we need to be able to determine whether a given URL represents a keyword search and, if so, which field contains the query. A ...
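To make the problem in the abstract concrete, here is a minimal sketch of one naive way to shortlist candidate query fields from a list of URLs. It is not the method of the report, which (going by its keywords) relies on machine learning classification and active learning. The function names `candidate_fields` and `rank_fields`, the `example.com` URLs, and the "many distinct values suggests free text" heuristic are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def candidate_fields(url):
    """Yield (site_key, field, value) for each non-empty query parameter in url."""
    parts = urlsplit(url)
    # Hypothetical grouping key: host plus path identifies one "form" on a site.
    site_key = (parts.netloc.lower(), parts.path)
    # parse_qsl decodes percent-escapes and '+' and skips blank values by default.
    for field, value in parse_qsl(parts.query):
        yield site_key, field, value


def rank_fields(urls):
    """Count distinct values per (site, field), highest count first.

    Assumption: a field that takes many distinct values across a clickstream
    is more likely to hold free-text queries than one with few values.
    """
    values = defaultdict(set)
    for url in urls:
        for site_key, field, value in candidate_fields(url):
            values[(site_key, field)].add(value.lower())
    return sorted(((len(v), k) for k, v in values.items()), reverse=True)


# Made-up sample URLs, not taken from the report's data.
sample = [
    "http://example.com/search?q=linux+kernel&page=1",
    "http://example.com/search?q=perl+stopwords&page=1",
    "http://example.com/search?q=awk+duplicates&page=2",
]

for count, ((host, path), field) in rank_fields(sample):
    print(host, path, field, count)
```

On the sample above, `q` ranks ahead of `page` because it takes three distinct values against two. A real clickstream would need far more signal than this (session IDs and timestamps also take many distinct values), which is presumably why the authors turn to a trained classifier.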

Linux Bot

KinoSearch1::Analysis::Stopalizer(3pm)			User Contributed Perl Documentation		    KinoSearch1::Analysis::Stopalizer(3pm)

NAME

       KinoSearch1::Analysis::Stopalizer - suppress a "stoplist" of common words

SYNOPSIS

	   my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
	       language => 'fr',
	   );
	   my $polyanalyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
	       analyzers => [ $lc_normalizer, $tokenizer, $stopalizer, $stemmer ],
	   );

DESCRIPTION

       A "stoplist" is a collection of "stopwords": words which are common enough to be of little value when determining search results.  For
       example, so many documents in English contain "the", "if", and "maybe" that it may improve both performance and relevance to block them.

	   # before
	   @token_texts = ('i', 'am', 'the', 'walrus');

	   # after
	   @token_texts = ('',	'',   '',    'walrus');

CONSTRUCTOR

   new
	   my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
	       language => 'de',
	   );

	   # or...
	   my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
	       stoplist => \%stoplist,
	   );

       new() takes two possible parameters, "language" and "stoplist".  If "stoplist" is supplied, it will be used, overriding the behavior
       indicated by the value of "language".

       o   stoplist - must be a hashref, with stopwords as the keys of the hash and values set to 1.

       o   language - must be the ISO code for a language.  Loads a default stoplist supplied by Lingua::StopWords.

SEE ALSO

       Lingua::StopWords

COPYRIGHT

       Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.
       See KinoSearch1 version 1.00.

perl v5.14.2							    2011-11-15				    KinoSearch1::Analysis::Stopalizer(3pm)
