kinosearch1::analysis::tokenizer(3pm) [debian man page]

KinoSearch1::Analysis::Tokenizer(3pm)			User Contributed Perl Documentation		     KinoSearch1::Analysis::Tokenizer(3pm)

NAME

       KinoSearch1::Analysis::Tokenizer - customizable tokenizing

SYNOPSIS

	   my $whitespace_tokenizer
	       = KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\S+/, );

	   # or...
	   my $word_char_tokenizer
	       = KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\w+/, );

	   # or...
	   my $apostrophising_tokenizer = KinoSearch1::Analysis::Tokenizer->new;

	   # then... once you have a tokenizer, put it into a PolyAnalyzer
	   my $polyanalyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
	       analyzers => [ $lc_normalizer, $word_char_tokenizer, $stemmer ], );

DESCRIPTION

       Generically, "tokenizing" is a process of breaking up a string into an array of "tokens".

	   # before:
	   my $string = "three blind mice";

	   # after:
	   @tokens = qw( three blind mice );

       KinoSearch1::Analysis::Tokenizer decides where it should break up the text based on the value of "token_re".

	   # before:
	   my $string = "Eats, Shoots and Leaves.";

	   # tokenized by $whitespace_tokenizer
	   @tokens = qw( Eats, Shoots and Leaves. );

	   # tokenized by $word_char_tokenizer
	   @tokens = qw( Eats Shoots and Leaves   );
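
       The difference between the two patterns can be reproduced in plain Perl, without KinoSearch1 installed. This is only a sketch: it assumes the tokenizer collects successive non-overlapping matches of "token_re", which is consistent with the output shown above, and tokens_for() is a hypothetical helper, not part of the KinoSearch1 API.

```perl
use strict;
use warnings;

my $string = "Eats, Shoots and Leaves.";

# Hypothetical helper: return every non-overlapping match of $token_re,
# in order of appearance. A global match in list context does this.
sub tokens_for {
    my ( $text, $token_re ) = @_;
    return $text =~ /$token_re/g;
}

# \S+ keeps punctuation attached to the neighbouring word.
my @whitespace_tokens = tokens_for( $string, qr/\S+/ );

# \w+ drops punctuation, because "," and "." are not word characters.
my @word_char_tokens = tokens_for( $string, qr/\w+/ );

print join( '|', @whitespace_tokens ), "\n";    # Eats,|Shoots|and|Leaves.
print join( '|', @word_char_tokens ),  "\n";    # Eats|Shoots|and|Leaves
```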

METHODS

   new
	   # match "O'Henry" as well as "Henry" and "it's" as well as "it"
	   my $token_re = qr/
		   \b	     # start with a word boundary
		   \w+	     # Match word chars.
		   (?:	     # Group, but don't capture...
		      '\w+   # ... an apostrophe plus word chars.
		   )?	     # Matching the apostrophe group is optional.
		   \b	     # end with a word boundary
	       /xsm;
	   my $tokenizer = KinoSearch1::Analysis::Tokenizer->new(
	       token_re => $token_re, # default: what you see above
	   );

       Constructor.  Takes one hash-style parameter.

       o   token_re - must be a pre-compiled regular expression matching one token.
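
       To see what the default pattern accepts, it can be applied directly with a global match. The sketch below uses plain Perl only and assumes, as above, that tokens are the successive matches of "token_re"; the sample sentence is made up for illustration.

```perl
use strict;
use warnings;

# The default pattern documented for new(): word characters, optionally
# followed by one apostrophe and more word characters.
my $token_re = qr/
        \b          # start with a word boundary
        \w+         # word chars
        (?:         # non-capturing group...
           '\w+     # ... an apostrophe plus word chars
        )?          # the apostrophe group is optional
        \b          # end with a word boundary
    /xsm;

my $text = "O'Henry said it's Henry's";

# The group is non-capturing, so a global match in list context
# returns each whole token.
my @tokens = $text =~ /$token_re/g;

print join( '|', @tokens ), "\n";    # O'Henry|said|it's|Henry's
```

       A word-internal apostrophe stays inside the token, whereas a pattern such as qr/\w+/ would split "it's" into "it" and "s".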

COPYRIGHT

       Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.
       See KinoSearch1 version 1.00.

perl v5.14.2							    2011-11-15				     KinoSearch1::Analysis::Tokenizer(3pm)

Check Out this Related Man Page

KinoSearch1::Search::SearchClient(3pm)			User Contributed Perl Documentation		    KinoSearch1::Search::SearchClient(3pm)

Make a remote procedure call.  For every call that does not close/terminate the socket connection, expect a response back that's been serialized
using Storable.

NAME

       KinoSearch1::Search::SearchClient - connect to a remote SearchServer

SYNOPSIS

	   my $client = KinoSearch1::Search::SearchClient->new(
	       peer_address => 'searchserver1:7890',
	       password     => $pass,
	       analyzer     => $analyzer,
	   );
	   my $hits = $client->search( query => $query );

DESCRIPTION

       SearchClient is a subclass of KinoSearch1::Searcher which can be used to search an index on a remote machine made accessible via
       SearchServer.

METHODS

   new
       Constructor.  Takes hash-style params.

       o   peer_address - The name/IP and the port number which the client should attempt to connect to.

       o   password - Password to be supplied to the SearchServer when initializing socket connection.

       o   analyzer - An object belonging to a subclass of KinoSearch1::Analysis::Analyzer.

LIMITATIONS

       Limiting search results with a QueryFilter is not yet supported.

COPYRIGHT

       Copyright 2006-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.
       See KinoSearch1 version 1.00.

perl v5.14.2							    2011-11-15				    KinoSearch1::Search::SearchClient(3pm)
