kinosearch1::analysis::tokenizer(3pm) [debian man page]
KinoSearch1::Analysis::Tokenizer(3pm) User Contributed Perl Documentation KinoSearch1::Analysis::Tokenizer(3pm)
NAME
KinoSearch1::Analysis::Tokenizer - customizable tokenizing
SYNOPSIS
my $whitespace_tokenizer
= KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\S+/, );
# or...
my $word_char_tokenizer
= KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\w+/, );
# or...
my $apostrophising_tokenizer = KinoSearch1::Analysis::Tokenizer->new;
# then... once you have a tokenizer, put it into a PolyAnalyzer
my $polyanalyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
analyzers => [ $lc_normalizer, $word_char_tokenizer, $stemmer ], );
DESCRIPTION
Generically, "tokenizing" is a process of breaking up a string into an array of "tokens".
# before:
my $string = "three blind mice";
# after:
@tokens = qw( three blind mice );
KinoSearch1::Analysis::Tokenizer decides where it should break up the text based on the value of "token_re".
# before:
my $string = "Eats, Shoots and Leaves.";
# tokenized by $whitespace_tokenizer
@tokens = qw( Eats, Shoots and Leaves. );
# tokenized by $word_char_tokenizer
@tokens = qw( Eats Shoots and Leaves );
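The effect of "token_re" can be approximated in plain Perl with a global match in list context. This is only a sketch of the idea, not KinoSearch1's actual implementation; the helper name tokenize_with is hypothetical and does not exist in the library.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch1): return every non-overlapping
# match of $token_re in $text, in order of appearance. Assumes $token_re
# contains no capturing groups of its own.
sub tokenize_with {
    my ( $token_re, $text ) = @_;
    my @tokens = $text =~ /($token_re)/g;
    return @tokens;
}

my $string = "Eats, Shoots and Leaves.";

my @whitespace_tokens = tokenize_with( qr/\S+/, $string );
my @word_char_tokens  = tokenize_with( qr/\w+/, $string );

print join( '|', @whitespace_tokens ), "\n";    # Eats,|Shoots|and|Leaves.
print join( '|', @word_char_tokens ),  "\n";    # Eats|Shoots|and|Leaves
```

The whitespace pattern keeps punctuation attached to each token, while the word-character pattern drops it.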
METHODS
new
# match "O'Henry" as well as "Henry" and "it's" as well as "it"
my $token_re = qr/
        \b        # start with a word boundary
        \w+       # Match word chars.
        (?:       # Group, but don't capture...
           '\w+   # ... an apostrophe plus word chars.
        )?        # Matching the apostrophe group is optional.
        \b        # end with a word boundary
    /xsm;
my $tokenizer = KinoSearch1::Analysis::Tokenizer->new(
token_re => $token_re, # default: what you see above
);
Constructor. Takes one hash-style parameter.
o token_re - must be a pre-compiled regular expression matching one token.
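To see how an apostrophe-aware pattern of this kind behaves, it can be applied to a sample string in plain Perl. This sketch only exercises the regular expression; it does not call KinoSearch1, and the sample string is made up for illustration.

```perl
use strict;
use warnings;

# Pre-compiled with qr//, as the token_re parameter requires.
my $token_re = qr/
        \b        # start with a word boundary
        \w+       # word chars
        (?:       # group, but don't capture
           '\w+   # an apostrophe plus word chars
        )?        # the apostrophe group is optional
        \b        # end with a word boundary
    /xsm;

my $text   = "O'Henry said it's Bob's.";
my @tokens = $text =~ /($token_re)/g;

print join( '|', @tokens ), "\n";    # O'Henry|said|it's|Bob's
```

Because the apostrophe group is non-capturing, the list-context match returns one element per token.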
COPYRIGHT
Copyright 2005-2010 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch1 version 1.00.
perl v5.14.2 2011-11-15 KinoSearch1::Analysis::Tokenizer(3pm)
KinoSearch1::Search::SearchClient(3pm) User Contributed Perl Documentation KinoSearch1::Search::SearchClient(3pm)
Make a remote procedure call. For every call that does not close/terminate the socket connection, expect a response back that's been serialized using Storable.
NAME
KinoSearch1::Search::SearchClient - connect to a remote SearchServer
SYNOPSIS
my $client = KinoSearch1::Search::SearchClient->new(
peer_address => 'searchserver1:7890',
password => $pass,
analyzer => $analyzer,
);
my $hits = $client->search( query => $query );
DESCRIPTION
SearchClient is a subclass of KinoSearch1::Searcher which can be used to search an index on a remote machine made accessible via
SearchServer.
METHODS
new
Constructor. Takes hash-style params.
o peer_address - The name/IP and the port number which the client should attempt to connect to.
o password - Password to be supplied to the SearchServer when initializing socket connection.
o analyzer - An object belonging to a subclass of KinoSearch1::Analysis::Analyzer
LIMITATIONS
Limiting search results with a QueryFilter is not yet supported.
COPYRIGHT
Copyright 2006-2010 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch1 version 1.00.
perl v5.14.2 2011-11-15 KinoSearch1::Search::SearchClient(3pm)