KinoSearch1::Analysis::Tokenizer(3pm)			User Contributed Perl Documentation		     KinoSearch1::Analysis::Tokenizer(3pm)

NAME
KinoSearch1::Analysis::Tokenizer - customizable tokenizing

SYNOPSIS
    my $whitespace_tokenizer
        = KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\S+/, );

    # or...
    my $word_char_tokenizer
        = KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\w+/, );

    # or...
    my $apostrophising_tokenizer = KinoSearch1::Analysis::Tokenizer->new;

    # then... once you have a tokenizer, put it into a PolyAnalyzer
    my $polyanalyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
        analyzers => [ $lc_normalizer, $word_char_tokenizer, $stemmer ], );

DESCRIPTION
Generically, "tokenizing" is the process of breaking up a string into an array of "tokens".

    # before:
    my $string = "three blind mice";

    # after:
    @tokens = qw( three blind mice );

KinoSearch1::Analysis::Tokenizer decides where to break up the text based on the value of "token_re".

    # before:
    my $string = "Eats, Shoots and Leaves.";

    # tokenized by $whitespace_tokenizer
    @tokens = qw( Eats, Shoots and Leaves. );

    # tokenized by $word_char_tokenizer
    @tokens = qw( Eats Shoots and Leaves );

METHODS
new
    # match "O'Henry" as well as "Henry" and "it's" as well as "it"
    my $token_re = qr/
            \b        # start with a word boundary
            \w+       # Match word chars.
            (?:       # Group, but don't capture...
               '\w+   # ... an apostrophe plus word chars.
            )?        # Matching the apostrophe group is optional.
            \b        # end with a word boundary
        /xsm;
    my $tokenizer = KinoSearch1::Analysis::Tokenizer->new(
        token_re => $token_re, # default: what you see above
    );

Constructor. Takes one hash-style parameter.

o  token_re - must be a pre-compiled regular expression matching one token.

COPYRIGHT
Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch1 version 1.00.

perl v5.14.2                          2011-11-15                          KinoSearch1::Analysis::Tokenizer(3pm)
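
EXAMPLE (illustrative sketch, not part of the original page)
The effect of different "token_re" values can be tried out with plain Perl regex matching, without installing KinoSearch1. The sketch below assumes that a tokenizer emits every non-overlapping match of "token_re", scanning left to right; the page does not spell out how the module applies the pattern internally, so treat that as an assumption. The tokenize() helper is hypothetical and is not a KinoSearch1 function.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of KinoSearch1): return every
# non-overlapping match of $token_re in $text, left to right.
sub tokenize {
    my ( $text, $token_re ) = @_;
    my @tokens = $text =~ /$token_re/g;
    return @tokens;
}

# The default pattern documented under "new".
my $apostrophe_re = qr/
        \b        # start with a word boundary
        \w+       # word chars
        (?:       # group, but don't capture...
           '\w+   # ... an apostrophe plus word chars
        )?        # the apostrophe group is optional
        \b        # end with a word boundary
    /xsm;

my $string        = "Eats, Shoots and Leaves.";
my @by_whitespace = tokenize( $string, qr/\S+/ );    # punctuation stays attached
my @by_word_char  = tokenize( $string, qr/\w+/ );    # punctuation is dropped

my $names            = "O'Henry said it's fine";
my @with_apostrophes = tokenize( $names, $apostrophe_re );    # keeps O'Henry, it's
my @split_at_quote   = tokenize( $names, qr/\w+/ );           # splits at the apostrophe

print join( '|', @by_whitespace ),    "\n";
print join( '|', @by_word_char ),     "\n";
print join( '|', @with_apostrophes ), "\n";
print join( '|', @split_at_quote ),   "\n";
```

With qr/\w+/ the apostrophe acts as a separator, so "it's" becomes two tokens; the default pattern keeps such words whole, which is usually what you want before handing tokens to a stemmer.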

KinoSearch1::Search::SearchClient(3pm)			User Contributed Perl Documentation		    KinoSearch1::Search::SearchClient(3pm)

Make a remote procedure call.  For every call that does not close/terminate the socket connection, expect a response back that's been serialized
using Storable.

NAME
KinoSearch1::Search::SearchClient - connect to a remote SearchServer

SYNOPSIS
    my $client = KinoSearch1::Search::SearchClient->new(
        peer_address => 'searchserver1:7890',
        password     => $pass,
        analyzer     => $analyzer,
    );
    my $hits = $client->search( query => $query );

DESCRIPTION
SearchClient is a subclass of KinoSearch1::Searcher which can be used to search an index on a remote machine made accessible via SearchServer.

METHODS
new
Constructor. Takes hash-style params.

o  peer_address - The host name or IP address, plus the port number, that the client should attempt to connect to.

o  password - Password to be supplied to the SearchServer when initializing the socket connection.

o  analyzer - An object belonging to a subclass of KinoSearch1::Analysis::Analyzer.

LIMITATIONS
Limiting search results with a QueryFilter is not yet supported.

COPYRIGHT
Copyright 2006-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch1 version 1.00.

perl v5.14.2                          2011-11-15                          KinoSearch1::Search::SearchClient(3pm)
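
EXAMPLE (illustrative sketch, not part of the original page)
The note at the top of this page says that responses come back serialized using Storable. The sketch below only shows what a Storable round trip looks like, using the core Storable module. The payload layout (%response and its keys) is invented for illustration, and whether SearchClient uses nfreeze or another Storable routine, and how it frames data on the socket, is not stated on this page.

```perl
use strict;
use warnings;
use Storable qw( nfreeze thaw );

# Hypothetical payload: the real SearchServer reply layout is not
# documented on this page.
my %response = ( total_hits => 2, doc_nums => [ 3, 17 ] );

# Serialize to a byte string in network byte order, as a sender might do.
my $wire_bytes = nfreeze( \%response );

# Rebuild an equivalent data structure, as a receiver might do.
my $restored = thaw($wire_bytes);

print "total_hits: $restored->{total_hits}\n";
print "doc_nums: @{ $restored->{doc_nums} }\n";
```

Only thaw data from a server you trust, since Storable can reconstruct arbitrary Perl structures.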