KinoSearch1::Analysis::Token(3pm)			User Contributed Perl Documentation			 KinoSearch1::Analysis::Token(3pm)

NAME
KinoSearch1::Analysis::Token - unit of text
SYNOPSIS
# private class - no public API
PRIVATE CLASS
You can't actually instantiate a Token object at the Perl level -- however, you can affect individual Tokens within a TokenBatch by way of TokenBatch's (experimental) API.
DESCRIPTION
Token is the fundamental unit used by KinoSearch1's Analyzer subclasses. Each Token has four attributes: text, start_offset, end_offset, and pos_inc (for position increment).

The text of a token is a string.

A Token's start_offset and end_offset locate it within a larger text, even if the Token's text attribute gets modified -- by stemming, for instance. The Token for "beating" in the text "beating a dead horse" begins life with a start_offset of 0 and an end_offset of 7; after stemming, the text is "beat", but the end_offset is still 7.

The position increment, which defaults to 1, is an advanced tool for manipulating phrase matching. Ordinarily, Tokens are assigned consecutive position numbers: 0, 1, and 2 for "three blind mice". However, if you set the position increment for "blind" to, say, 1000, then the three tokens will end up assigned to positions 0, 1, and 1001 -- and will no longer produce a phrase match for the query '"three blind mice"'.
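The offset and position arithmetic described above can be illustrated with a small standalone model. This is only a sketch in Python, not KinoSearch1 code: the Token dataclass, tokenize, toy_stem, assign_positions, and is_phrase are hypothetical names invented for the illustration, and the rule that a token's increment is added after its own position is assigned is inferred from the "three blind mice" example rather than taken from the library source.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical model of the four attributes described in the text."""
    text: str
    start_offset: int
    end_offset: int
    pos_inc: int = 1  # position increment, defaults to 1


def tokenize(source):
    """Split on whitespace, recording each word's offsets in the source."""
    tokens = []
    cursor = 0
    for word in source.split():
        start = source.index(word, cursor)
        end = start + len(word)
        tokens.append(Token(word, start, end))
        cursor = end
    return tokens


def toy_stem(token):
    """Toy stemmer (assumption): strip a trailing 'ing'. Offsets are untouched."""
    if token.text.endswith("ing"):
        token.text = token.text[:-3]


def assign_positions(tokens):
    """Give each token the running position, then advance by its pos_inc."""
    positions = []
    position = 0
    for token in tokens:
        positions.append(position)
        position += token.pos_inc
    return positions


def is_phrase(positions):
    """A phrase match needs strictly consecutive positions."""
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


# Stemming changes the text but not the offsets.
beating = tokenize("beating a dead horse")[0]
toy_stem(beating)
print(beating.text, beating.start_offset, beating.end_offset)  # beat 0 7

# A large increment on "blind" breaks the phrase.
mice = tokenize("three blind mice")
print(assign_positions(mice))  # [0, 1, 2]
mice[1].pos_inc = 1000
print(assign_positions(mice))  # [0, 1, 1001]
```

In this model the offsets always refer back to the original string, so source[start_offset:end_offset] still yields "beating" after stemming, while the gap introduced by the increment makes is_phrase return False for the modified token list.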
COPYRIGHT
Copyright 2006-2010 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch1 version 1.00.

perl v5.14.2 2011-11-15 KinoSearch1::Analysis::Token(3pm)