kinosearch1::analysis::analyzer(3pm) [debian man page]
KinoSearch1::Analysis::Analyzer(3pm) User Contributed Perl Documentation KinoSearch1::Analysis::Analyzer(3pm)
NAME
KinoSearch1::Analysis::Analyzer - base class for analyzers
SYNOPSIS
# abstract base class -- you probably want PolyAnalyzer, not this.
DESCRIPTION
In KinoSearch1, an Analyzer is a filter which processes text, transforming it from one form into another. For instance, an analyzer might
break up a long text into smaller pieces (Tokenizer), or it might convert text to lowercase (LCNormalizer).
METHODS
analyze (EXPERIMENTAL)
$token_batch = $analyzer->analyze($token_batch);
All Analyzer subclasses provide an "analyze" method. "analyze()" takes a single TokenBatch as input, and it returns a TokenBatch, either
the same one (probably transformed in some way), or a new one.
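The contract described above (a batch goes in, a batch comes out, either the same one modified or a new one) can be sketched in plain Perl. This is only an illustration under stated assumptions: the "batch" is an ordinary array reference rather than a real TokenBatch, and lowercase_filter and tokenize_filter are hypothetical names, not part of KinoSearch1.

```perl
use strict;
use warnings;

# ASSUMPTION: a "batch" here is a plain array ref of strings, standing in
# for a TokenBatch. Neither function below exists in KinoSearch1.

# Transforms the batch in place and returns the same batch.
sub lowercase_filter {
    my ($batch) = @_;
    for my $text (@$batch) {
        $text = lc $text;    # $text aliases the array element
    }
    return $batch;
}

# Builds and returns a new batch of word tokens.
sub tokenize_filter {
    my ($batch) = @_;
    my @tokens;
    for my $text (@$batch) {
        push @tokens, $text =~ /\w+/g;    # list context: all matches
    }
    return \@tokens;
}

# Chain the filters in order, feeding each one the previous result.
my $batch = ["Three Blind Mice"];
for my $filter ( \&lowercase_filter, \&tokenize_filter ) {
    $batch = $filter->($batch);
}
print join( '|', @$batch ), "\n";    # three|blind|mice
```

Because every filter accepts and returns the same kind of object, filters can be chained in any order, which is the property a PolyAnalyzer relies on.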
COPYRIGHT
Copyright 2005-2010 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch1 version 1.00.
perl v5.14.2 2011-11-15 KinoSearch1::Analysis::Analyzer(3pm)
KinoSearch1::Analysis::Tokenizer(3pm) User Contributed Perl Documentation KinoSearch1::Analysis::Tokenizer(3pm)
NAME
KinoSearch1::Analysis::Tokenizer - customizable tokenizing
SYNOPSIS
my $whitespace_tokenizer
    = KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\S+/, );
# or...
my $word_char_tokenizer
    = KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\w+/, );
# or...
my $apostrophising_tokenizer = KinoSearch1::Analysis::Tokenizer->new;
# then... once you have a tokenizer, put it into a PolyAnalyzer
my $polyanalyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
    analyzers => [ $lc_normalizer, $word_char_tokenizer, $stemmer ], );
DESCRIPTION
Generically, "tokenizing" is a process of breaking up a string into an array of "tokens".
# before:
my $string = "three blind mice";
# after:
@tokens = qw( three blind mice );
KinoSearch1::Analysis::Tokenizer decides where it should break up the text based on the value of "token_re".
# before:
my $string = "Eats, Shoots and Leaves.";
# tokenized by $whitespace_tokenizer
@tokens = qw( Eats, Shoots and Leaves. );
# tokenized by $word_char_tokenizer
@tokens = qw( Eats Shoots and Leaves );
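The difference between the two patterns can be reproduced without KinoSearch1 by applying each regular expression as a global match. This is a minimal sketch, not how Tokenizer is implemented internally; tokens_for is a hypothetical helper introduced only for this example.

```perl
use strict;
use warnings;

# ASSUMPTION: tokens_for is an illustrative helper, not a KinoSearch1
# function. It returns every non-overlapping match of $token_re in $text.
sub tokens_for {
    my ( $token_re, $text ) = @_;
    my @tokens = $text =~ /$token_re/g;    # list context: all matches
    return @tokens;
}

my $string = "Eats, Shoots and Leaves.";

# \S+ keeps punctuation attached to the neighbouring word.
print join( ' ', tokens_for( qr/\S+/, $string ) ), "\n";

# \w+ matches word characters only, so punctuation is dropped.
print join( ' ', tokens_for( qr/\w+/, $string ) ), "\n";
```

The first line printed is "Eats, Shoots and Leaves." and the second is "Eats Shoots and Leaves", matching the two token lists above.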
METHODS
new
# match "O'Henry" as well as "Henry" and "it's" as well as "it"
my $token_re = qr/
        \b        # start with a word boundary
        \w+       # Match word chars.
        (?:       # Group, but don't capture...
           '\w+   # ... an apostrophe plus word chars.
        )?        # Matching the apostrophe group is optional.
        \b        # end with a word boundary
    /xsm;
my $tokenizer = KinoSearch1::Analysis::Tokenizer->new(
    token_re => $token_re, # default: what you see above
);
Constructor. Takes one hash-style parameter.
o token_re - must be a pre-compiled regular expression matching one token.
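To see what an apostrophe-aware pattern of this shape matches, it can be tried directly against a string in plain Perl. The sketch below assumes the pattern is applied as a repeated global match; match_tokens and the sample sentence are made up for illustration and are not part of KinoSearch1.

```perl
use strict;
use warnings;

# Same shape as the pattern documented above: word chars, optionally
# followed by one apostrophe plus more word chars.
my $token_re = qr/
    \b
    \w+
    (?: '\w+ )?
    \b
/xsm;

# ASSUMPTION: match_tokens is a hypothetical helper for this example.
sub match_tokens {
    my ($text) = @_;
    my @tokens = $text =~ /$token_re/g;    # list context: all matches
    return @tokens;
}

print join( '|', match_tokens("O'Henry thinks it's the dogs' bone") ), "\n";
# prints: O'Henry|thinks|it's|the|dogs|bone
```

An apostrophe is kept only when word characters follow it, so "O'Henry" and "it's" stay whole while the trailing apostrophe in "dogs'" is dropped.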
COPYRIGHT
Copyright 2005-2010 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch1 version 1.00.
perl v5.14.2 2011-11-15 KinoSearch1::Analysis::Tokenizer(3pm)