debian man page for ucto

ucto(1) 						      General Commands Manual							   ucto(1)

NAME
ucto - Unicode Tokenizer
SYNOPSIS
ucto [options] [input-file] [output-file]
DESCRIPTION
ucto tokenizes text files: it separates words from punctuation, splits sentences (and optionally paragraphs), and finds paired quotes. Ucto is preconfigured with tokenization rules for several languages.
OPTIONS
-c configfile
       read settings from a file

-d value
       set debug mode to 'value'

-e value
       set input encoding. (default UTF8)

-f
       disable filtering of special characters

-L language
       Automatically selects a configuration file by language code. e.g. 'fr' will select the file tokconfig-fr from the installation directory

-l
       Convert to all lowercase

-u
       Convert to all uppercase

-n
       Assume one sentence per line on input

-m
       Emit one sentence per line on output

--passthru
       Don't tokenize, but perform input decoding and simple token role detection

-P
       Disable Paragraph Detection

-Q
       Enable Quote Detection. (this is experimental and may lead to unexpected results)

-S
       Disable Sentence Detection

-s <string>
       Set End-of-sentence marker. (Default <utt>)

-V
       Show version information

-v
       set Verbose mode

-x <DocId>
       Output FoLiA XML, use the specified Document ID. (this disables usage of most other options: -nulPQvsS)

-F
       Read a FoLiA XML document, tokenize it, and output the modified doc. (this disables usage of most other options: -nulPQvsS)
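EXAMPLES
The following is a minimal sketch, not part of ucto itself, of how a command line might be assembled from the options above when calling ucto from a script. The helper name build_ucto_command, its parameters, and the file names input.txt and output.txt are hypothetical; only the flags -L, -m and -l and the argument order from the SYNOPSIS come from this page.

```python
def build_ucto_command(input_file, output_file=None, language=None,
                       sentence_per_line=False, lowercase=False):
    """Assemble an argv list for ucto (hypothetical helper).

    Only flags documented under OPTIONS are used; options come first,
    followed by the input file and the optional output file.
    """
    cmd = ["ucto"]
    if language is not None:
        cmd += ["-L", language]   # -L language: select config by language code
    if sentence_per_line:
        cmd.append("-m")          # -m: emit one sentence per line on output
    if lowercase:
        cmd.append("-l")          # -l: convert to all lowercase
    cmd.append(input_file)
    if output_file is not None:
        cmd.append(output_file)
    return cmd


if __name__ == "__main__":
    # input.txt and output.txt are placeholder file names.
    cmd = build_ucto_command("input.txt", "output.txt",
                             language="fr", sentence_per_line=True)
    print(" ".join(cmd))
```

The sketch only builds and prints the command. On a system where ucto is installed, the returned list could be handed to subprocess.run to execute it.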
BUGS
likely
AUTHORS
Maarten van Gompel proycon@anaproy.nl
Ko van der Sloot Timbl@uvt.nl

2011 november 28							   ucto(1)