ucto(1) General Commands Manual ucto(1)
NAME
ucto - Unicode Tokenizer
SYNOPSIS
ucto [options] [input-file] [output-file]
DESCRIPTION
ucto tokenizes text files: it separates words from punctuation, splits sentences (and optionally paragraphs), and finds paired quotes.
Ucto is preconfigured with tokenization rules for several languages.
OPTIONS
-c configfile
Read settings from a file
-d value
Set debug mode to 'value'
-e value
Set the input encoding (default: UTF8)
-f
Disable filtering of special characters
-L language
Automatically select a configuration file by language code; e.g. 'fr' selects the file tokconfig-fr from the installation directory
-l
Convert to all lowercase
-u
Convert to all uppercase
-n
Assume one sentence per line on input
-m
Emit one sentence per line on output
--passthru
Don't tokenize, but perform input decoding and simple token role detection
-P
Disable Paragraph Detection
-Q
Enable Quote Detection (experimental; may lead to unexpected results)
-S
Disable Sentence Detection
-s <string>
Set the end-of-sentence marker (default: <utt>)
-V
Show version information
-v
Set verbose mode
-x <DocId>
Output FoLiA XML, using the specified document ID (this disables most other options: -nulPQvsS)
-F
Read a FoLiA XML document, tokenize it, and output the modified document (this disables most other options: -nulPQvsS)
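As an illustration of how the options above combine, the following is a minimal invocation sketch. It uses only the -L and -m options listed above; the file names are hypothetical, and it assumes a French configuration (tokconfig-fr) is present in the installation directory. If ucto is not installed, the script only prints the command it would have run.

```shell
#!/bin/sh
# Sketch only: the file names are hypothetical placeholders.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
input="$workdir/sample.txt"
output="$workdir/sample.tok"

# Two sentences on a single line, so sentence splitting has something to do.
printf '%s\n' 'Ceci est une phrase. En voici une autre!' > "$input"

# -L fr : select the configuration file for language code 'fr'
# -m    : emit one sentence per line on output
set -- ucto -L fr -m "$input" "$output"

if command -v ucto >/dev/null 2>&1; then
    if "$@"; then
        cat "$output"
    else
        echo "ucto exited with an error" >&2
    fi
else
    echo "ucto not found; would run: $*"
fi
```

Replacing -m with -n would instead tell ucto to assume the input already has one sentence per line.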
BUGS
likely
AUTHORS
Maarten van Gompel proycon@anaproy.nl
Ko van der Sloot Timbl@uvt.nl
2011 november 28 ucto(1)
timblserver(1) General Commands Manual timblserver(1)
NAME
timblserver - Tilburg Memory Based Learner Server
SYNOPSIS
timblserver [TiMBL options] [Server options]
timblserver --config=configfile [--pidfile=pfile] [--logfile=lfile] [--daemonize=val]
DESCRIPTION
timblserver extends timbl with a server layer. It makes it possible to access one timbl instance from multiple sessions. It also allows running and accessing different timbl instances in parallel.
OPTIONS
The server options are
--config=file
read server settings from file
--pidfile=file
store the pid of the main server process in file
--logfile=file
log server actions to file
--daemonize=[yes|no]
run the server as a daemon. Default is yes.
-S <port>
run the server on 'port' (deprecated)
-C <num>
set maximum number of parallel connections to 'num' (deprecated)
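The second synopsis form could be used roughly as sketched below. All three paths are hypothetical placeholders, and the settings file is assumed to exist already (its contents are not shown here). The script only starts the server when both timblserver and that file are present; otherwise it prints the command line.

```shell
#!/bin/sh
# Sketch only: all three paths are hypothetical placeholders.
config="./timblserver.conf"    # existing server settings file (assumed)
pidfile="./timblserver.pid"    # where the main server pid is stored
logfile="./timblserver.log"    # where server actions are logged

# --daemonize=no runs the server in the foreground instead of as a daemon
# (the default is yes).
set -- timblserver --config="$config" --pidfile="$pidfile" \
       --logfile="$logfile" --daemonize=no

if command -v timblserver >/dev/null 2>&1 && [ -f "$config" ]; then
    "$@"
else
    echo "not starting; would run: $*"
fi
```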
All timbl options are documented in timbl(1).
BUGS
possibly
AUTHORS
Ko van der Sloot Timbl@uvt.nl
Antal van den Bosch Timbl@uvt.nl
SEE ALSO
timbl(1)
2011 march 21 timblserver(1)