ucto(1) [debian man page]

ucto(1) 						      General Commands Manual							   ucto(1)

NAME
       ucto - Unicode Tokenizer

SYNOPSIS
       ucto [[options]] [input-file] [[output-file]]

DESCRIPTION
       ucto tokenizes text files: it separates words from punctuation, splits sentences (and optionally paragraphs), and finds paired quotes. Ucto is preconfigured with tokenization rules for several languages.

OPTIONS
       -c configfile
              read settings from a file

       -d value
              set debug mode to 'value'

       -e value
              set input encoding. (default UTF8)

       -f     disable filtering of special characters

       -L language
              Automatically selects a configuration file by language code. e.g. 'fr' will select the file tokconfig-fr from the installation directory

       -l     Convert to all lowercase

       -u     Convert to all uppercase

       -n     Assume one sentence per line on input

       -m     Emit one sentence per line on output

       --passthru
              Don't tokenize, but perform input decoding and simple token role detection

       -P     Disable Paragraph Detection

       -Q     Enable Quote Detection. (this is experimental and may lead to unexpected results)

       -S     Disable Sentence Detection

       -s <string>
              Set End-of-sentence marker. (Default <utt>)

       -V     Show version information

       -v     set Verbose mode

       -x <DocId>
              Output FoLiA XML, use the specified Document ID. (this disables usage of most other options: -nulPQvsS)

       -F     Read a FoLiA XML document, tokenize it, and output the modified doc. (this disables usage of most other options: -nulPQvsS)

BUGS
       likely

AUTHORS
       Maarten van Gompel proycon@anaproy.nl
       Ko van der Sloot Timbl@uvt.nl

2011 november 28                                                              ucto(1)
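
The following is not part of the original page. It is a minimal sketch of how the options listed above might be combined into a command line, including the documented restriction that -x and -F disable -nulPQvsS. The helper `build_ucto_command` and all of its parameter names are hypothetical; only the flags themselves come from the OPTIONS section, and the sketch assumes options may precede the file arguments as the SYNOPSIS suggests.

```python
import shlex

# Switches without an argument, as listed in the OPTIONS section.
SIMPLE_FLAGS = set("flunmPQSVv")
# Argument-less switches that the page says are disabled by -x and -F.
# (-s is also disabled, but it takes an argument and is handled separately.)
FOLIA_DISABLED = set("nulPQvS")


def build_ucto_command(input_file, output_file=None, language=None,
                       flags="", eos_marker=None,
                       folia_doc_id=None, folia_input=False):
    """Assemble an argument list for ucto (hypothetical helper).

    flags is a string of single-letter switches, e.g. "nm" for -n -m.
    """
    unknown = sorted(set(flags) - SIMPLE_FLAGS)
    if unknown:
        raise ValueError("not an argument-less ucto switch: " + ", ".join(unknown))

    uses_folia = folia_doc_id is not None or folia_input
    clash = sorted(set(flags) & FOLIA_DISABLED)
    if eos_marker is not None:
        clash.append("s")
    if uses_folia and clash:
        raise ValueError(
            "-x/-F disable these options: " + ", ".join("-" + f for f in clash)
        )

    argv = ["ucto"]
    if language is not None:
        argv += ["-L", language]          # e.g. 'fr' selects tokconfig-fr
    argv += ["-" + f for f in flags]
    if eos_marker is not None:
        argv += ["-s", eos_marker]
    if folia_doc_id is not None:
        argv += ["-x", folia_doc_id]
    if folia_input:
        argv.append("-F")
    argv.append(input_file)
    if output_file is not None:
        argv.append(output_file)
    return argv


if __name__ == "__main__":
    cmd = build_ucto_command("input.txt", "output.txt", language="fr", flags="nm")
    # Print the command instead of running it; ucto may not be installed.
    print(shlex.join(cmd))
```

Run as a script, this prints a command line for French input with one sentence per line on both input and output. Passing the returned list to `subprocess.run` would invoke ucto if it is installed. Rejecting the disabled flags up front is a design choice of this sketch; the page does not say how ucto itself reacts to such a combination.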

Check Out this Related Man Page

timblserver(1)						      General Commands Manual						    timblserver(1)

NAME
       timblserver - Tilburg Memory Based Learner Server

SYNOPSIS
       timblserver [TiMBL options] [Server options]

       timblserver --config=configfile [--pidfile=pfile] [--logfile=lfile] [--daemonize=val]

DESCRIPTION
       timblserver extends timbl with a server layer. It allows one timbl instance to be accessed from multiple sessions, and it allows different timbl instances to be run and accessed in parallel.

OPTIONS
       The server options are:

       --config=file
              read server settings from file

       --pidfile=file
              store the pid of the main server process in file

       --logfile=file
              log server actions to file

       --daemonize=[yes|no]
              run the server as a daemon. Default is yes.

       -S <port>
              run the server on 'port' (deprecated)

       -C <num>
              set maximum number of parallel connections to 'num' (deprecated)

       all timbl options are documented in timbl(1)

BUGS
       possibly

AUTHORS
       Ko van der Sloot Timbl@uvt.nl
       Antal van den Bosch Timbl@uvt.nl

SEE ALSO
       timbl(1)

2011 march 21                                                          timblserver(1)