ucto(1) [debian man page]

ucto(1) 						      General Commands Manual							   ucto(1)

NAME
       ucto - Unicode Tokenizer

SYNOPSIS
       ucto [options] [input-file] [output-file]

DESCRIPTION
       ucto tokenizes text files: it separates words from punctuation, splits sentences (and optionally paragraphs), and finds paired quotes. ucto is preconfigured with tokenisation rules for several languages.

OPTIONS
       -c configfile
              Read settings from a file.

       -d value
              Set debug mode to 'value'.

       -e value
              Set input encoding. (Default: UTF8)

       -f     Disable filtering of special characters.

       -L language
              Automatically select a configuration file by language code, e.g. 'fr' selects the file tokconfig-fr from the installation directory.

       -l     Convert to all lowercase.

       -u     Convert to all uppercase.

       -n     Assume one sentence per line on input.

       -m     Emit one sentence per line on output.

       --passthru
              Don't tokenize, but perform input decoding and simple token role detection.

       -P     Disable paragraph detection.

       -Q     Enable quote detection. (This is experimental and may lead to unexpected results.)

       -S     Disable sentence detection.

       -s <string>
              Set end-of-sentence marker. (Default: <utt>)

       -V     Show version information.

       -v     Set verbose mode.

       -x <DocId>
              Output FoLiA XML, using the specified document ID. (This disables usage of most other options: -nulPQvsS)

       -F     Read a FoLiA XML document, tokenize it, and output the modified document. (This disables usage of most other options: -nulPQvsS)

BUGS
       likely

AUTHORS
       Maarten van Gompel proycon@anaproy.nl
       Ko van der Sloot Timbl@uvt.nl

2011 november 28                                                                                                           ucto(1)
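
EXAMPLE
       The following is a minimal sketch, not part of ucto itself, of how a script might assemble a ucto command line from the options documented above. The helper name build_ucto_command and the file names input.txt and output.txt are hypothetical; only the flags -L, -n, -m and -l and the argument order from the SYNOPSIS are taken from this page.

```python
import shlex


def build_ucto_command(input_file, output_file=None, language=None,
                       sentence_per_line_in=False,
                       sentence_per_line_out=False,
                       lowercase=False):
    """Assemble an argument list for ucto (hypothetical helper).

    Only flags listed under OPTIONS on this page are used.
    """
    argv = ["ucto"]
    if language is not None:
        argv += ["-L", language]      # e.g. 'fr' selects tokconfig-fr
    if sentence_per_line_in:
        argv.append("-n")             # one sentence per line on input
    if sentence_per_line_out:
        argv.append("-m")             # one sentence per line on output
    if lowercase:
        argv.append("-l")             # convert to all lowercase
    argv.append(input_file)           # options first, then the files
    if output_file is not None:
        argv.append(output_file)
    return argv


# File names are placeholders for illustration only.
cmd = build_ucto_command("input.txt", "output.txt", language="fr",
                         sentence_per_line_out=True)
print(shlex.join(cmd))
```

       On a system where ucto is installed, the resulting list could be passed to subprocess.run(cmd, check=True). Building a list rather than a single string avoids shell quoting problems with unusual file names.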


dimbl(1)						      General Commands Manual							  dimbl(1)

NAME
       dimbl - Distributed Timbl

SYNOPSIS
       dimbl [timbl options] [-S <num of threads>]

DESCRIPTION
       dimbl extends timbl with the possibility to run the classification task on multiple threads. This is done by splitting the Instancebase into parts that are handled in parallel. Every test instance is tested against each partial Instancebase. The results are merged, and then the k Nearest Neighbours are calculated.

NOTES
       dimbl only works for the IB1 variants of timbl. Not all timbl options are fully supported. Documentation is lacking.

OPTIONS
       -S <threads>
              Run the server on 'threads' parallel threads.

       All timbl options are documented in timbl(1). dimbl handles most of them in the same way as timbl, except for -i filename and -I filename:

       -I filename
              Create 'threads' Instancebase files and store their names in 'filename', together with the name of the weighting file.

       -i filename
              Use such a file to read back in 'threads' Instancebases for a classifying task. The -S option is ignored in that case; dimbl will use the number of files found in 'filename'.

BUGS
       possibly

AUTHORS
       Ko van der Sloot Timbl@uvt.nl
       Antal van den Bosch Timbl@uvt.nl

SEE ALSO
       timbl(1)

2010 december 09                                                                                                          dimbl(1)
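
EXAMPLE
       The split, test, merge and select steps described under DESCRIPTION can be illustrated with a toy sketch. This is not dimbl's implementation: it assumes a plain overlap distance (the number of mismatching feature values), a simple majority vote among the k nearest neighbours, and a round-robin split of the instances. All function names and the sample data are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import heapq


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_in_part(part, query, k):
    """Return the k nearest (distance, label) pairs within one part."""
    scored = [(overlap_distance(features, query), label)
              for features, label in part]
    return heapq.nsmallest(k, scored)


def classify(instances, query, k=3, threads=2):
    # Split the instance base into `threads` parts (round-robin).
    parts = [instances[i::threads] for i in range(threads)]
    # Test the query against every partial instance base in parallel.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partial = list(pool.map(lambda p: nearest_in_part(p, query, k), parts))
    # Merge the partial results, then keep the k nearest overall.
    merged = heapq.nsmallest(k, (pair for result in partial for pair in result))
    # Majority vote among the k nearest neighbours.
    return Counter(label for _, label in merged).most_common(1)[0][0]


# Made-up sample data: (feature tuple, class label).
instances = [
    (("a", "b", "c"), "X"),
    (("a", "b", "d"), "X"),
    (("a", "e", "c"), "X"),
    (("x", "y", "z"), "Y"),
    (("x", "y", "w"), "Y"),
    (("x", "q", "z"), "Y"),
]
print(classify(instances, ("a", "b", "c"), k=3, threads=2))
```

       Because each part returns its own k best candidates, the merged list always contains the k nearest neighbours of the full instance base, so the result does not depend on how the instances were split.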