lt-proc(1) [debian man page]

NAME
lt-proc - This application is part of the lexical processing modules and tools ( lt-toolbox )

This tool is part of the apertium machine translation architecture: http://www.apertium.org.

SYNOPSIS
lt-proc [options] fst_file [input_file [output_file]]

DESCRIPTION
lt-proc is the application responsible for providing the four lexical processing functionalities:

o morphological analyser ( option -a )
o lexical transfer ( option -n )
o morphological generator ( option -g )
o post-generator ( option -p )

It accomplishes these tasks by reading binary files containing a compact and efficient representation of dictionaries (a class of finite-state transducers called augmented letter transducers). These files are generated by lt-comp(1).

It is worth mentioning that some characters (`[', `]', `$', `^', `/', `+') are special characters used for format and encapsulation. They should be escaped if they have to be used literally. For instance, `['...`]' are ignored and the format of a lexical unit is `^...$'.

OPTIONS
-a
Tokenizes the text in surface forms (lexical units as they appear in texts) and delivers, for each surface form, one or more lexical forms consisting of lemma, lexical category and morphological inflection information. Tokenization is not straightforward due to the existence, on the one hand, of contractions, and, on the other hand, of multi-word lexical units. For contractions, the system reads in a single surface form and delivers the corresponding sequence of lexical forms. Multi-word surface forms are analysed in a left-to-right, longest-match fashion. Multi-word surface forms may be invariable (such as a multi-word preposition or conjunction) or inflected (for example, in es, "echaban de menos", "they missed", is a form of the imperfect indicative tense of the verb "echar de menos", "to miss"). Limited support for some kinds of discontinuous multi-word units is also available. Single-word surface form analysis produces output like the one in these examples: "cantar" -> `^cantar/cantar<vblex><inf>$' or "daba" -> `^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$'.

Use the literal case of the incoming characters.

-g
Delivers a target-language surface form for each target-language lexical form, by suitably inflecting it.

Morphological generation (like -g) but without unknown word marks (asterisk `*').

-p
Performs orthographical operations such as contractions and apostrophations. The post-generator is usually dormant (just copies the input to the output) until a special alarm symbol contained in some target-language surface forms wakes it up to perform a particular string transformation if necessary; then it goes back to sleep.

Input processing is in orthoepikon (previously `sao') annotation system format: http://orthoepikon.sf.net.

Apply a transliteration dictionary.

Display the version number.

Display this help.

FILES
fst_file The input compiled dictionary.

BUGS
Lots of...lurking in the dark and waiting for you!

AUTHOR
(c) 2005,2006 Universitat d'Alacant / Universidad de Alicante. All rights reserved.

2006-03-23 lt-proc(1)
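
The analysis output format described above (a surface form followed by one or more lexical forms, separated by `/' and wrapped in `^...$') can be read back with a few lines of Python. This is a minimal sketch, not part of lt-proc: the names parse_lexical_units and LexicalForm are hypothetical, and escaped special characters are assumed not to occur in the input and are not handled.

```python
import re
from typing import List, NamedTuple, Tuple

# A lexical unit is everything between '^' and the next '$'.
# Assumption: the input contains no escaped special characters.
UNIT_RE = re.compile(r"\^([^$^]*)\$")
# A tag is the text between '<' and '>', e.g. <vblex>.
TAG_RE = re.compile(r"<([^<>]+)>")


class LexicalForm(NamedTuple):
    """Hypothetical container for one analysis: lemma plus its tags."""
    lemma: str
    tags: Tuple[str, ...]


def parse_lexical_units(text: str) -> List[Tuple[str, List[LexicalForm]]]:
    """Return (surface form, lexical forms) pairs found in analyser output."""
    units = []
    for match in UNIT_RE.finditer(text):
        # The first '/'-separated field is the surface form;
        # every following field is one lexical form.
        surface, *readings = match.group(1).split("/")
        forms = [
            LexicalForm(reading.split("<", 1)[0], tuple(TAG_RE.findall(reading)))
            for reading in readings
        ]
        units.append((surface, forms))
    return units


if __name__ == "__main__":
    sample = "^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$"
    for surface, forms in parse_lexical_units(sample):
        print(surface, forms)
```

For the "daba" example, the function returns a single unit whose surface form is "daba" and whose two lexical forms share the lemma "dar" and differ only in the person tag (p1 versus p3).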

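The left-to-right, longest-match treatment of multi-word surface forms can be illustrated with a toy tokenizer. This is only a sketch of the matching strategy under strong assumptions: lt-proc works on compiled finite-state transducers, whereas the hypothetical function below matches whitespace-separated words against a plain set of known surface forms, and the sample set is made up for illustration.

```python
from typing import List, Set


def longest_match_tokenize(text: str, known_forms: Set[str]) -> List[str]:
    """Split text into surface forms, preferring the longest known form.

    known_forms is a hypothetical stand-in for a compiled dictionary:
    a set of single-word and multi-word surface forms.
    """
    words = text.split()
    # Longest multi-word form (in words) that is worth trying; at least 1.
    max_words = max([1] + [len(form.split()) for form in known_forms])
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate starting at position i, then shrink.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            # A single word is always accepted, known or not.
            if n == 1 or candidate in known_forms:
                tokens.append(candidate)
                i += n
                break
    return tokens


if __name__ == "__main__":
    forms = {"echaban", "de", "menos", "echaban de menos"}
    print(longest_match_tokenize("ellos echaban de menos casa", forms))
```

With the sample set, "echaban de menos" is kept together as one surface form even though each of its three words is also a known form on its own, because the longer candidate is tried first.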

apertium(1)															       apertium(1)

NAME
apertium - This application is part of ( apertium )

This tool is part of the apertium machine translation architecture: http://apertium.sf.net.

SYNOPSIS
apertium [-d datadir] [-f format] [-u] [-a] {language-pair} [infile [outfile]]

DESCRIPTION
apertium is the application that most people will be using, as it simplifies the use of the apertium/lt-toolbox tools for machine translation purposes.

This tool tries to ease the use of lt-toolbox (which contains all the lexical processing modules and tools) and apertium (which contains the rest of the engine) by providing a single front-end to the end user.

The different modules behind the apertium machine translation architecture are, in order:

o de-formatter: Separates the text to be translated from the format information.
o morphological-analyser: Tokenizes the text in surface forms.
o part-of-speech tagger: Chooses one surface form among homographs.
o lexical transfer module: Reads each source-language lexical form and delivers a corresponding target-language lexical form.
o structural transfer module: Detects fixed-length patterns of lexical forms (chunks or phrases) needing special processing due to grammatical divergences between the two languages and performs the corresponding transformations.
o morphological generator: Delivers a target-language surface form for each target-language lexical form, by suitably inflecting it.
o post-generator: Performs orthographical operations such as contractions and apostrophations.
o re-formatter: Restores the format information encapsulated by the de-formatter into the translated text and removes the encapsulation sequences used to protect certain characters in the source text.

OPTIONS
-d datadir The directory holding the linguistic data. By default it will use the expected installation path.

language-pair The language pair: LANG1-LANG2 (for instance es-ca or ca-es).

-f format Specifies the format of the input and output files, which can have these values:

o txt (default value): Input and output files are in text format.
o html: Input and output files are in "html" format. This "html" is the one accepted by the vast majority of web browsers.
o rtf: Input and output files are in "rtf" format. The accepted "rtf" is the one generated by Microsoft WordPad (C) and Microsoft Office (C) up to and including Office-97.

-u Disable marking of unknown words with the '*' character.

-a Enable marking of disambiguated words with the '=' character.

FILES
These are the two files that can be used with this command:

infile Input file (stdin by default).
outfile Output file (stdout by default).

SEE ALSO
lt-proc(1), lt-comp(1), lt-expand(1), apertium-tagger(1).

BUGS
Lots of...lurking in the dark and waiting for you!

AUTHOR
(c) 2005,2006 Universitat d'Alacant / Universidad de Alicante. All rights reserved.

2006-03-08 apertium(1)
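
The synopsis and options above map directly onto an argument list. The following Python sketch assembles such a list using only the flags documented on this page; the function name build_apertium_command and its parameter names are hypothetical, and the sketch only builds the command, it does not run it or check that apertium is installed.

```python
from typing import List, Optional

# Format values listed under the -f option on this page.
KNOWN_FORMATS = ("txt", "html", "rtf")


def build_apertium_command(
    language_pair: str,
    datadir: Optional[str] = None,
    fmt: Optional[str] = None,
    mark_unknown: bool = True,
    mark_disambiguated: bool = False,
    infile: Optional[str] = None,
    outfile: Optional[str] = None,
) -> List[str]:
    """Build an argv list following the documented synopsis:

    apertium [-d datadir] [-f format] [-u] [-a] {language-pair} [infile [outfile]]
    """
    if outfile is not None and infile is None:
        # The synopsis nests outfile inside infile, so it cannot stand alone.
        raise ValueError("outfile can only be given together with infile")
    if fmt is not None and fmt not in KNOWN_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")

    cmd = ["apertium"]
    if datadir is not None:
        cmd += ["-d", datadir]
    if fmt is not None:
        cmd += ["-f", fmt]
    if not mark_unknown:
        cmd.append("-u")  # disable '*' marks on unknown words
    if mark_disambiguated:
        cmd.append("-a")  # enable '=' marks on disambiguated words
    cmd.append(language_pair)
    if infile is not None:
        cmd.append(infile)
        if outfile is not None:
            cmd.append(outfile)
    return cmd


if __name__ == "__main__":
    print(build_apertium_command("es-ca", fmt="html", mark_unknown=False))
```

The resulting list could be passed to subprocess.run on a system where apertium and the relevant language pair are installed; when infile and outfile are omitted, the command reads stdin and writes stdout as described under FILES.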