.TH lt-proc 1 2006-03-23 "" ""

NAME
lt-proc - This application is part of the lexical processing
modules and tools.

This tool is part of the apertium machine translation
architecture: http://www.apertium.org.

SYNOPSIS
lt-proc [options] fst_file [input_file [output_file]]

DESCRIPTION
lt-proc is the application responsible for providing the four
lexical processing functionalities:

o morphological analyser (option -a)
o lexical transfer (option -n)
o morphological generator (option -g)
o post-generator (option -p)

It accomplishes these tasks by reading binary
files containing a compact and efficient representation of
dictionaries (a class of finite-state transducers called augmented
letter transducers). These files are generated by lt-comp(1).

Note that some characters (`[', `]', `$', `^', `/', `+') are
special characters used for format and encapsulation. They must be
escaped if they are to be used literally. For instance, text
enclosed in `['...`]' is ignored, and the format of a lexical unit is
`^...$'.

OPTIONS
-a
Tokenizes the text into surface forms (lexical units as they appear
in texts) and delivers, for each surface form, one or more lexical
forms consisting of lemma, lexical category and morphological
inflection information.

Tokenization is not straightforward due to the existence, on the
one hand, of contractions and, on the other hand, of multi-word
lexical units. For contractions, the system reads in a single
surface form and delivers the corresponding sequence of lexical
forms. Multi-word surface forms are analysed in a left-to-right,
longest-match fashion. Multi-word surface forms may be invariable
(such as a multi-word preposition or conjunction) or inflected (for
example, in Spanish, "echaban de menos", "they missed", is a form
of the imperfect indicative tense of the verb "echar de menos", "to
miss"). Limited support for some kinds of discontinuous multi-word
units is also available.

Analysis of single-word surface forms produces output like that in
these examples: "cantar" -> `^cantar/cantar<vblex><inf>$' or
"daba" -> `^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$'.

Use the
literal case of the incoming characters.

-g
Delivers a target-language surface form for each target-language
lexical form, by suitably inflecting it.

Morphological generation (like -g) but
without unknown word marks (asterisk `*').

-p
Performs orthographical operations such as contractions and
apostrophations. The post-generator is usually dormant (it just
copies the input to the output) until a special alarm symbol
contained in some target-language surface forms wakes it up to
perform a particular string transformation if necessary; then it
goes back to sleep.

Input
processing is in orthoepikon (previously `sao') annotation system
format: http://orthoepikon.sf.net.

Apply a transliteration dictionary.

Display the version number.

Display this help.

FILES
The input compiled dictionary.

BUGS
Lots of...lurking in the dark and waiting for you!

AUTHOR
(c) 2005,2006 Universitat d'Alacant / Universidad de Alicante. All
rights reserved.
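
The analyser output shown earlier for "cantar" and "daba" can be
taken apart with a few lines of Python. The following is only a
minimal sketch: the function name parse_lexical_unit is hypothetical
and not part of lt-proc, and the code assumes a single unit with no
escaped special characters and no multi-word analyses joined by `+'.

```python
import re

# Matches one <tag> and captures the text between the angle brackets.
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_lexical_unit(unit):
    """Split one `^surface/analysis/...$` unit into (surface, analyses).

    Each analysis is returned as a (lemma, [tags]) pair.
    Assumption: no escaped special characters appear inside the unit.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit delimited by ^ and $")
    # The first field is the surface form; the rest are lexical forms.
    surface, *analyses = unit[1:-1].split("/")
    parsed = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]
        tags = TAG_RE.findall(analysis)
        parsed.append((lemma, tags))
    return surface, parsed


if __name__ == "__main__":
    unit = "^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$"
    print(parse_lexical_unit(unit))
```

For the "daba" example this yields the surface form together with
two analyses of the lemma "dar", one per ambiguous reading.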
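
Text that contains the special characters listed earlier has to be
escaped before it is passed to lt-proc. This page does not say which
escape character is used, so the sketch below assumes a backslash
prefix; both that choice and the helper name escape_special are
assumptions made for illustration.

```python
# The special characters named in this page.
SPECIAL_CHARS = "[]$^/+"


def escape_special(text, escape_char="\\"):
    """Prefix every special character in `text` with `escape_char`.

    Assumption: a backslash is the escape character; this page does
    not state it, so pass a different `escape_char` if needed.
    """
    return "".join(
        escape_char + ch if ch in SPECIAL_CHARS else ch
        for ch in text
    )


if __name__ == "__main__":
    print(escape_special("a/b [c]"))
```

Only the six characters named in this page are handled; whether the
escape character itself also needs escaping is left open here.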
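
The command line form `fst_file [input_file [output_file]]' and the
four options -a, -n, -g and -p named earlier can be driven from a
script. The following is a rough sketch, not a reference wrapper:
the mode names, the helper functions and the file name used in the
usage note are hypothetical, and it assumes lt-proc is on the PATH
and reads standard input when no input_file is given.

```python
import subprocess

# Hypothetical mode names mapped to the options named in this page.
MODE_FLAGS = {
    "analysis": "-a",
    "lexical-transfer": "-n",
    "generation": "-g",
    "post-generation": "-p",
}


def build_command(mode, fst_file, input_file=None, output_file=None):
    """Build the argument list: lt-proc FLAG fst_file [input [output]]."""
    if output_file is not None and input_file is None:
        raise ValueError("output_file requires input_file")
    command = ["lt-proc", MODE_FLAGS[mode], fst_file]
    if input_file is not None:
        command.append(input_file)
        if output_file is not None:
            command.append(output_file)
    return command


def run_lt_proc(mode, fst_file, text):
    """Feed `text` on standard input and return what lt-proc prints.

    Assumption: lt-proc reads standard input when no input_file is given.
    """
    result = subprocess.run(
        build_command(mode, fst_file),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A call such as run_lt_proc("analysis", "es.bin", "daba") would then
run the analyser over one word, where "es.bin" stands for whatever
compiled dictionary lt-comp(1) produced.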