apertium-preprocess-corpus-lextor(1) [debian man page]

apertium-preprocess-corpus-lextor(1)									      apertium-preprocess-corpus-lextor(1)

NAME
       apertium-preprocess-corpus-lextor - This application is part of (apertium)

       This tool is part of the apertium machine translation architecture: http://apertium.org.

SYNOPSIS
       apertium-preprocess-corpus-lextor data_dir translation_dir input_file output_file

DESCRIPTION
       apertium-preprocess-corpus-lextor is the application responsible for preprocessing the training corpus for the lexical selector training.

OPTIONS
       This tool currently has no options.

FILES
       These are the kinds of files and directories used with this tool:

       data_dir          the path to the linguistic data to use.

       translation_dir   the translation direction to use.

       input_file        contains a large corpus in raw format.

       output_file       the file which gets the preprocessed corpus.

SEE ALSO
       apertium-gen-lextorbil(1), apertium-gen-lextormono(1), apertium-gen-lextor-eval(1), apertium-gen-stopwords-lextor(1), apertium-gen-wlist-lextor(1), apertium-gen-wlist-lextor-translation(1), apertium-lextor(1).

BUGS
       Lots of...lurking in the dark and waiting for you!

AUTHOR
       (c) 2005,2006 Universitat d'Alacant / Universidad de Alicante. All rights reserved.

2006-12-12									      apertium-preprocess-corpus-lextor(1)
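
EXAMPLES
       The following is a minimal sketch rather than part of the original page. It wraps the four positional arguments from the SYNOPSIS in a small shell function that checks the argument count and whether the tool is on PATH before running it. The function name run_preprocess and the paths and direction shown in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch only: run_preprocess is a hypothetical helper, not shipped with apertium.
# It forwards exactly the four arguments named in the SYNOPSIS.
run_preprocess() {
    if [ "$#" -ne 4 ]; then
        echo "usage: run_preprocess data_dir translation_dir input_file output_file" >&2
        return 2
    fi
    if ! command -v apertium-preprocess-corpus-lextor >/dev/null 2>&1; then
        echo "apertium-preprocess-corpus-lextor not found in PATH" >&2
        return 127
    fi
    apertium-preprocess-corpus-lextor "$1" "$2" "$3" "$4"
}

# Example call (all four values are assumptions; adjust to your installation):
#   run_preprocess /path/to/linguistic-data es-ca corpus.raw.txt corpus.preprocessed.txt
```

       Since the tool has no options, the argument order is the only interface, so checking the count before invoking it catches the most likely mistake.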

apertium-gen-wlist-lextor-translation(1)								  apertium-gen-wlist-lextor-translation(1)

NAME
       apertium-gen-wlist-lextor-translation - This application is part of (apertium)

       This tool is part of the apertium machine translation architecture: http://apertium.org.

SYNOPSIS
       apertium-gen-wlist-lextor-translation --mono|-m dic.bin --bil|-b bildic.bin --wlist|-w wlistfile

DESCRIPTION
       apertium-gen-wlist-lextor-translation is the application responsible for generating all the possible translations of polysemous words.

OPTIONS
       --mono|-m dic.bin
              Specifies the monolingual lexical selection dictionary to use (see apertium-gen-lextormono).

       --bil|-b bildic.bin
              Specifies the bilingual lexical selection dictionary to use (see apertium-gen-lextorbil).

       --wlist|-w wlistfile
              Specifies the list of words to translate (see apertium-gen-wlist-lextor).

       --help|-h
              Shows brief usage help.

       --version|-v
              Shows the version string of this tool and its license.

FILES
       This tool uses no files apart from the ones associated with each option.

SEE ALSO
       apertium-gen-lextorbil(1), apertium-preprocess-corpus-lextor(1), apertium-gen-stopwords-lextor(1), apertium-gen-wlist-lextor(1), apertium-gen-lextormono(1), apertium-lextor-eval(1), apertium-lextor(1).

BUGS
       Lots of...lurking in the dark and waiting for you!

AUTHOR
       (c) 2005,2006 Universitat d'Alacant / Universidad de Alicante. All rights reserved.

2006-12-12									  apertium-gen-wlist-lextor-translation(1)
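
EXAMPLES
       The following is a minimal sketch rather than part of the original page. It passes the three files to the long options listed under OPTIONS, after checking that each file is readable. The function name gen_wlist_translations and the file names in the usage comment are hypothetical. The page does not say where the result is written, so the sketch leaves the tool's output untouched.

```shell
#!/bin/sh
# Sketch only: gen_wlist_translations is a hypothetical helper, not shipped with apertium.
# Arguments: monolingual dictionary, bilingual dictionary, word list (in that order).
gen_wlist_translations() {
    if [ "$#" -ne 3 ]; then
        echo "usage: gen_wlist_translations dic.bin bildic.bin wlistfile" >&2
        return 2
    fi
    # Fail early if any input file is missing or unreadable.
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2
            return 1
        fi
    done
    if ! command -v apertium-gen-wlist-lextor-translation >/dev/null 2>&1; then
        echo "apertium-gen-wlist-lextor-translation not found in PATH" >&2
        return 127
    fi
    apertium-gen-wlist-lextor-translation --mono "$1" --bil "$2" --wlist "$3"
}

# Example call (file names are assumptions):
#   gen_wlist_translations dic.bin bildic.bin wlistfile
```

       According to the OPTIONS section, the two dictionaries come from apertium-gen-lextormono and apertium-gen-lextorbil and the word list from apertium-gen-wlist-lextor, so those tools would normally be run first.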