Linguistic project: extract co-occurrences from text corpus


apertium-tagger

apertium-tagger(1)														apertium-tagger(1)

NAME

       apertium-tagger - This application is part of (apertium)

       This tool is part of the apertium open-source machine translation architecture: http://www.apertium.org.

SYNOPSIS

       apertium-tagger --train|-t {n} DIC CRP TSX PROB [--debug|-d]

       apertium-tagger --supervised|-s {n} DIC CRP TSX PROB HTAG UNTAG [--debug|-d]

       apertium-tagger --retrain|-r {n} CRP PROB [--debug|-d]

       apertium-tagger --tagger|-g [--first|-f] PROB [--debug|-d] [INPUT [OUTPUT]]

DESCRIPTION

       apertium-tagger is the application responsible for training the apertium part-of-speech tagger or for tagging, depending on the calling
       options.  This command only reads from the standard input if the option --tagger or -g is used.

OPTIONS

       -t {n}, --train {n}
	      Initializes parameters through Kupiec's method (unsupervised), then performs n iterations of the Baum-Welch training algorithm
	      (unsupervised).

       -s {n}, --supervised {n}
	      Initializes parameters against a hand-tagged text (supervised) through the maximum likelihood estimate method, then performs n
	      iterations of the Baum-Welch training algorithm (unsupervised).

       -r {n}, --retrain {n}
	      Retrains the model with n additional Baum-Welch iterations (unsupervised).

       -g, --tagger
	      Tags input text by means of the Viterbi algorithm.

       -p, --show-superficial
	      Prints the superficial form of the word alongside the lexical form in the output stream.

       -f, --first
	      Used in conjunction with -g (--tagger), makes the tagger output all lexical forms of each word, with the chosen one in first
	      place (after the lemma).

       -d, --debug
	      Print error (if any) or debug messages while operating.

       -m, --mark
	      Mark disambiguated words.

       -h, --help
	      Display a help message.
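
       The Viterbi decoding named under -g can be illustrated with a small, self-contained sketch. This is not the apertium-tagger
       implementation: the function name viterbi, the three tags, and every probability below are invented for illustration, and the
       real tagger works on apertium's own stream format and model file (PROB) rather than on plain Python dictionaries.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state (tag) sequence for the observations."""

    def log(p):
        # Work in log space to avoid underflow; log(0) is treated as -inf.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               previous state on that path)
    first = observations[0]
    best = [{
        s: (log(start_p.get(s, 0)) + log(emit_p[s].get(first, 0)), None)
        for s in states
    }]

    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            score, prev = max(
                (best[-1][p][0] + log(trans_p[p].get(s, 0)), p) for p in states
            )
            column[s] = (score + log(emit_p[s].get(obs, 0)), prev)
        best.append(column)

    # Backtrack from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy model (hypothetical numbers, not taken from any apertium data file).
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"VERB": 0.7, "NOUN": 0.2, "DET": 0.1},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 1.0},
    "NOUN": {"can": 0.6, "rusts": 0.4},
    "VERB": {"can": 0.5, "rusts": 0.5},
}

tags = viterbi(["the", "can", "rusts"], states, start_p, trans_p, emit_p)
print(tags)  # ['DET', 'NOUN', 'VERB']
```

       In this toy model the ambiguous word "can" is resolved as a noun because the DET-to-NOUN transition dominates, which is the kind
       of contextual disambiguation a hidden Markov model tagger performs.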

FILES

       These are the kinds of files used with each option:

       DIC Full expanded dictionary file

       CRP Training text corpus file

       TSX Tagger specification file, in XML format

       PROB Tagger data file, built in the training and used while tagging

       HTAG Hand-tagged text corpus

       UNTAG Untagged text corpus: the morphological analysis of the HTAG corpus, used jointly with it by the -s option

       INPUT Input file, stdin by default

       OUTPUT Output file, stdout by default
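
       The maximum likelihood initialization that -s performs on a hand-tagged corpus (HTAG) amounts, in its simplest textbook form, to
       counting and normalising. The sketch below assumes a corpus already parsed into (word, tag) pairs; the function name
       mle_estimate, the "<s>" start marker, and the three sentences are hypothetical, and the real tool reads apertium's own file
       formats and may estimate its parameters differently in detail.

```python
from collections import Counter, defaultdict


def mle_estimate(tagged_sentences):
    """Estimate tag-transition and word-emission probabilities by relative frequency."""
    trans_counts = defaultdict(Counter)  # previous tag -> counts of next tag
    emit_counts = defaultdict(Counter)   # tag -> counts of words seen with it

    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start marker
        for word, tag in sentence:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag

    def normalise(counts):
        # Divide each count by the total for its context.
        return {
            ctx: {key: n / sum(c.values()) for key, n in c.items()}
            for ctx, c in counts.items()
        }

    return normalise(trans_counts), normalise(emit_counts)


# Tiny invented hand-tagged corpus, for illustration only.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]

trans, emit = mle_estimate(corpus)
print(trans["<s>"])   # probabilities of each tag starting a sentence
print(emit["NOUN"])   # probabilities of each word given the tag NOUN
```

       Estimates obtained this way are only a starting point; per the -s description, Baum-Welch iterations then refine them.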

SEE ALSO

       lt-proc(1), lt-comp(1), lt-expand(1), apertium-translator(1), apertium(1).

BUGS

       Lots of...lurking in the dark and waiting for you!

AUTHOR

       Copyright  (c) 2005, 2006 Universitat d'Alacant / Universidad de Alicante.  This is free software.  You may redistribute copies of it under
       the terms of the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.

								    2006-08-30							apertium-tagger(1)
