Linguistic project: extract co-occurrences from text corpus

6 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Alignment tool to join text files in 2 directories to create a parallel corpus

I have two directories called English and Hindi. Each directory contains the same number of files; the only difference is that files in the English directory carry the tag .english and files in the Hindi directory carry the tag .Hindi. A file may contain either a single text or more than one text...

2. Shell Programming and Scripting

Remove duplicate occurrences of text pattern

Hi folks! I have a file which contains 1000 lines. On each line I have multiple occurrences (26 to be exact) of the pattern folder#/folder#, where # is the line number in the file: some text here folder1/folder1 some text here folder1/folder1 some text here folder1/folder1 some text...

3. Shell Programming and Scripting

Grepping verbal forms from a large corpus

I want to extract verbal forms from a large corpus of English. I have identified a certain number of patterns. Each pattern has the following structure: SPACE word_CATEGORY, where word refers to the verbal form and CATEGORY refers to the class of the verb. The categories are identified as per the...
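
The pattern described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: tokens are taken to be whitespace-separated and shaped like word_CATEGORY, and the category labels VB and VBD (as well as the function name and the sample string) are hypothetical placeholders, since the excerpt does not list the actual categories.

```python
import re

def extract_verbal_forms(text, categories):
    """Return (word, category) pairs for tokens shaped like ' word_CATEGORY'."""
    # Try longer labels first so that e.g. VBD is preferred over VB.
    alternation = "|".join(
        re.escape(c) for c in sorted(categories, key=len, reverse=True)
    )
    # (?<=\s) requires whitespace before the token (the SPACE in the pattern);
    # (?!\S) requires the token to end at whitespace or at the end of the text.
    pattern = re.compile(r"(?<=\s)(\S+?)_(" + alternation + r")(?!\S)")
    return pattern.findall(text)

# Hypothetical tagged sample, invented for illustration only.
sample = " the_DT dog_NN barked_VBD and_CC will_MD run_VB"
print(extract_verbal_forms(sample, ["VB", "VBD"]))
```

Because the pattern requires a preceding whitespace character, a token at the very start of the text is not matched unless the text begins with a space.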

4. Shell Programming and Scripting

Creating Frequency of words from a file by accessing a corpus

Hello, I have a large file of syllables/strings in Urdu. Each word is on a separate line. Example in English: be at for if being attract. I need to identify the frequency of each of these strings from a large corpus (which I cannot attach unfortunately because of size limitations) and...
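
One possible way to approach the counting task described above, sketched in Python. It assumes each string should be matched as a whole whitespace-separated token; if substrings inside longer words should count as well, the matching step would need to change. The function name and the two corpus lines are made up for illustration; the target list reuses the English example from the post.

```python
from collections import Counter

def count_frequencies(targets, corpus_lines):
    """Count how often each target occurs as a whitespace-separated token."""
    wanted = set(targets)
    counts = Counter()
    for line in corpus_lines:
        for token in line.split():
            if token in wanted:
                counts[token] += 1
    # Report every target, including those that never occur in the corpus.
    return {t: counts[t] for t in targets}

targets = ["be", "at", "for", "if", "being", "attract"]
# Toy corpus lines, invented for illustration only.
corpus = ["if being here is for the best", "be at home or be at work"]
for word, freq in count_frequencies(targets, corpus).items():
    print(word, freq)
```

Reading the corpus line by line keeps memory use flat, which matters when the corpus is too large to load at once; an open file object can be passed directly as corpus_lines.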

5. Shell Programming and Scripting

Text Substitution Project

History: large open source PHP project, school management program. Comprises about 200 scripts. Had another developer for a while, and he wanted a version in German, so he edited all the scripts and replaced text that would show up in the browser with variables (i.e. instead of "Click Here",...

6. Programming

C program to extract text between two delimiters from some text file

I need a C program to extract text between two delimiters from a text file and then store the pieces in different variables. The text file looks like: 0: abc.txt ========= aaaaaa|11111111|sssssssssss|333333|ddddddddd|34343454564|asass aaaaaa|11111111|sssssssssss|333333|ddddddddd|34343454564|asass...

mbtg

mbtg(1) 						      General Commands Manual							   mbtg(1)

NAME

       MBTG - Memory Based Tagger generator

SYNOPSIS

       mbtg -T <filename> -s <setting filename>

       or

       mbtg [options]

DESCRIPTION

       This program generates, from a tagged corpus, all the files needed to tag a text with mbt.

OPTIONS

       -h or --help
	      show help

       -T <tagged training corpus file>

       or

       -E <enriched tagged training corpus file>

       All further options have reasonable defaults, so only experienced users need to set them. See the mbt manual for more details.

       -s settingsfile
	      mbtg creates this file, which can be used to run mbt with minimal effort (for example: mbt -s settings -T somefile)

       -p pattern
	      the pattern for known words (default ddfa)

       -P pattern
	      the pattern for unknown words (default dFapsss)

       -% <number>
	      filter threshold for ambitag construction (default 5%)

       -l <lexiconfile>

       -L <file with list of frequent words>

       -r <ambitagfile>

       -k <known words case base>

       -u <unknown words case base>

       -K <known words instances file>

       -U <unknown words instances file>

       -V or --version
	      show version info

       -e <sentence delimiter> (default '<utt>')

       -X
	      keep the intermediate files

       -Otimbl options
	       (Note: there is NO SPACE between O and the options)
		<options>   classifier options for both known and unknown words instance bases
		K: <options>   classifier options for known words instance base
		U: <options>   classifier options for unknown words case base
		valid timbl options are: a d k m q v w x -
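
EXAMPLES

       The options above can be assembled into an mbtg command line from a script. The Python sketch below is an illustration only, not part of mbtg: the helper name build_mbtg_command, the file names train.tagged and train.settings, and the timbl option string are assumptions made for this example. It builds the argument list without running anything, and it keeps the -O value attached to the flag, as the note above requires.

```python
import shlex

def build_mbtg_command(corpus, settings, timbl_options=None, keep_intermediate=False):
    """Assemble an mbtg argument list from options named in this man page."""
    # -T names the tagged training corpus, -s the settings file mbtg creates.
    argv = ["mbtg", "-T", corpus, "-s", settings]
    if timbl_options:
        # No space between O and the options: emit them as a single argument.
        argv.append("-O" + timbl_options)
    if keep_intermediate:
        # -X keeps the intermediate files.
        argv.append("-X")
    return argv

# Hypothetical file names and option string, for illustration only.
cmd = build_mbtg_command("train.tagged", "train.settings",
                         timbl_options="K: -a IGTREE", keep_intermediate=True)
print(shlex.join(cmd))
```

       The resulting list can be handed to subprocess.run on a system where mbtg is installed; passing a list rather than a single string avoids shell quoting problems with the -O value.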

BUGS

       possibly

AUTHORS

       Ko van der Sloot Timbl@uvt.nl

       Antal van den Bosch Timbl@uvt.nl

SEE ALSO

       timbl(1) mbt(1) mbtserver(1)

								   2011 march 21							   mbtg(1)
