cntlist(5) [centos man page]

CNTLIST(5)                    WordNet™ File Formats                    CNTLIST(5)

NAME
cntlist - file listing number of times each tagged sense occurs in a semantic concordance, sorted most to least frequently tagged

cntlist.rev - file listing number of times each tagged sense occurs in a semantic concordance, sorted by sense key

DESCRIPTION
A cntlist file for a semantic concordance lists the number of times each semantically tagged sense occurs in the concordance and its sense number in the WordNet database. Each line in the file corresponds to a sense in the WordNet database to which at least one semantic tag points. Only senses that are tagged in a concordance are in the concordance's cntlist file.

WordNet Database cntlist File

In the WordNet database, words are assigned sense numbers based on frequency of use in semantically tagged corpora. The cntlist file used by grind(1) to build the WordNet database and assign the sense numbers is a union of the cntlist files from the various semantic concordances that were formerly released by Princeton University. This combined cntlist file is provided with the WordNet package and is found in the WNSEARCHDIR directory.

The cntlist.rev file is used at run-time by the WordNet library code and browser interfaces to print in the output display the number of times each sense has been tagged.

File Format

Each line in a cntlist file contains information for one sense. The file is ordered from most to least frequently tagged sense. The fields are separated by one space, and each line is terminated with a newline character. Senses having the same tag_cnt value are listed in reverse alphabetical order of the lemma field of the sense_key.

Each line in cntlist is of the form:

    tag_cnt sense_key sense_number

where tag_cnt is the decimal number of times the sense is tagged in the corresponding semantic concordance. sense_key is a WordNet sense encoding and sense_number is a WordNet sense number, as described in senseidx(5).

The cntlist.rev file contains the same fields described above, in the following order:

    sense_key sense_number tag_cnt

NOTES
Princeton no longer maintains or releases the Semantic Concordance files. The cntlist file used to order the senses in WordNet 3.0 was generated from the Semantic Concordance files at the point that they were last updated in 2001. The order of senses presented usually reflects what the user would expect; however, sense ordering is now less reliable than in prior releases and should not be construed as an accurate indicator of frequency of use.

ENVIRONMENT VARIABLES (UNIX)
WNHOME
    Base directory for WordNet. Default is /usr/local/WordNet-3.0.

WNSEARCHDIR
    Directory in which the WordNet database has been installed. Default is WNHOME/dict.

REGISTRY (WINDOWS)
HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome
    Base directory for WordNet. Default is C:\Program Files\WordNet\3.0.

HKEY_CURRENT_USER\SOFTWARE\WordNet\3.0\wnres
    User's default browser options.

FILES
cntlist, cntlist.rev
    file of combined semantic concordance cntlist files. Used to assign sense numbers in WordNet database

SEE ALSO
grind(1), wnintro(5), senseidx(5).

WordNet 3.0                          Dec 2006                          CNTLIST(5)
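
EXAMPLES
The two field orders described under File Format in cntlist(5) can be read with a few lines of code. The following is a minimal sketch, not part of the WordNet distribution. The names SenseCount, parse_cntlist_line, parse_cntlist_rev_line and is_sorted_by_count are hypothetical, and the sample line in the usage note is made up for illustration rather than taken from a real cntlist file.

```python
from typing import List, NamedTuple


class SenseCount(NamedTuple):
    tag_cnt: int        # times the sense is tagged in the concordance
    sense_key: str      # WordNet sense encoding, see senseidx(5)
    sense_number: int   # WordNet sense number


def parse_cntlist_line(line: str) -> SenseCount:
    """Parse one cntlist line: tag_cnt sense_key sense_number."""
    # Fields are separated by exactly one space; a malformed line
    # raises ValueError from the unpacking or from int().
    tag_cnt, sense_key, sense_number = line.rstrip("\n").split(" ")
    return SenseCount(int(tag_cnt), sense_key, int(sense_number))


def parse_cntlist_rev_line(line: str) -> SenseCount:
    """Parse one cntlist.rev line: sense_key sense_number tag_cnt."""
    sense_key, sense_number, tag_cnt = line.rstrip("\n").split(" ")
    return SenseCount(int(tag_cnt), sense_key, int(sense_number))


def is_sorted_by_count(entries: List[SenseCount]) -> bool:
    """Check only the primary order: most to least frequently tagged."""
    counts = [e.tag_cnt for e in entries]
    return all(a >= b for a, b in zip(counts, counts[1:]))
```

For instance, parse_cntlist_line("12 example%1:09:00:: 1\n") yields a record with tag_cnt 12 and sense_number 1. The order check deliberately ignores the tie-breaking rule (reverse alphabetical order of the lemma), since the exact comparison used for ties is not specified here.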

GRIND(1)                      WordNet™ User Commands                      GRIND(1)

NAME
grind - process WordNet lexicographer files

SYNOPSIS
grind [ -v ] [ -s ] [ -Llogfile ] [ -a ] [ -d ] [ -i ] [ -o ] [ -n ] filename [ filename... ]

DESCRIPTION
grind() processes WordNet lexicographer files, producing database files suitable for use with the WordNet search and interface code and other applications. The syntactic and structural integrity of the input files is verified. Warnings and errors are reported via stderr and a run-time log is produced on stdout. A database is generated only if there are no errors.

Input Files

Input files correspond to the syntactic categories implemented in WordNet - noun, verb, adjective and adverb. Each input lexicographer file consists of a list of synonym sets (synsets) for one part of speech. Although the basic synset syntax is the same for all of the parts of speech, some parts of the syntax only apply to a particular part of speech. See wninput(5WN) for a description of the input file format.

Each filename specified is of the form:

    pathname/pos.suffix

where pathname is optional and pos is either noun, verb, adj or adv. suffix may be used to separate groups of synsets into different files, for example noun.animal and noun.plant. One or more input files, in any combination of syntactic categories, may be specified. See lexnames(5WN) for a list of the lexicographer files used to build the complete WordNet database.

Output Files

grind() produces the following output files:

+-------------+----------------------------------------+
| Filename    | Description                            |
+-------------+----------------------------------------+
| index.pos   | Index file for each syntactic category |
| data.pos    | Data file for each syntactic category  |
| index.sense | Sense index                            |
+-------------+----------------------------------------+

See wndb(5WN) for a description of the database file formats.

Each time grind() is run, any existing database files are overwritten with the database files generated from the specified input files. If no input files from a syntactic category are specified, the corresponding database files are not overwritten.
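
The filename form pathname/pos.suffix described under Input Files can be checked before grind is invoked. The following is a minimal sketch and does not reproduce grind's own validation. The function name split_lexfile_name and the constant VALID_POS are hypothetical, and the sketch assumes that a non-empty suffix is required.

```python
import os
from typing import Optional, Tuple

# Parts of speech accepted in the pos field, as listed under Input Files.
VALID_POS = ("noun", "verb", "adj", "adv")


def split_lexfile_name(filename: str) -> Optional[Tuple[str, str]]:
    """Return (pos, suffix) if filename looks like pathname/pos.suffix, else None."""
    base = os.path.basename(filename)       # pathname is optional, so drop it
    pos, dot, suffix = base.partition(".")  # split at the first dot only
    # Assumption: a dot and a non-empty suffix must both be present.
    if pos in VALID_POS and dot and suffix:
        return pos, suffix
    return None
```

With this sketch, noun.animal and some/dir/noun.plant are accepted, while a bare verb or a name whose first component is not one of the four parts of speech is rejected.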
Sense Numbers

Senses are generally ordered from most to least frequently used, with the most common sense numbered 1. Frequency of use is determined by the number of times a sense is tagged in the various semantic concordance texts. Senses that are not semantically tagged follow the ordered senses in an arbitrary order. Note that this ordering is only an estimate based on usage in a small corpus. The tagsense_cnt field for each entry in the index.pos files indicates how many of the senses in the list have been tagged.

The cntlist file provided with the database lists the number of times each sense is tagged in the semantic concordances. grind() uses the data from cntlist to order the senses of each word. When the index.pos files are generated, the synset_offsets are output in sense number order, with sense 1 first in the list. Senses with the same number of semantic tags are assigned unique but consecutive sense numbers.

The WordNet OVERVIEW search displays all senses of the specified word, in all syntactic categories, and indicates which of the senses are represented in the semantically tagged texts.

OPTIONS
-v
    Verify integrity of input without generating database.

-s
    Suppress generation of warning messages. Usually grind is run with this option until all syntactic and structural errors are corrected since the warning messages may make it difficult to spot error messages.

-Llogfile
    Write all messages to logfile instead of stderr.

-a
    Generate statistical report on input files processed.

-d
    Generate distribution of senses by string length report on input files processed.

-i
    Generate sense index file.

-o
    Order senses using cntlist.

-n
    Generate nominalization (derivational morphology) links in database.

filename
    Input file of the form described in Input Files.

FILES
pos.*
    lexicographer files to use to build database

cntlist
    file of combined semantic concordance cntlist files. Used to assign sense numbers in WordNet database

SEE ALSO
cntlist(5WN), lexnames(5WN), senseidx(5WN), wndb(5WN), wninput(5WN), uniqbeg(7WN), wngloss(7WN).

DIAGNOSTICS
Exit status is normally 0. Exit status is -1 if a non-specific error occurs. If syntactic or structural errors exist, exit status is the number of errors detected.

usage: grind [-v] [-s] [-Llogfile] [-a ] [-d] [-i] [-o] [-n] filename [filename...]
    Invalid options were specified on the command line.

No input files processed.
    None of the filenames specified were of the appropriate form.

n syntactic errors found.
    Syntax errors were found while parsing the input files.

n structural errors found.
    Pointer errors were found that could not be automatically corrected.

BUGS
Please report bugs to wordnet@princeton.edu.

WordNet 3.0                          Dec 2006                          GRIND(1)
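
EXAMPLES
The exit status rules given under DIAGNOSTICS in grind(1) can be turned into a small helper for scripts that run grind. This is a sketch under stated assumptions rather than documented behaviour: the function name describe_grind_status is hypothetical, and the sketch relies on the POSIX convention that only the low 8 bits of an exit value reach the parent process, so an exit value of -1 is seen as 255.

```python
def describe_grind_status(status: int) -> str:
    """Interpret an exit status (0-255) as seen by the parent process on POSIX."""
    if not 0 <= status <= 255:
        raise ValueError("expected an exit status in the range 0-255")
    if status == 0:
        return "success"
    if status == 255:
        # exit(-1) is truncated to 8 bits, so it arrives as 255.
        return "non-specific error"
    return f"{status} error(s) detected"
```

Because of the same truncation, an error count above 255 would wrap around, and a count of exactly 255 cannot be told apart from the non-specific error case. The messages that grind writes to stderr or to the log file remain the reliable source for the actual number of errors.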