wordlist2dawg(1) [debian man page]

WORDLIST2DAWG(1)														  WORDLIST2DAWG(1)

NAME
wordlist2dawg - convert a wordlist to a DAWG for Tesseract

SYNOPSIS
wordlist2dawg WORDLIST DAWG lang.unicharset
wordlist2dawg -t WORDLIST DAWG lang.unicharset
wordlist2dawg -r 1 WORDLIST DAWG lang.unicharset
wordlist2dawg -r 2 WORDLIST DAWG lang.unicharset
wordlist2dawg -l <short> <long> WORDLIST DAWG lang.unicharset

DESCRIPTION
wordlist2dawg(1) converts a wordlist to a Directed Acyclic Word Graph (DAWG) for use with Tesseract. A DAWG is a compressed, space- and time-efficient representation of a word list.

OPTIONS
-t     Verify that a given DAWG file is equivalent to a given wordlist.

-r 1   Reverse a word if it contains a right-to-left (RTL) character.

-r 2   Reverse all words.

-l <short> <long>
       Produce a file with several DAWGs in it, one each for words of length <short>, <short+1>, ..., <long>.

ARGUMENTS
WORDLIST
       A plain text file in UTF-8, one word per line.

DAWG   The output DAWG to write.

lang.unicharset
       The unicharset of the language. This is the unicharset generated by mftraining(1).

SEE ALSO
tesseract(1), combine_tessdata(1), dawg2wordlist(1)

http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

COPYING
Copyright (C) 2006 Google, Inc. Licensed under the Apache License, Version 2.0.

AUTHOR
The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995) and Google (2006-present).

02/09/2012                                                      WORDLIST2DAWG(1)
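The DESCRIPTION of wordlist2dawg(1) calls a DAWG a compressed representation of a word list. The saving comes from sharing: a trie already shares common prefixes, and a DAWG also merges states that accept the same set of suffixes. The Python sketch below builds such a minimal DAWG using the incremental construction for sorted input described by Daciuk, Mihov, Watson and Watson. It is only an illustration of the data structure under that assumption: it is not claimed to be how wordlist2dawg(1) builds its graph, it does not read or write Tesseract's DAWG file format, and the names State, build_dawg, iter_words and count_states are hypothetical.

```python
from __future__ import annotations

from typing import Iterator


class State:
    """A DAWG node: labelled outgoing edges plus an end-of-word flag."""

    def __init__(self) -> None:
        self.edges: dict[str, State] = {}
        self.final = False

    def signature(self) -> tuple:
        # Two states are equivalent when they agree on finality and have the
        # same labelled edges to the same (already minimized) child states.
        return (self.final, tuple(sorted((ch, id(child)) for ch, child in self.edges.items())))


def build_dawg(words: list[str]) -> State:
    """Build a minimal DAWG; the input is sorted and deduplicated first."""
    root = State()
    register: dict[tuple, State] = {}

    def replace_or_register(state: State) -> None:
        # With sorted input, the newest child always has the largest label.
        label = max(state.edges)
        child = state.edges[label]
        if child.edges:
            replace_or_register(child)  # minimize bottom-up
        sig = child.signature()
        if sig in register:
            state.edges[label] = register[sig]  # reuse an equivalent state
        else:
            register[sig] = child

    for word in sorted(set(words)):
        # Follow the prefix this word shares with the previous word.
        node, i = root, 0
        while i < len(word) and word[i] in node.edges:
            node = node.edges[word[i]]
            i += 1
        # States below the divergence point will not change again: minimize them.
        if node.edges:
            replace_or_register(node)
        # Add the unshared suffix as a fresh chain of states.
        for ch in word[i:]:
            node.edges[ch] = State()
            node = node.edges[ch]
        node.final = True

    if root.edges:
        replace_or_register(root)  # minimize the path of the last word
    return root


def iter_words(state: State, prefix: str = "") -> Iterator[str]:
    """Yield every word accepted by the DAWG, in code-point order."""
    if state.final:
        yield prefix
    for ch in sorted(state.edges):
        yield from iter_words(state.edges[ch], prefix + ch)


def count_states(root: State) -> int:
    """Count distinct reachable states (shared states are counted once)."""
    seen: set[int] = set()
    stack = [root]
    while stack:
        state = stack.pop()
        if id(state) not in seen:
            seen.add(id(state))
            stack.extend(state.edges.values())
    return len(seen)


if __name__ == "__main__":
    dawg = build_dawg(["tap", "taps", "top", "tops"])
    print(list(iter_words(dawg)))  # ['tap', 'taps', 'top', 'tops']
    print(count_states(dawg))      # 5; a plain trie would need 8
```

For tap, taps, top and tops, a plain trie needs eight nodes; sharing the a/o branch and the p/s suffixes brings the graph down to five, while the set of accepted words is unchanged.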

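The -t, -r and -l options of wordlist2dawg(1), together with the WORDLIST format under ARGUMENTS, describe simple operations on the word list itself. The Python sketch below mimics them under loudly stated assumptions: a character counts as RTL if its Unicode bidirectional class is R or AL; reversal and word length are counted in Python code points (Tesseract encodes words with the unicharset, so its notion of length and reversal may differ for multi-code-point unichars); words outside the -l range are dropped; blank lines are skipped; and "equivalent" for -t is taken to mean the same set of words. All function names are hypothetical and this is not wordlist2dawg's code.

```python
from __future__ import annotations

import unicodedata


def read_wordlist(path: str) -> list[str]:
    """Read a UTF-8 WORDLIST, one word per line; blank lines skipped (assumption)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def is_rtl(word: str) -> bool:
    """Assumption: a word is RTL if any code point has bidi class R or AL."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in word)


def apply_reverse(words: list[str], mode: int) -> list[str]:
    """Mimic -r: mode 1 reverses words containing an RTL character, mode 2
    reverses every word, mode 0 leaves words alone. Reversal is per code point."""
    if mode == 0:
        return list(words)
    if mode == 1:
        return [w[::-1] if is_rtl(w) else w for w in words]
    if mode == 2:
        return [w[::-1] for w in words]
    raise ValueError(f"unsupported -r mode: {mode}")


def split_by_length(words: list[str], short: int, long: int) -> dict[int, list[str]]:
    """Mimic -l: one bucket per length short..long (in code points); words of
    other lengths are dropped (assumption)."""
    buckets: dict[int, list[str]] = {n: [] for n in range(short, long + 1)}
    for w in words:
        if short <= len(w) <= long:
            buckets[len(w)].append(w)
    return buckets


def equivalent(dawg_words: list[str], wordlist: list[str]) -> bool:
    """Mimic -t, assuming "equivalent" means the same set of words."""
    return set(dawg_words) == set(wordlist)


if __name__ == "__main__":
    hebrew = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew letters have bidi class R
    print(apply_reverse(["abc", hebrew], 1) == ["abc", hebrew[::-1]])  # True
    print(split_by_length(["a", "bb", "ccc", "dddd"], 2, 3))  # {2: ['bb'], 3: ['ccc']}
```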
DAWG2WORDLIST(1)														  DAWG2WORDLIST(1)

NAME
dawg2wordlist - convert a Tesseract DAWG to a wordlist

SYNOPSIS
dawg2wordlist UNICHARSET DAWG WORDLIST

DESCRIPTION
dawg2wordlist(1) converts a Tesseract Directed Acyclic Word Graph (DAWG) to a list of words, using a unicharset as the key.

OPTIONS
UNICHARSET
       The unicharset of the language. This is the unicharset generated by mftraining(1).

DAWG   The input DAWG, created by wordlist2dawg(1).

WORDLIST
       Plain text (output) file in UTF-8, one word per line.

SEE ALSO
tesseract(1), mftraining(1), wordlist2dawg(1), unicharset(5), combine_tessdata(1)

http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

COPYING
Copyright (C) 2012 Google, Inc. Licensed under the Apache License, Version 2.0.

AUTHOR
The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995) and Google (2006-present).

02/09/2012                                                      DAWG2WORDLIST(1)
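dawg2wordlist(1) needs a unicharset because, as "using a unicharset as key" suggests, the graph stores character IDs rather than text; the unicharset maps each ID back to a string. The Python sketch below shows that idea on a hypothetical in-memory DAWG in which each node maps to a list of (unichar_id, next_node, word_end) edges, with end-of-word marked on the edge. That layout, the toy unicharset and the function names are assumptions for illustration only; they do not describe Tesseract's binary DAWG or unicharset file formats.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory DAWG: node -> list of (unichar_id, next_node, word_end).
# End-of-word is marked on the edge. This is NOT Tesseract's file format.
Edges = Dict[int, List[Tuple[int, int, bool]]]


def dawg_to_wordlist(edges: Edges, unicharset: List[str], root: int = 0) -> List[str]:
    """List every word in the DAWG, using the unicharset to turn IDs into text."""
    words: List[str] = []

    def walk(node: int, prefix: str) -> None:
        # The graph is acyclic, so this depth-first walk always terminates.
        for unichar_id, child, word_end in edges.get(node, []):
            text = prefix + unicharset[unichar_id]  # the unicharset is the key
            if word_end:
                words.append(text)
            walk(child, text)

    walk(root, "")
    return words


def write_wordlist(path: str, words: List[str]) -> None:
    """Write the WORDLIST output: UTF-8, one word per line."""
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")


if __name__ == "__main__":
    unicharset = ["t", "a", "o", "p", "s"]  # toy unicharset: ID -> string
    # tap, taps, top, tops with the a/o branches and p/s suffixes shared.
    edges: Edges = {
        0: [(0, 1, False)],
        1: [(1, 2, False), (2, 2, False)],
        2: [(3, 3, True)],
        3: [(4, 4, True)],
    }
    print(dawg_to_wordlist(edges, unicharset))  # ['tap', 'taps', 'top', 'tops']
```

Because shared states are reached along several paths, the walk regenerates each word once per path, which is exactly the expansion from compressed graph back to a flat list.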