10-25-2009
Compressed, Direct Child Info, Word Tracking, Lexicon Data Structure, ADTDAWG?
Hello,
Back in late August 2009, I decided to start working on a modification of the traditional Directed Acyclic Word Graph data structure.
In the traditional structure, end-of-word nodes do not match up with single words, and child information has to be discovered by scrolling through a list. That is a heavy price to pay for a small data structure.
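To make those two costs concrete, here is a minimal sketch of a traditional DAWG stored as a flat node array. The field names, the hand-built node table, and the helpers `find_child` and `end_node` are all assumptions made for illustration; they are not taken from the project page.

```python
from collections import namedtuple

# Hypothetical node layout: each node sits in a sibling list that ends
# at the node flagged end_of_list; first_child is the index where the
# node's own child list starts (0 means "no children").
Node = namedtuple("Node", "letter end_of_word end_of_list first_child")

# Hand-built graph for the words: bat, bats, cat, cats.
NODES = [
    Node("", False, True, 0),    # 0: null node
    Node("b", False, False, 3),  # 1: root list starts here
    Node("c", False, True, 3),   # 2: 'b' and 'c' share the same child list
    Node("a", False, True, 4),   # 3
    Node("t", True, True, 5),    # 4: end of "bat" AND of "cat"
    Node("s", True, True, 0),    # 5: end of "bats" AND of "cats"
]
ROOT_LIST = 1


def find_child(list_start, letter):
    """Scroll through a sibling list until the letter or the list end is hit."""
    index = list_start
    while index != 0:
        node = NODES[index]
        if node.letter == letter:
            return index
        if node.end_of_list:
            return 0
        index += 1
    return 0


def end_node(word):
    """Return the index of the node where word ends, or 0 if it is not a word."""
    list_start = ROOT_LIST
    index = 0
    for letter in word:
        index = find_child(list_start, letter)
        if index == 0:
            return 0
        list_start = NODES[index].first_child
    return index if NODES[index].end_of_word else 0
```

In this sketch `end_node("cat")` and `end_node("bat")` return the same index, so the end-of-word node cannot serve as an identifier for one word, and every step of the lookup pays for a linear scan in `find_child`.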
I made a series of observations that led me to the ADTDAWG.
- The Adamovsky Direct Tracking Directed Acyclic Word Graph -
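This post does not describe how the ADTDAWG tracks words, so the sketch below is not its implementation. It only shows one generic, well-known way to give every word in a shared-suffix graph a unique index: count how many words are reachable below each state and add up the counts that are skipped during a lookup. The state names, the `STATES` table, and both helper functions are assumptions made for illustration.

```python
# Hand-built minimized graph for the words: bat, bats, cat, cats.
# Each state maps to (end_of_word, {letter: next_state}).
STATES = {
    "root": (False, {"b": "x", "c": "x"}),
    "x": (False, {"a": "y"}),
    "y": (False, {"t": "z"}),
    "z": (True, {"s": "end"}),
    "end": (True, {}),
}


def words_below(state):
    """Count the words that end at this state or at any state below it."""
    end_of_word, kids = STATES[state]
    return int(end_of_word) + sum(words_below(kid) for kid in kids.values())


def word_index(word):
    """Return the word's position in sorted order, or None if it is absent."""
    index = 0
    state = "root"
    for letter in word:
        end_of_word, kids = STATES[state]
        if letter not in kids:
            return None
        if end_of_word:
            index += 1  # the shorter word ending here sorts before this one
        for other in sorted(kids):
            if other < letter:
                index += words_below(kids[other])  # skip a whole branch
        state = kids[letter]
    return index if STATES[state][0] else None
```

A real encoding would precompute and store the counts instead of recursing on every lookup; the recursion is kept here only to keep the sketch short.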
I documented the project with full disclosure on the web page below...
Adamovsky Direct Tracking Directed Acyclic Word Graph - An Advanced Lexicon Data Structure
The construct has proved its value in a series of rigorous tests that I designed for it.
Do you have any comments about the structure?
All the Best,
HeavyJ
LEARN ABOUT DEBIAN
wordlist2dawg
WORDLIST2DAWG(1) WORDLIST2DAWG(1)
NAME
wordlist2dawg - convert a wordlist to a DAWG for Tesseract
SYNOPSIS
wordlist2dawg WORDLIST DAWG lang.unicharset
wordlist2dawg -t WORDLIST DAWG lang.unicharset
wordlist2dawg -r 1 WORDLIST DAWG lang.unicharset
wordlist2dawg -r 2 WORDLIST DAWG lang.unicharset
wordlist2dawg -l <short> <long> WORDLIST DAWG lang.unicharset
DESCRIPTION
wordlist2dawg(1) converts a wordlist to a Directed Acyclic Word Graph (DAWG) for use with Tesseract. A DAWG is a compressed,
space- and time-efficient representation of a word list.
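
As a rough illustration of why a DAWG is a compressed form of a word list, the sketch below builds a plain trie and then merges identical subtrees. It is not Tesseract's implementation and says nothing about the file format wordlist2dawg writes; the function names and the sample words are assumptions made for illustration.

```python
def new_node():
    """Create an empty trie node."""
    return {"eow": False, "kids": {}}


def build_trie(words):
    """Build an uncompressed trie, one node per distinct prefix."""
    root = new_node()
    for word in words:
        node = root
        for letter in word:
            node = node["kids"].setdefault(letter, new_node())
        node["eow"] = True
    return root


def count_trie_nodes(node):
    """Count every node in the trie, including the root."""
    return 1 + sum(count_trie_nodes(kid) for kid in node["kids"].values())


def minimize(node, registry):
    """Return a canonical id for the subtree; identical subtrees share one id."""
    signature = (
        node["eow"],
        tuple(sorted((letter, minimize(kid, registry))
                     for letter, kid in node["kids"].items())),
    )
    return registry.setdefault(signature, len(registry))


words = ["bat", "bats", "cat", "cats"]
trie = build_trie(words)
registry = {}
minimize(trie, registry)
trie_nodes = count_trie_nodes(trie)  # states before merging
dawg_nodes = len(registry)           # distinct states after merging
```

For this sample list the trie needs 9 states while the merged graph needs 5, because the shared endings of the four words are stored once.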
OPTIONS
-t Verify that a given dawg file is equivalent to a given wordlist.
-r 1 Reverse a word if it contains an RTL character.
-r 2 Reverse all words.
-l <short> <long> Produce a file with several dawgs in it, one each for words of length <short>, <short+1>,... <long>
ARGUMENTS
WORDLIST A plain text file in UTF-8, one word per line.
DAWG The output DAWG to write.
lang.unicharset The unicharset of the language. This is the unicharset generated by mftraining(1).
SEE ALSO
tesseract(1), combine_tessdata(1), dawg2wordlist(1)
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
COPYING
Copyright (C) 2006 Google, Inc. Licensed under the Apache License, Version 2.0
AUTHOR
The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995) and Google (2006-present).
02/09/2012 WORDLIST2DAWG(1)