Compressed, Direct Child Info, Word Tracking, Lexicon Data Structure, ADTDAWG?
Post 302364979 by HeavyJ on Saturday 24th of October 2009 11:32:40 PM

Hello,

Back in late August 2009, I decided to start working on a modification of the traditional Directed Acyclic Word Graph data structure.

In a traditional DAWG, end-of-word nodes do not correspond to single words, and a node's child information has to be discovered by scrolling through a list. That is a heavy price to pay for a small data structure.
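To make those two costs concrete, here is a minimal Python sketch of a traditional DAWG built by merging equivalent subtrees of a trie. It is only an illustration under my own assumptions, not the ADTDAWG encoding: the names `Node`, `build_trie`, `minimize`, `walk` and `count_nodes` are hypothetical, and the four-word lexicon is made up for the example.

```python
class Node:
    """A graph node: an end-of-word flag plus an ordered child list."""

    def __init__(self):
        self.end_of_word = False
        self.children = []  # list of (letter, Node) pairs

    def child(self, letter):
        # "List scrolling": walk the child list until the letter turns up.
        for ch, node in self.children:
            if ch == letter:
                return node
        return None


def build_trie(words):
    """Insert every word letter by letter; no sharing of suffixes yet."""
    root = Node()
    for word in words:
        node = root
        for letter in word:
            nxt = node.child(letter)
            if nxt is None:
                nxt = Node()
                node.children.append((letter, nxt))
            node = nxt
        node.end_of_word = True
    return root


def minimize(node, registry):
    """Merge equivalent subtrees bottom-up, turning the trie into a DAWG."""
    node.children = [
        (ch, minimize(child, registry))
        for ch, child in sorted(node.children, key=lambda pair: pair[0])
    ]
    # Two nodes are equivalent if they agree on the flag and on their
    # (letter, canonical child) pairs.
    signature = (
        node.end_of_word,
        tuple((ch, id(child)) for ch, child in node.children),
    )
    return registry.setdefault(signature, node)


def walk(root, word):
    """Follow a word through the graph; return the final node or None."""
    node = root
    for letter in word:
        node = node.child(letter)
        if node is None:
            return None
    return node


def count_nodes(node, seen=None):
    """Count distinct nodes reachable from the given node."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_nodes(child, seen) for _, child in node.children)


trie = build_trie(["bat", "bats", "cat", "cats"])
trie_size = count_nodes(trie)
dawg = minimize(trie, {})
dawg_size = count_nodes(dawg)

# "bat" and "cat" finish on the very same end-of-word node.
shared = walk(dawg, "bat") is walk(dawg, "cat")
print(trie_size, dawg_size, shared)
```

After minimization the graph is smaller than the trie, but "bat" and "cat" end on one shared node, so that node can only answer "is this a word?" and cannot identify which word was reached. Every step also scans a child list rather than reading the child information directly.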

I made a series of observations that led me to the ADTDAWG.

- The Adamovsky Direct Tracking Directed Acyclic Word Graph -

I documented the project with full disclosure on the web page below...

Adamovsky Direct Tracking Directed Acyclic Word Graph - An Advanced Lexicon Data Structure

The construct has proved its value in a series of rigorous tests that I designed for it.

Do you have any comments about the structure?

All the Best,

HeavyJ
 

WORDLIST2DAWG(1)                                                  WORDLIST2DAWG(1)

NAME
       wordlist2dawg - convert a wordlist to a DAWG for Tesseract

SYNOPSIS
       wordlist2dawg WORDLIST DAWG lang.unicharset
       wordlist2dawg -t WORDLIST DAWG lang.unicharset
       wordlist2dawg -r 1 WORDLIST DAWG lang.unicharset
       wordlist2dawg -r 2 WORDLIST DAWG lang.unicharset
       wordlist2dawg -l <short> <long> WORDLIST DAWG lang.unicharset

DESCRIPTION
       wordlist2dawg(1) converts a wordlist to a Directed Acyclic Word Graph
       (DAWG) for use with Tesseract. A DAWG is a compressed, space and time
       efficient representation of a word list.

OPTIONS
       -t
           Verify that a given dawg file is equivalent to a given wordlist.

       -r 1
           Reverse a word if it contains an RTL character.

       -r 2
           Reverse all words.

       -l <short> <long>
           Produce a file with several dawgs in it, one each for words of
           length <short>, <short+1>,... <long>

ARGUMENTS
       WORDLIST
           A plain text file in UTF-8, one word per line.

       DAWG
           The output DAWG to write.

       lang.unicharset
           The unicharset of the language. This is the unicharset generated
           by mftraining(1).

SEE ALSO
       tesseract(1), combine_tessdata(1), dawg2wordlist(1)

       http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

COPYING
       Copyright (C) 2006 Google, Inc. Licensed under the Apache License,
       Version 2.0

AUTHOR
       The Tesseract OCR engine was written by Ray Smith and his research
       groups at Hewlett Packard (1985-1995) and Google (2006-present).

02/09/2012                                                        WORDLIST2DAWG(1)