The World's Most Advanced Lexicon-Data-Structure Post: 302512818

Post 302512818 by DGPickett on Monday 11th of April 2011 04:44:23 PM

Some of the early NAT language packages for C compressed their string tables by exploiting the null-terminated string: a short string that is a suffix of a stored string needs no storage of its own. So "1234" might be stored once, while "234", "34", "4" and "" were just offset pointers into "1234". That does little for long strings, but it worked well for sets with many short strings.
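To make the suffix trick concrete, here is a minimal sketch in C. It is not taken from any of those packages; the names pool_intern and pool_get and the fixed POOL_SIZE are made up for illustration, and the linear scan is only there to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size pool; the name and size are illustrative only. */
#define POOL_SIZE 4096

static char pool[POOL_SIZE];
static size_t pool_used = 0;

/* Return the offset of s in the pool. A new copy is stored only when no
 * string already in the pool ends with s. Returns (size_t)-1 when full. */
size_t pool_intern(const char *s)
{
    size_t len = strlen(s);
    size_t pos = 0;

    /* Walk the stored strings; each one ends with its own NUL. */
    while (pos < pool_used) {
        size_t stored = strlen(pool + pos);
        if (stored >= len && strcmp(pool + pos + stored - len, s) == 0)
            return pos + stored - len;      /* s is a suffix: share it */
        pos += stored + 1;
    }

    if (len + 1 > POOL_SIZE - pool_used)
        return (size_t)-1;
    memcpy(pool + pool_used, s, len + 1);   /* copy including the NUL */
    pos = pool_used;
    pool_used += len + 1;
    return pos;
}

/* Turn an offset back into an ordinary C string. */
const char *pool_get(size_t off)
{
    return pool + off;
}
```

After pool_intern("1234"), interning "234" returns offset 1 and interning "" returns offset 4, with nothing new stored. The order matters in this sketch: a short string interned before its longer host gets its own copy, so a real table builder would presumably sort longest-first.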

I have been working on a high-performance container for a while, and came up with a byte-tree, where the first byte of the key is a lookup into an array of pointers, or similar structure, and so on down, to quickly traverse an invariant tree one byte of key at a time. Various alternate node types deal with compression: a 'next-n-bytes-must-be' node to swallow invariant areas in a key; a truncated array of fewer than 256 cells, with a base and size; a dumb list lookup leveraging strchr(), with a string of key letters in random order and a like-length array of pointers; or an 'N-copies-of' node for duplicates. The advantages: quick insert, quick access, sorted access, no rebalancing. Linear hash is cute, but if you are not sure of the data's key distribution, it is dicey to go all the way to one key per bucket, so how much linear search do you want?
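As a rough illustration of just one of those node types, the strchr() list lookup, here is a sketch in C. The struct layout and the names bt_new, bt_insert and bt_contains are assumptions for the example, not the actual container; it handles NUL-terminated keys only and leaves out the array, 'next-n-bytes-must-be' and 'N-copies-of' nodes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One node in the "dumb list" layout: keys is a NUL-terminated string of
 * the bytes seen at this position, kids a like-length array of children. */
struct bt_node {
    char *keys;             /* branch bytes, in arrival order */
    struct bt_node **kids;  /* kids[i] is the child for keys[i] */
    size_t n;               /* number of branches */
    int is_key;             /* nonzero if a stored key ends here */
};

struct bt_node *bt_new(void)
{
    struct bt_node *nd = calloc(1, sizeof *nd);
    if (nd) {
        nd->keys = calloc(1, 1);            /* start with an empty string */
        if (!nd->keys) { free(nd); nd = NULL; }
    }
    return nd;
}

/* Child for byte c, or NULL. c must be nonzero, because strchr() would
 * otherwise match the terminator of keys. */
static struct bt_node *bt_child(const struct bt_node *nd, char c)
{
    const char *hit = strchr(nd->keys, c);
    return hit ? nd->kids[hit - nd->keys] : NULL;
}

/* Insert a NUL-terminated key. Returns 0 on success, -1 if out of memory. */
int bt_insert(struct bt_node *root, const char *key)
{
    struct bt_node *nd = root;
    for (; *key; key++) {
        struct bt_node *next = bt_child(nd, *key);
        if (!next) {
            char *k = realloc(nd->keys, nd->n + 2);
            if (!k) return -1;
            nd->keys = k;
            struct bt_node **p = realloc(nd->kids, (nd->n + 1) * sizeof *p);
            if (!p) return -1;
            nd->kids = p;
            next = bt_new();
            if (!next) return -1;
            nd->keys[nd->n] = *key;         /* append the new branch byte */
            nd->keys[nd->n + 1] = '\0';
            nd->kids[nd->n] = next;
            nd->n++;
        }
        nd = next;
    }
    nd->is_key = 1;
    return 0;
}

/* Returns 1 if key was inserted earlier, 0 otherwise. */
int bt_contains(const struct bt_node *root, const char *key)
{
    const struct bt_node *nd = root;
    for (; nd && *key; key++)
        nd = bt_child(nd, *key);
    return nd && nd->is_key;
}
```

Because this node keeps its bytes in arrival order, walking it does not give sorted output on its own; the sorted access comes from the array-style nodes, or from keeping the list sorted on insert.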

dbm(3x) 																   dbm(3x)

Name
       dbminit, fetch, store, delete, firstkey, nextkey - data base subroutines

Syntax
       typedef struct {
	    char *dptr;
	    int dsize;
       } datum;

       dbminit(file)
       char *file;

       datum fetch(key)
       datum key;

       store(key, content)
       datum key, content;

       delete(key)
       datum key;

       datum firstkey()

       datum nextkey(key)
       datum key;

Description
       These functions maintain key/content pairs in a data base.  The functions will handle very large (a billion blocks) databases and will
       access a keyed item in one or two file system accesses.  The functions are obtained with the loader option -ldbm.

       Keys and contents are described by the datum typedef.  A datum specifies a string of dsize bytes pointed to by dptr.  Arbitrary binary
       data, as well as normal ASCII strings, are allowed.  The data base is stored in two files.  One file is a directory containing a bit map
       and has `.dir' as its suffix.  The second file contains all data and has `.pag' as its suffix.

       Before a database can be accessed, it must be opened by dbminit.  At the time of this call, the files file.dir and file.pag must exist.  (An empty
       database is created by creating zero-length `.dir' and `.pag' files.)

       Once open, the data stored under a key is accessed by fetch and data is placed under a key by store.  A key (and its associated contents) is deleted by delete.  A
       linear pass through all keys in a database may be made, in an (apparently) random order, by use of firstkey and nextkey.  The firstkey function will return the first key in the
       database.  With any key, nextkey will return the next key in the database.  This code will traverse the data base:
       for (key = firstkey(); key.dptr != NULL; key = nextkey(key))

Restrictions
       The `.pag' file will contain holes so that its apparent size is about four times its actual content.  Older UNIX systems may create real file blocks for these holes when touched.  These files cannot be
       copied by normal means without filling in the holes.

       The dptr pointers returned by these subroutines point into static storage that is changed by subsequent calls.
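
       Since the returned dptr is overwritten by later calls, a caller that needs to keep a value has to copy it first.  The following is only a sketch: datum_dup is a hypothetical helper, not part of dbm, and the datum typedef is repeated locally (same layout as in the Syntax section) so the fragment compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the typedef in the Syntax section, declared locally so
 * this sketch does not depend on the dbm header. */
typedef struct {
    char *dptr;
    int dsize;
} datum;

/* Hypothetical helper (not part of dbm): copy a datum's bytes to the heap
 * so they survive later dbm calls. The caller frees the returned dptr.
 * If src is an error datum or memory runs out, the result has a null dptr. */
datum datum_dup(datum src)
{
    datum copy = { NULL, 0 };

    if (src.dptr == NULL || src.dsize < 0)
        return copy;
    /* Allocate at least one byte so an empty datum still gets a non-null dptr. */
    copy.dptr = malloc(src.dsize > 0 ? (size_t)src.dsize : 1);
    if (copy.dptr == NULL)
        return copy;
    memcpy(copy.dptr, src.dptr, (size_t)src.dsize);
    copy.dsize = src.dsize;
    return copy;
}
```

       A caller might write saved = datum_dup(fetch(key)) and then check saved.dptr against NULL before making the next dbm call.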

       The sum of the sizes of a key/content pair must not exceed the internal block size (currently 1024 bytes).  Moreover all key/content  pairs
       that hash together must fit on a single block.  The store function will return an error in the event that a disk block fills with inseparable data.

       The delete function does not physically reclaim file space, although it does make it available for reuse.

Return Values
       Routines  that return a datum indicate errors with a null(0) dptr.  All functions that return an int indicate errors with negative values.
       A zero return indicates a successful completion.

																	   dbm(3x)