
DBZ(3)							     Library Functions Manual							    DBZ(3)

NAME
       dbzinit,  dbzfresh,  dbzagain, dbzclose, dbzexists, dbzfetch, dbzstore, dbzsync, dbzsize, dbzgetoptions, dbzsetoptions, dbzdebug - database
       routines

SYNOPSIS
       #include <inn/dbz.h>

       bool dbzinit(const char *base)

       bool dbzclose(void)

       bool dbzfresh(const char *base, long size)

       bool dbzagain(const char *base, const char *oldbase)

       bool dbzexists(const HASH key)

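       off_t dbzfetch(const HASH key)                           (built with --enable-tagged-hash)
       bool dbzfetch(const HASH key, void *ivalue)              (built without --enable-tagged-hash)

       DBZSTORE_RESULT dbzstore(const HASH key, off_t offset)   (built with --enable-tagged-hash)
       DBZSTORE_RESULT dbzstore(const HASH key, void *ivalue)   (built without --enable-tagged-hash)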

       bool dbzsync(void)

       long dbzsize(long nentries)

       void dbzgetoptions(dbzoptions *opt)

       void dbzsetoptions(const dbzoptions opt)

DESCRIPTION
       These functions provide an indexing system for rapid random access to a text file (the base file).

       Dbz stores offsets into the base text file for rapid retrieval.  All retrievals are keyed on a hash value that is generated by the
       HashMessageID() function.

       Dbzinit opens a database, an index into the base file base, consisting of files base.dir, base.index, and base.hash, which must already
       exist.  (If the database is new, they should be zero-length files.)  Subsequent accesses go to that database until dbzclose is called to
       close the database.
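
       The requirement that the three files exist before dbzinit is called can be met with a few lines of standard C.  The following is a
       minimal sketch, not part of dbz: the helper name ensure_dbz_files and the 4096-byte path buffer are assumptions made for illustration.
       It opens each file in append mode, which creates a missing file as zero-length and leaves an existing one untouched.

```c
#include <assert.h>
#include <stdio.h>

/* The three suffixes named in the text above. */
static const char *const dbz_suffixes[] = { ".dir", ".index", ".hash" };

/*
 * Hypothetical helper (not part of dbz): make sure base.dir, base.index and
 * base.hash exist.  Append mode creates a missing file as zero-length and
 * does not truncate a file that is already there.
 * Returns 0 on success, -1 on failure.
 */
int
ensure_dbz_files(const char *base)
{
    char path[4096];            /* ASSUMPTION: long enough for base + suffix */
    size_t i;

    assert(base != NULL);
    for (i = 0; i < sizeof(dbz_suffixes) / sizeof(dbz_suffixes[0]); i++) {
        int n = snprintf(path, sizeof(path), "%s%s", base, dbz_suffixes[i]);
        FILE *fp;

        if (n < 0 || (size_t) n >= sizeof(path))
            return -1;          /* path would not fit in the buffer */
        fp = fopen(path, "a");
        if (fp == NULL)
            return -1;          /* errno is left as set by fopen */
        if (fclose(fp) != 0)
            return -1;
    }
    return 0;
}
```

       A caller would run ensure_dbz_files(base) and, if it returns 0, go on to dbzinit(base).  Dbzfresh, described below, is the documented
       way to create a new database with control over its size.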

       Dbzfetch searches the database for the specified key.  If --enable-tagged-hash was specified at configure time, it returns the
       corresponding value, if any.  If --enable-tagged-hash was not specified, it returns true and the content of ivalue is set.  Dbzstore
       stores the key-value pair in the database if --enable-tagged-hash was specified at configure time; otherwise it stores the content of
       ivalue.  Dbzstore will fail unless the database files are writable.  Dbzexists checks whether the given hash exists in the database.
       Dbz is optimized for this operation and it may be significantly faster than dbzfetch().

       Dbzfresh is a variant of dbzinit for creating a new database with more control over details.

       Dbzfresh's size parameter specifies the size of the first hash table within the database, in key-value pairs.  Performance will be best if
       the number of key-value pairs stored in the database does not exceed about 2/3 of size.  (The dbzsize function, given the expected number
       of key-value pairs, will suggest a database size that meets this criterion.)  Assuming that an fseek offset is 4 bytes, the .index file
       will be 4 * size bytes and the .hash file will be DBZ_INTERNAL_HASH_SIZE * size bytes until the number of key-value pairs exceeds about
       80% of size; the .dir file is tiny and roughly constant in size.  (Nothing awful will happen if the database grows beyond 100% of size,
       but accesses will slow down quite a bit and the .index and .hash files will grow somewhat.)
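
       The sizing arithmetic above can be worked through in a few lines.  This is a rough sketch only: sketch_table_size is a hypothetical
       stand-in that applies the 2/3 guideline directly, and dbzsize may well return a different (for instance, rounded) value.  The per-slot
       widths are passed in by the caller because the value of DBZ_INTERNAL_HASH_SIZE is not stated here.

```c
#include <assert.h>

/*
 * Hypothetical stand-in, NOT dbzsize(): the smallest table size for which
 * nentries is at most 2/3 of the size, i.e. ceil(3 * nentries / 2).
 * Written as n + ceil(n / 2) to avoid computing 3 * n.
 */
long
sketch_table_size(long nentries)
{
    assert(nentries >= 0);
    return nentries + (nentries + 1) / 2;
}

/*
 * Bytes used by a file that holds one fixed-width record per table slot,
 * as described for the .index and .hash files before they start to grow.
 */
long long
sketch_file_bytes(long size, int bytes_per_slot)
{
    assert(size >= 0 && bytes_per_slot > 0);
    return (long long) size * bytes_per_slot;
}
```

       For 1,000,000 expected entries, sketch_table_size gives 1,500,000 slots.  With a 4-byte offset the .index file is then
       sketch_file_bytes(1500000, 4) = 6,000,000 bytes.  If DBZ_INTERNAL_HASH_SIZE were 6 (an assumed value; check inn/dbz.h for the real
       one), the .hash file would be 9,000,000 bytes.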

       Dbz stores up to DBZ_INTERNAL_HASH_SIZE bytes of the message-id's hash in the .hash file to confirm a hit.  This  eliminates  the  need	to
       read the base file to handle collisions.  This replaces the tagmask feature in previous dbz releases.
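
       The idea of confirming a hit from a stored fragment of the hash can be illustrated as follows.  This is a toy sketch, not the dbz
       implementation: the 16-byte key width, the 6-byte stored width, the choice of the leading bytes, and both function names are
       assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_KEY_BYTES    16  /* ASSUMPTION: width of the full hash */
#define SKETCH_STORED_BYTES 6   /* ASSUMPTION: stand-in for DBZ_INTERNAL_HASH_SIZE */

/* Keep only a fixed-width fragment of the key in the table slot. */
void
sketch_store_fragment(const unsigned char key[SKETCH_KEY_BYTES],
                      unsigned char slot[SKETCH_STORED_BYTES])
{
    assert(key != NULL && slot != NULL);
    memcpy(slot, key, SKETCH_STORED_BYTES);
}

/*
 * Confirm a candidate slot by comparing the stored fragment with the same
 * bytes of the key being looked up.  No read of the base file is needed.
 */
bool
sketch_confirm_hit(const unsigned char key[SKETCH_KEY_BYTES],
                   const unsigned char slot[SKETCH_STORED_BYTES])
{
    assert(key != NULL && slot != NULL);
    return memcmp(slot, key, SKETCH_STORED_BYTES) == 0;
}
```

       In this sketch, two keys that differ only outside the stored fragment are indistinguishable, so the stored width trades file size
       against the chance of a false hit.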

       A  size	of  ``0'' given to dbzfresh is synonymous with the local default; the normal default is suitable for tables of 5,000,000 key-value
       pairs.  Calling dbzinit(name) with the empty name is equivalent to calling dbzfresh(name, 0).

       When databases are regenerated periodically, as in news, it is simplest to pick the parameters for a new database based	on  the  old  one.
       This  also permits some memory of past sizes of the old database, so that a new database size can be chosen to cover expected fluctuations.
       Dbzagain is a variant of dbzinit for creating a new database as a new generation of an old database.  The database files for  oldbase  must
       exist.	Dbzagain is equivalent to calling dbzfresh with a size equal to the result of applying dbzsize to the largest number of entries in
       the oldbase database and its previous 10 generations.
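
       The sizing rule described for dbzagain amounts to taking the peak entry count over the remembered generations and sizing for that.  A
       minimal sketch under stated assumptions: the entry counts are supplied by the caller in an array (how dbz records them is not shown
       here), and sketch_size_rule is a hypothetical stand-in for dbzsize that applies the 2/3 guideline directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for dbzsize(): ceil(3 * nentries / 2). */
static long
sketch_size_rule(long nentries)
{
    return nentries + (nentries + 1) / 2;
}

/*
 * Pick a table size for the next generation from the entry counts of the
 * old database and its remembered predecessors (up to 11 counts in total,
 * per the text above).  An empty history yields 0.
 */
long
sketch_next_generation_size(const long *entries, size_t ngenerations)
{
    long peak = 0;
    size_t i;

    assert(entries != NULL || ngenerations == 0);
    for (i = 0; i < ngenerations; i++) {
        if (entries[i] > peak)
            peak = entries[i];  /* remember the largest count seen */
    }
    return sketch_size_rule(peak);
}
```

       With a history of 900, 1200, and 1000 entries the peak is 1200, so the sketch sizes the new table at 1800 slots.  Sizing for the peak
       rather than the latest count is what lets the new database absorb expected fluctuations.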

       When many accesses are being done by the same program, dbz is massively faster if its first hash table is in memory.  If the ``pag_incore''
       flag is set to INCORE_MEM, an attempt is made to read the table in when the database is opened, and dbzclose writes it out to disk again
       (if it was read successfully and has been modified).  Dbzsetoptions can be used to set the pag_incore and exists_incore flags, which
       control the .hash and .index files separately; each should be ``INCORE_NO'', ``INCORE_MEM'', or ``INCORE_MMAP''.  Changing them does not
       affect the status of a database that has already been opened.  The default is ``INCORE_NO'' for the .index file and ``INCORE_MMAP'' for
       the .hash file.  The attempt to read the table in may fail due to memory shortage; in this case dbz fails with an error.  Stores to an
       in-memory database are not (in general) written out to the file until dbzclose or dbzsync, so if robustness in the presence of crashes
       or concurrent accesses is crucial, in-memory databases should probably be avoided or the writethrough option should be set to ``true''.

       If  the	nonblock  option is ``true'', then writes to the .hash and .index files will be done using non-blocking I/O.  This can be signifi-
       cantly faster if your platform supports non-blocking I/O with files.

       Dbzsync causes all buffers etc. to be flushed out to the files.	It is typically  used  as  a  precaution  against  crashes  or	concurrent
       accesses when a dbz-using process will be running for a long time.  It is a somewhat expensive operation, especially for an in-memory data-
       base.

       Concurrent reading of databases is fairly safe, but there is no (inter)locking, so concurrent updating is not.

       An open database occupies three stdio streams and two file descriptors.  Apart from stdio buffers, memory consumption is negligible
       except for in-memory databases.

SEE ALSO
       dbm(3), history(5), libinn(3)

DIAGNOSTICS
       Functions returning bool values return ``true'' for success, ``false'' for failure.  Functions returning off_t values return -1 for
       failure.  Dbzinit attempts to have errno set plausibly on return, but otherwise this is not guaranteed.  An errno of EDOM from
       dbzinit indicates that the database did not appear to be in dbz format.

       If DBZTEST is defined at compile time, a main() function is included that runs performance and integrity tests.

HISTORY
       The  original dbz was written by Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us).  Later contributions by David Butler and Mark Moraes.  Extensive
       reworking, including this documentation, by Henry Spencer (henry@zoo.toronto.edu) as part of the C News project.  MD5  code  borrowed  from
       RSA.  Extensive reworking to remove backwards compatibility and to add hashes into dbz files by Clayton O'Neill (coneill@oneill.net).

BUGS
       Unlike dbm, dbz will refuse to dbzstore with a key already in the database.  The user is responsible for avoiding this.

       The RFC5322 case mapper implements only a first approximation to the hideously-complex RFC5322 case rules.

       Dbz no longer tries to be call-compatible with dbm in any way.

								    6 Sep 1997								    DBZ(3)