dictzip(1) opensolaris man page

DICTZIP(1)																DICTZIP(1)

NAME
       dictzip, dictunzip - compress (or expand) files, allowing random access

SYNOPSIS
       dictzip [options] name
       dictunzip [options] name

DESCRIPTION
       dictzip compresses files using the gzip(1) algorithm (LZ77) in a manner which is completely compatible with the gzip file format.  An
       extension to the gzip file format (Extra Field, described in 2.3.1.1 of RFC 1952) allows extra data to be stored in the header of a
       compressed file.  Programs like gzip and zcat will ignore this extra data.  However, dictd(8), the DICT protocol dictionary server, will
       make use of this data to perform pseudo-random access on the file.  Files in the dictzip format should end in ".dz" so that they may be
       distinguished from common gzip files that do not contain the special header information.

       From RFC 1952, the extra field is specified as follows:

	      If  the  FLG.FEXTRA bit is set, an "extra field" is present in the header, with total length XLEN bytes.	It consists of a series of
	      subfields, each of the form:

	      +---+---+---+---+==================================+
	      |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
	      +---+---+---+---+==================================+

	      SI1 and SI2 provide a subfield ID, typically two ASCII letters with some mnemonic value.	Jean-Loup Gailly <gzip@prep.ai.mit.edu> is
	      maintaining a registry of subfield IDs; please send him any subfield ID you wish to use.	Subfield IDs with SI2 = 0 are reserved for
	      future use.

	      LEN gives the length of the subfield data, excluding the 4 initial bytes.

       The dictzip program uses 'R' for SI1, and 'A' for SI2 (i.e., "Random Access").  After the LEN field, the data is arranged as follows:

       +---+---+---+---+---+---+===============================+
       |  VER  | CHLEN | CHCNT |  ... CHCNT words of data ...  |
       +---+---+---+---+---+---+===============================+

       As per RFC 1952, all data is stored least-significant byte first.  For VER 1 of the data, all values are 16 bits long (2 bytes), and are
       unsigned integers.
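
       As an illustration of the layout above, the following Python sketch walks the extra field of a gzip header and unpacks the RA
       subfield.  The function name parse_ra_subfield is hypothetical and not part of dictzip; the sketch assumes the whole header is
       already in memory and that VER is 1 (16-bit entries).

```python
import struct


def parse_ra_subfield(header: bytes):
    """Return (ver, chlen, chunk_sizes) from a gzip header, or None if no RA subfield exists.

    Hypothetical helper for illustration only; it is not taken from the dictzip sources.
    """
    # Fixed part of a gzip header (RFC 1952): ID1 ID2 CM FLG MTIME(4) XFL OS = 10 bytes.
    if len(header) < 12 or header[0:2] != b"\x1f\x8b":
        raise ValueError("not a gzip header")
    flg = header[3]
    if not flg & 0x04:  # FLG.FEXTRA is bit 2
        return None
    # XLEN follows the fixed header, stored least-significant byte first.
    (xlen,) = struct.unpack_from("<H", header, 10)
    pos, end = 12, 12 + xlen
    while pos + 4 <= end:
        si1, si2, length = struct.unpack_from("<BBH", header, pos)
        data = header[pos + 4 : pos + 4 + length]
        if (si1, si2) == (ord("R"), ord("A")):
            ver, chlen, chcnt = struct.unpack_from("<HHH", data, 0)
            if ver != 1:
                raise ValueError("only VER 1 (16-bit entries) is handled in this sketch")
            sizes = struct.unpack_from("<%dH" % chcnt, data, 6)
            return ver, chlen, list(sizes)
        pos += 4 + length  # skip to the next subfield
    return None
```

       Given the first bytes of a ".dz" file, the returned list holds the compressed size of each chunk in file order.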

       XLEN (which is specified earlier in the header) is a two-byte integer, so the extra field can be 0xffff bytes long, 2 bytes of which are
       used for the subfield ID (SI1 and SI2), and 2 bytes of which are used for the subfield length (LEN).  This leaves 0xfffb bytes (0x7ffd
       2-byte entries or 0x3ffe 4-byte entries).  Given that the zip output buffer must be 10% + 12 bytes larger than the input buffer, we can
       store 58969 bytes per entry, or about 1.8GB if the 2-byte entries are used.  If this becomes a limiting factor, another format version can
       be selected and defined for 4-byte entries.
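
       The arithmetic in this paragraph can be checked with a few lines of Python.  This is only a restatement of the figures in the text;
       the variable names are invented for the example, and, like the text, it does not subtract the 6 bytes occupied by VER, CHLEN and
       CHCNT.

```python
XLEN_MAX = 0xFFFF        # largest value of the two-byte XLEN field
SUBFIELD_HEADER = 4      # SI1, SI2 and the two-byte LEN

available = XLEN_MAX - SUBFIELD_HEADER   # bytes left for subfield data
entries_2byte = available // 2           # number of 16-bit entries that fit
entries_4byte = available // 4           # number of 32-bit entries that fit

chunk = 58969                            # uncompressed bytes per entry, from the text
# Worst-case compressed size under the "10% + 12 bytes" rule quoted in the text;
# it has to fit in a 16-bit entry.
worst_case = chunk * 11 // 10 + 12

max_bytes = entries_2byte * chunk        # largest file addressable with 2-byte entries

print(hex(available), hex(entries_2byte), hex(entries_4byte))
print(worst_case, worst_case < 0x10000)
print(round(max_bytes / 2**30, 1), "GiB")
```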

       For compression, the file is divided up into "chunks" of data.  Each chunk is less than 64kB, and can be compressed into an area that is
       also less than 64kB long (taking incompressible data into account -- usually the data is compressed into a block that is much smaller than
       the original).  The CHLEN field specifies the length of a "chunk" of data.  The CHCNT field specifies how many chunks are present, and the
       CHCNT words of data specify how long each chunk is after compression (i.e., in the current compressed file).

       To perform random access on the data, the offset and length of the data are provided to library routines.  These routines determine the
       chunk in which the desired data begins, and decompress that chunk.  Consecutive chunks are decompressed as necessary.
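
       A minimal sketch of that lookup, in Python, is shown here.  The function locate_chunks is hypothetical and is not the dictd library
       routine; it assumes that compressed chunks are stored back to back, so that the position of a chunk is the sum of the compressed
       sizes before it, measured from the start of the compressed data rather than from the start of the file.

```python
from itertools import accumulate


def locate_chunks(offset, length, chlen, chunk_sizes):
    """Return [(index, compressed_start, compressed_size)] for the chunks covering a byte range.

    offset and length refer to the uncompressed data; chlen and chunk_sizes are the
    CHLEN value and the per-chunk compressed sizes from the RA subfield.
    compressed_start is relative to the first compressed chunk (an assumption of this sketch).
    """
    if length <= 0 or not chunk_sizes:
        return []
    first = offset // chlen                     # chunk holding the first requested byte
    last = (offset + length - 1) // chlen       # chunk holding the last requested byte
    last = min(last, len(chunk_sizes) - 1)      # do not run past the end of the file
    # starts[i] is the sum of the compressed sizes of chunks 0..i-1.
    starts = [0] + list(accumulate(chunk_sizes))
    return [(i, starts[i], chunk_sizes[i]) for i in range(first, last + 1)]
```

       After decompressing the listed chunks, a caller would discard the first offset % chlen bytes of the first chunk and keep the next
       length bytes.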

TRADEOFFS
       Speed  True random file access is not realized, since any access, even for a single byte, requires that a 64kB chunk be read and
	      decompressed.  This is slower than accessing a flat text file, but is much, much faster than performing serial access on a fully
	      compressed file.

       Space  For  the textual dictionary databases we are working with, the use of 64kB chunks and maximal LZ77 compression realizes a file which
	      is only about 4% larger than the same file compressed all at once.

OPTIONS
       -d or --decompress
	      Decompress.  This is the default if the executable is called dictunzip.

       -c or --stdout
	      Write output on standard output; keep original files unchanged.  This is only available when decompressing  (because  parts  of  the
	      header must be updated after a write when compressing).

       -f or --force
	      Force compression or decompression even if the output file already exists.

       -h or --help
	      Display help.

       -k or --keep
	      Do not delete the original file.

       -l or --list
	      For each compressed file, list the following fields:

		  type: dzip, gzip, or text (includes files in unknown formats)
		  crc: CRC checksum
		  date and time: from header
		  chunks: number of chunks in file
		  size: size of each uncompressed chunk
		  compr.: compressed size
		  uncompr.: uncompressed size
		  ratio: compression ratio (0.0% if unknown)
		  name: name of uncompressed file

	      Unlike gzip, the compression method is not detected.

       -L or --license
	      Display the dictzip license and quit.

       -t or --test
	      Check the compressed file integrity.  This option is not implemented.  Instead, it will list the header information.

       -v or --verbose
	      Verbose. Display extra information during compression.

       -V or --version
	      Version. Display the version number and compilation options then quit.

       -s start or --start start
	      Specify the offset at which to start decompression, using decimal numbers.  The default is at the beginning of the file.

       -e size or --size size
	      Specify the size of the portion of the file to decompress, using decimal numbers.  The default is the whole file.

       -S start or --Start start
	      Specify the offset at which to start decompression, using base64 numbers.  The default is at the beginning of the file.

       -E size or --Size size
	      Specify the size of the portion of the file to decompress, using base64 numbers.  The default is the whole file.

       -p prefilter or --pre prefilter
	      Specify a shell command to execute as a filter before compression or decompression of a chunk.  The pre- and post-compression
	      filters can be used to provide additional compression or output formatting.  The filters may not increase the buffer size
	      significantly.  The pre- and post-compression filters were designed to provide the most general interface possible.

       -P postfilter or --post postfilter
	      Specify a shell command to execute as a filter after compression or decompression.

CREDITS
       dictzip	was  written by Rik Faith (faith@cs.unc.edu) and is distributed under the terms of the GNU General Public License.  If you need to
       distribute under other terms, write to the author.

       The main libraries used by this program (zlib, regex, libmaa) are distributed under different terms, so you may be able to use the
       libraries for applications which are incompatible with the GPL -- please see the copyright notices and license information that come with
       the libraries for more information, and consult with your attorney to resolve these issues.

SEE ALSO
       dict(1), dictd(8), gzip(1), gunzip(1), zcat(1)

								    22 Jun 1997 							DICTZIP(1)