bzz(1) [debian man page]

BZZ(1)								   DjVuLibre-3.5							    BZZ(1)

NAME
bzz - DjVu general purpose compression utility.

SYNOPSIS
Encoding: bzz -e[blocksize] inputfile outputfile
Decoding: bzz -d inputfile outputfile

DESCRIPTION
The first form of the command line (option -e) compresses the data from file inputfile and writes the compressed data into outputfile. The second form of the command line (option -d) decompresses file inputfile and writes the output to outputfile.

OPTIONS
-d
    Decoding mode.

-e[blocksize]
    Encoding mode. The optional argument blocksize specifies the size, in kilobytes, of the input file blocks processed by the Burrows-Wheeler transform. The default block size is 2048 KB. The maximal block size is 4096 KB. Specifying a larger block size usually produces higher compression ratios and increases the memory requirements of both the encoder and decoder. It is useless to specify a block size that is larger than the input file.

ALGORITHMS
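For orientation, the Burrows-Wheeler transform applied to each block can be illustrated with a naive sketch that sorts all rotations of the block. This is not the method bzz uses (it is far too slow for 2048 KB blocks); it only shows what the transform computes. The function names and the convention of returning the row index of the original block are assumptions made for this illustration.

```python
def bwt(block: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform (illustrative only, not bzz's method)."""
    n = len(block)
    if n == 0:
        return b"", 0
    # Sort the starting offsets of all rotations by the rotation they denote.
    order = sorted(range(n), key=lambda i: block[i:] + block[:i])
    # The output is the last column: the byte preceding each sorted rotation.
    last = bytes(block[(i - 1) % n] for i in order)
    # Remember which sorted row holds the unrotated block, needed to invert.
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert bwt() given the last column and the row of the original block."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by byte value maps each row of the first
    # column to the matching position in the last column.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)


if __name__ == "__main__":
    transformed, primary = bwt(b"banana")
    print(transformed, primary)
    print(inverse_bwt(transformed, primary))
```

The transform groups bytes that share a following context, which is what makes the subsequent ranking and coding stages effective.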
The Burrows-Wheeler transform is performed using a combination of the Karp-Miller-Rosenberg and the Bentley-Sedgewick algorithms. This is comparable to (Sadakane, DCC 98) with a slightly more flexible ranking scheme. Symbols are then ordered according to a running estimate of their occurrence frequencies. The symbol ranks are then coded using a simple fixed tree and the ZP binary adaptive coder (Bottou, DCC 98).

The Burrows-Wheeler transform is also used in the well-known compressor bzip2. The originality of bzz is the use of the ZP adaptive coder. The adaptation noise can cost up to 5 percent in file size, but this penalty is usually offset by the benefits of adaptation.

PERFORMANCE
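The results in this section are expressed in bits per character. As a minimal sketch, assuming one character means one 8-bit byte of the uncompressed file (an assumption; this page does not define the unit further), the figure for a single file can be derived from its original and compressed sizes. The function name and the sizes in the example are hypothetical and are not measurements.

```python
def bits_per_character(original_size: int, compressed_size: int) -> float:
    """Return compressed bits spent per input byte.

    Assumes a "character" is one 8-bit byte of the original file.
    """
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 8.0 * compressed_size / original_size


if __name__ == "__main__":
    # Hypothetical sizes in bytes, chosen only to illustrate the arithmetic.
    print(bits_per_character(original_size=1000, compressed_size=270))
```

Lower values are better: 8.0 means no compression at all under this assumption.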
The following table shows comparative results (in bits per character) on the Canterbury Corpus ( http://corpus.canterbury.ac.nz ). The very good bzz performance on the spreadsheet file excl puts the weighted average ahead of much more sophisticated compressors such as fsmx.

+-----------------------------------------------------------------------------------------------+
|                                    Compression performance                                    |
|           text  fax   csrc  excl  sprc  tech  poem  html  lisp  man   play  Weighted  Average |
+-----------------------------------------------------------------------------------------------+
| compress  3.27  0.97  3.56  2.41  4.21  3.06  3.38  3.68  3.90  4.43  3.51  2.55      3.31    |
| gzip -9   2.85  0.82  2.24  1.63  2.67  2.71  3.23  2.59  2.65  3.31  3.12  2.08      2.53    |
| bzip2 -9  2.27  0.78  2.18  1.01  2.70  2.02  2.42  2.48  2.79  3.33  2.53  1.54      2.23    |
| ppmd      2.31  0.99  2.11  1.08  2.68  2.19  2.48  2.38  2.43  3.00  2.53  1.65      2.20    |
| fsmx      2.10  0.79  1.89  1.48  2.52  1.84  2.21  2.24  2.29  2.91  2.35  1.63      2.06    |
| bzz       2.25  0.76  2.13  0.78  2.67  2.00  2.40  2.52  2.60  3.19  2.52  1.44      2.16    |
+-----------------------------------------------------------------------------------------------+

Note that DjVu contributors have several entries in this table. Program compress was written some time ago by Joe Orost. Program ppmd is an improvement of the PPM-C method invented by Paul Howard.

CREDITS
Program bzz was written by Leon Bottou <leonb@users.sourceforge.net> and was then improved by Andrei Erofeev <andrew_erofeev@yahoo.com>, Bill Riemers <docbill@sourceforge.net> and many others.

SEE ALSO
djvu(1), compress(1), gzip(1), bzip2(1)

DjVuLibre-3.5							    10/11/2001								    BZZ(1)

DJVUTXT(1)							   DjVuLibre-3.5							DJVUTXT(1)

NAME
djvutxt - Extract the hidden text from DjVu documents.

SYNOPSIS
djvutxt [options] inputdjvufile [outputtxtfile]

DESCRIPTION
Program djvutxt decodes the hidden text layer of a DjVu document inputdjvufile and prints it into file outputtxtfile or on the standard output. The hidden text layer is usually generated with the help of optical character recognition software.

Without options -detail and -escape, this program simply outputs the UTF-8 text. Option -detail causes the output of S-expressions describing the text and its location. Option -escape uses C-style escape sequences to represent nonprintable or non-ASCII characters.

OPTIONS
--page=pagespec
    Specify which pages should be processed. When this option is not specified, the text of all pages of the document is concatenated into the output file. The page specification pagespec contains one or more comma-separated page ranges. A page range is either a page number, or two page numbers separated by a dash. For instance, specification 1-10 outputs pages 1 to 10, and specification 1,3,99999-4 outputs pages 1 and 3, followed by all the document pages in reverse order up to page 4.

--detail=keyword
    This option causes djvutxt to output S-expressions specifying the position of the text in the page. See the manual page djvused(1) for a description of the output format. Argument keyword specifies the maximum level of detail for which text location is reported. The recognized values are: page, column, region, para, line, word, and char. All other values are interpreted as char.

--escape
    Output escape sequences of the form "\ooo" for all non-ASCII or nonprintable UTF-8 characters and for the backslash character.

REMARKS
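The page specification accepted by --page can be sketched as follows. This is a minimal illustration of the rules stated under OPTIONS, not the djvutxt implementation. The function name expand_pagespec and its page_count parameter are hypothetical, and the clamping of out-of-range page numbers to the valid range is an assumption inferred from the 99999-4 example.

```python
def expand_pagespec(spec: str, page_count: int) -> list[int]:
    """Expand a page specification such as "1,3,99999-4" into page numbers.

    Hypothetical helper: out-of-range numbers are clamped to 1..page_count,
    an assumption based on the documented "99999-4" example.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    pages: list[int] = []
    for part in spec.split(","):
        # A range is either "N" or "N-M"; partition leaves dash empty for "N".
        first, dash, last = part.strip().partition("-")
        start = int(first)
        end = int(last) if dash else start
        start = min(max(start, 1), page_count)
        end = min(max(end, 1), page_count)
        # A range whose second number is smaller runs in reverse order.
        step = 1 if end >= start else -1
        pages.extend(range(start, end + step, step))
    return pages


if __name__ == "__main__":
    print(expand_pagespec("1-10", page_count=20))
    print(expand_pagespec("1,3,99999-4", page_count=6))
```

With a six-page document, the second call yields pages 1 and 3 followed by pages 6, 5 and 4, matching the reverse-order behaviour described above.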
Use program djvused(1) for more control over the text layer.

CREDITS
This program was initially written by Andrei Erofeev <andrew_erofeev@yahoo.com> and was then improved by Bill Riemers <docbill@sourceforge.net> and many others. It was then rewritten to use the ddjvuapi by Leon Bottou <leonb@sourceforge.net>.

SEE ALSO
djvu(1), djvused(1)

DjVuLibre-3.5							    10/11/2001								DJVUTXT(1)