
HOCR2DJVUSED(1) 						hocr2djvused manual						   HOCR2DJVUSED(1)

NAME
       hocr2djvused - hOCR to djvused script converter

SYNOPSIS
       hocr2djvused [option...]

DESCRIPTION
       hocr2djvused reads an hOCR[1] file (as produced by OCRopus[2], Cuneiform[3], or Tesseract[4]) from the standard input and converts it to a
       djvused script.
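
       The data flow described above can be sketched from Python as follows. This is only an illustration: the helper names build_command
       and hocr_to_djvused_script are hypothetical, and the sketch assumes that hocr2djvused is on the PATH and writes the resulting script
       to the standard output.

```python
import subprocess
from typing import List


def build_command(details: str = "words") -> List[str]:
    """Build an argument list using the documented --details option.

    Hypothetical helper; only the three documented levels are accepted.
    """
    if details not in ("lines", "words", "chars"):
        raise ValueError(f"unsupported details level: {details!r}")
    return ["hocr2djvused", f"--details={details}"]


def hocr_to_djvused_script(hocr: bytes, details: str = "words") -> bytes:
    """Feed hOCR data on standard input and return the captured output.

    Assumption: the djvused script is written to standard output.
    """
    result = subprocess.run(
        build_command(details),
        input=hocr,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

       The returned bytes could then be saved to a file and passed to djvused(1).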

OPTIONS
   Text segmentation options
       -t lines, --details=lines
	   Record location of every line. Don't record locations of particular words or characters.

       -t words, --details=words
	   Record location of every line and every word. Don't record locations of particular characters.

	   This is the default.

       -t chars, --details=chars
	   Record location of every line, every word and every character.

       --word-segmentation=simple
	   Consider each non-empty sequence of non-whitespace characters a single word.

	   This is the default, despite being linguistically incorrect.

       --word-segmentation=uax29
	   Use the Unicode Text Segmentation[5] algorithm to break lines into words.

	   This option breaks the assumption made by some DjVu tools that words are separated by spaces, and is therefore not recommended.
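
	   The rule stated for --word-segmentation=simple can be illustrated with a minimal Python sketch. The function simple_words is
	   hypothetical and is not taken from the hocr2djvused sources; it assumes that "whitespace" means whatever Python's \s matches,
	   which may differ from the tool's own definition.

```python
import re
from typing import List


def simple_words(line: str) -> List[str]:
    """Split a line into maximal runs of non-whitespace characters."""
    return re.findall(r"\S+", line)


# Punctuation stays attached to the neighbouring letters, which is why
# this rule is simple but linguistically incorrect.
print(simple_words("Hello, world!  It's 3.14"))
# → ['Hello,', 'world!', "It's", '3.14']
```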

   Other options
       --rotation=n
	   Assume that DjVu pages are rotated by n degrees.

       --page-size=widthxheight
	   Specify that the page size is width pixels by height pixels.

	   This option is required for hOCR generated by Cuneiform (< 0.8) and superfluous otherwise.

       --html5
	   Use an HTML5 parser[6], which is more robust but slower than the default parser.

       --version
	   Output version information and exit.

       -h, --help
	   Display help and exit.
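
	   The widthxheight value taken by --page-size is a pair of pixel counts joined by the letter x. A minimal sketch of reading such a
	   value is shown here; parse_page_size is a hypothetical helper, and the validation it performs is an assumption rather than the
	   tool's documented behaviour.

```python
from typing import Tuple


def parse_page_size(value: str) -> Tuple[int, int]:
    """Parse a 'widthxheight' string into a (width, height) tuple of pixels."""
    width, separator, height = value.partition("x")
    if not separator:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    # int() raises ValueError if either part is empty or not a number.
    return int(width), int(height)


print(parse_page_size("2550x3300"))
# → (2550, 3300)
```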

SEE ALSO
       ocrodjvu(1), djvused(1)

AUTHOR
       Jakub Wilk <jwilk@jwilk.net>
	   Author.

NOTES
	1. hOCR
	   http://docs.google.com/View?docid=dfxcv4vc_67g844kf

	2. OCRopus
	   http://ocropus.googlecode.com/

	3. Cuneiform
	   http://launchpad.net/cuneiform-linux

	4. Tesseract
	   http://tesseract-ocr.googlecode.com/

	5. Unicode Text Segmentation
	   http://unicode.org/reports/tr29/

	6. HTML5 parser
	   http://www.whatwg.org/specs/web-apps/current-work/#html-parser

hocr2djvused 0.7.9						    03/10/2012							   HOCR2DJVUSED(1)