Operating Environment: debian

HOCR2DJVUSED(1) 						hocr2djvused manual						   HOCR2DJVUSED(1)

NAME
hocr2djvused - hOCR to djvused script converter
SYNOPSIS
hocr2djvused [option...]
DESCRIPTION
hocr2djvused reads an hOCR[1] file (as produced by OCRopus[2], Cuneiform[3] or Tesseract[4]) from the standard input and converts it to a djvused script, which it writes to the standard output.
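
The following Python fragment is a minimal sketch of driving the converter from another program. It assumes that hocr2djvused is installed and on PATH, and that the script is written to the standard output. The function name hocr_to_djvused_script and the extra_args parameter are hypothetical, introduced only for this illustration.

```python
import subprocess


def hocr_to_djvused_script(hocr_bytes, extra_args=()):
    """Feed hOCR data to hocr2djvused on stdin and return the script it prints.

    hocr_to_djvused_script and extra_args are illustrative names, not part
    of the tool. extra_args is passed through unchanged, so it should hold
    only options documented on this page (for example "--details=lines").
    """
    argv = ["hocr2djvused", *extra_args]
    # check=True raises CalledProcessError if the converter exits non-zero.
    result = subprocess.run(
        argv,
        input=hocr_bytes,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

The returned bytes can be saved to a file and then applied to a DjVu document with djvused(1), for instance through its -f option.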
OPTIONS
Text segmentation options
    -t lines, --details=lines
        Record location of every line. Don't record locations of particular words or characters.

    -t words, --details=words
        Record location of every line and every word. Don't record locations of particular characters. This is the default.

    -t chars, --details=chars
        Record location of every line, every word and every character.

    --word-segmentation=simple
        Consider each non-empty sequence of non-whitespace characters a single word. This is the default, despite being linguistically incorrect.

    --word-segmentation=uax29
        Use the Unicode Text Segmentation[5] algorithm to break lines into words. This option breaks the assumption made by some DjVu tools that words are separated by spaces, and is therefore not recommended.

Other options
    --rotation=n
        Assume that DjVu pages are rotated by n degrees.

    --page-size=widthxheight
        Specify that the page size is width pixels x height pixels. This option is required for hOCR generated by Cuneiform (< 0.8) and superfluous otherwise.

    --html5
        Use an HTML5 parser[6], which is more robust but slower than the default parser.

    --version
        Output version information and exit.

    -h, --help
        Display help and exit.
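
The rule described for --word-segmentation=simple can be illustrated with a short Python sketch. This is not the tool's own code, only an instance of the stated rule: every maximal run of non-whitespace characters counts as one word. The function name split_words_simple is hypothetical.

```python
import re


def split_words_simple(line):
    """Return (start, end, text) for each maximal run of non-whitespace characters.

    Illustrative only: punctuation stays attached to the neighbouring
    letters, which is why the rule is called linguistically incorrect.
    """
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", line)]


for start, end, text in split_words_simple("Hello, world!  It's 3.14"):
    print(start, end, text)
```

Under this rule "Hello," and "world!" are each one word, punctuation included. A UAX #29 segmenter would separate the punctuation from the letters, so word boundaries would no longer coincide with spaces.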
SEE ALSO
ocrodjvu(1), djvused(1)
AUTHOR
Jakub Wilk <jwilk@jwilk.net> Author.
NOTES
1. hOCR
   http://docs.google.com/View?docid=dfxcv4vc_67g844kf

2. OCRopus
   http://ocropus.googlecode.com/

3. Cuneiform
   http://launchpad.net/cuneiform-linux

4. Tesseract
   http://tesseract-ocr.googlecode.com/

5. Unicode Text Segmentation
   http://unicode.org/reports/tr29/

6. HTML5 parser
   http://www.whatwg.org/specs/web-apps/current-work/#html-parser

hocr2djvused 0.7.9                03/10/2012                HOCR2DJVUSED(1)