HOCR2DJVUSED(1)              hocr2djvused manual              HOCR2DJVUSED(1)

NAME
       hocr2djvused - hOCR to djvused script converter

SYNOPSIS
       hocr2djvused [option...]

DESCRIPTION
       hocr2djvused reads a hOCR[1] file (as produced by OCRopus[2],
       Cuneiform[3] or Tesseract[4]) from the standard input and converts
       it to a djvused script.

OPTIONS
   Text segmentation options
       -t lines, --details=lines
           Record the location of every line. Don't record locations of
           particular words or characters.

       -t words, --details=words
           Record the location of every line and every word. Don't record
           locations of particular characters. This is the default.

       -t chars, --details=chars
           Record the location of every line, every word and every
           character.

       --word-segmentation=simple
           Consider each non-empty sequence of non-whitespace characters a
           single word. This is the default, despite being linguistically
           incorrect.

       --word-segmentation=uax29
           Use the Unicode Text Segmentation[5] algorithm to break lines
           into words. This option breaks the assumption of some DjVu tools
           that words are separated by spaces, and is therefore not
           recommended.

   Other options
       --rotation=n
           Assume that DjVu pages are rotated by n degrees.

       --page-size=widthxheight
           Specify that the page size is width pixels x height pixels.
           This option is required for hOCR generated by Cuneiform (< 0.8)
           and superfluous otherwise.

       --html5
           Use an HTML5 parser[6], which is more robust but slower than the
           default parser.

       --version
           Output version information and exit.

       -h, --help
           Display help and exit.

SEE ALSO
       ocrodjvu(1), djvused(1)

AUTHOR
       Jakub Wilk <jwilk@jwilk.net>
           Author.

NOTES
        1. hOCR
           http://docs.google.com/View?docid=dfxcv4vc_67g844kf

        2. OCRopus
           http://ocropus.googlecode.com/

        3. Cuneiform
           http://launchpad.net/cuneiform-linux

        4. Tesseract
           http://tesseract-ocr.googlecode.com/

        5. Unicode Text Segmentation
           http://unicode.org/reports/tr29/

        6. HTML5 parser
           http://www.whatwg.org/specs/web-apps/current-work/#html-parser

hocr2djvused 0.7.9                03/10/2012                HOCR2DJVUSED(1)
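The rule described for --word-segmentation=simple (each non-empty sequence of non-whitespace characters is one word) can be illustrated with a short Python sketch. This is not the code hocr2djvused itself uses; the function name simple_words is hypothetical, and the sketch works on a plain text line instead of hOCR markup.

```python
import re

def simple_words(line):
    """Split a line the way the 'simple' rule is described:
    every maximal run of non-whitespace characters is one word.

    Returns a list of (start, end, text) tuples, where start and end
    are character offsets into the line (end is exclusive).
    Hypothetical helper for illustration only.
    """
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", line)]

if __name__ == "__main__":
    # Punctuation stays attached to the neighbouring word, which is why
    # the rule is called linguistically incorrect.
    for start, end, word in simple_words("Don't  panic, please"):
        print(start, end, word)
```

With this rule "panic," is a single word including the comma. A segmenter that follows the Unicode Text Segmentation rules would typically report the comma as a separate segment, so words would no longer be delimited by spaces alone.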