hocr2djvused(1) [debian man page]

HOCR2DJVUSED(1) 						hocr2djvused manual						   HOCR2DJVUSED(1)

NAME
       hocr2djvused - hOCR to djvused script converter

SYNOPSIS
       hocr2djvused [option...]

DESCRIPTION
       hocr2djvused reads an hOCR[1] file (as produced by OCRopus[2], Cuneiform[3] or Tesseract[4]) from the standard input and converts it to a djvused script.

OPTIONS
   Text segmentation options
       -t lines, --details=lines
           Record the location of every line. Don't record the locations of individual words or characters.

       -t words, --details=words
           Record the location of every line and every word. Don't record the locations of individual characters. This is the default.

       -t chars, --details=chars
           Record the location of every line, every word and every character.

       --word-segmentation=simple
           Consider each non-empty sequence of non-whitespace characters a single word. This is the default, despite being linguistically incorrect.

       --word-segmentation=uax29
           Use the Unicode Text Segmentation[5] algorithm to break lines into words. This option breaks the assumption of some DjVu tools that words are separated by spaces, and is therefore not recommended.

   Other options
       --rotation=n
           Assume that DjVu pages are rotated by n degrees.

       --page-size=widthxheight
           Specify that the page size is width pixels x height pixels. This option is required for hOCR generated by Cuneiform (< 0.8) and superfluous otherwise.

       --html5
           Use an HTML5 parser[6], which is more robust but slower than the default parser.

       --version
           Output version information and exit.

       -h, --help
           Display help and exit.

SEE ALSO
       ocrodjvu(1), djvused(1)

AUTHOR
       Jakub Wilk <jwilk@jwilk.net>
           Author.

NOTES
        1. hOCR
           http://docs.google.com/View?docid=dfxcv4vc_67g844kf

        2. OCRopus
           http://ocropus.googlecode.com/

        3. Cuneiform
           http://launchpad.net/cuneiform-linux

        4. Tesseract
           http://tesseract-ocr.googlecode.com/

        5. Unicode Text Segmentation
           http://unicode.org/reports/tr29/

        6. HTML5 parser
           http://www.whatwg.org/specs/web-apps/current-work/#html-parser

hocr2djvused 0.7.9						  03/10/2012						   HOCR2DJVUSED(1)
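
Because hocr2djvused reads hOCR from the standard input and emits a djvused script, a typical use is a two-step pipeline: generate the script, then let djvused apply it. The following is a minimal sketch rather than an official recipe. The function name ocr_to_djvu and all file names are hypothetical, and the -f (read commands from a script file) and -s (save the modified file) flags belong to djvused(1), so they should be checked against that manual page.

```shell
# Sketch: convert one hOCR file to a djvused script and apply it.
# ocr_to_djvu is a hypothetical helper, not part of ocrodjvu.
ocr_to_djvu() {
    hocr=$1                    # input hOCR file, e.g. page.hocr
    djvu=$2                    # existing DjVu file to receive the text layer
    details=${3:-words}        # lines, words (the default) or chars
    script=${hocr%.hocr}.djvused

    # hocr2djvused reads hOCR on stdin and writes the script to stdout.
    hocr2djvused --details="$details" < "$hocr" > "$script" || return 1

    # Assumption: djvused -f runs the script and -s saves the result.
    djvused -f "$script" -s "$djvu"
}
```

For example, `ocr_to_djvu page.hocr book.djvu chars` would record character-level locations. The intermediate script is kept on disk so that it can be inspected before or after it is applied.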


HOCR2PDF(1)							 ExactImage Manual						       HOCR2PDF(1)

NAME
       hocr2pdf - hOCR to PDF converter of the ExactImage toolkit

SYNOPSIS
       hocr2pdf [option...] {-i | --input} input-file {-o | --output} output-file

       hocr2pdf {-h | --help}

DESCRIPTION
       ExactImage is a fast C++ image processing library. Unlike many other library frameworks, it allows operation in several color spaces and bit depths natively, resulting in low memory and computational requirements.

       hocr2pdf creates well laid out, searchable PDF files from hOCR (annotated HTML) input obtained from an OCR system.

OPTIONS
       -i file, --input file
           Read the image from the specified file. Note that the input hOCR is read from the standard input.

       -o file, --output file
           Save the output PDF to the specified file.

       -n, --no-image
           Don't place the image over the text. By default the text layer is hidden behind the image.

       -s, --sloppy-text
           Sloppily place text, group words, do not draw single glyphs.

       -r n, --resolution n
           Override the resolution of the input image to n dpi. The default resolution (if not specified in the input file) is 300 dpi.

       -h, --help
           Display help text and exit.

EXAMPLE
       $ hocr2pdf -i scan.tiff -o test.pdf < cuneiform-out.hocr

SEE ALSO
       exactimage(7)

AUTHORS
       Jakub Wilk <jwilk@debian.org>
           Wrote this manual page for the Debian system.

       http://www.exactcode.de/site/open_source/exactimage/
           This manual page incorporates texts found on the ExactImage homepage.

COPYRIGHT
       This manual page was written for the Debian system (and may be used by others).

       Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or (at your option) any later version published by the Free Software Foundation.

       On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL-2.

hocr2pdf							  09/09/2013							       HOCR2PDF(1)
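
The image file goes to -i while the hOCR arrives on the standard input, which is easy to mix up. The sketch below wraps a single hocr2pdf call so that the two inputs are named explicitly and the resolution override is optional. The function name hocr_to_pdf and the file names are hypothetical; only the -i, -o and -r options described above are used.

```shell
# Sketch: wrap one hocr2pdf call. hocr_to_pdf is a hypothetical helper.
hocr_to_pdf() {
    image=$1                   # scanned image, e.g. scan.tiff
    hocr=$2                    # hOCR produced by the OCR system
    pdf=$3                     # output PDF
    dpi=${4:-}                 # optional resolution override in dpi

    if [ -n "$dpi" ]; then
        # -r overrides the resolution of the input image.
        hocr2pdf -i "$image" -o "$pdf" -r "$dpi" < "$hocr"
    else
        # Without -r, hocr2pdf uses the image's own resolution,
        # or 300 dpi if the file does not specify one.
        hocr2pdf -i "$image" -o "$pdf" < "$hocr"
    fi
}
```

For example, `hocr_to_pdf scan.tiff cuneiform-out.hocr test.pdf 600` would run the command from the EXAMPLE section with the resolution forced to 600 dpi.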