hocr2djvused(1) [debian man page]

HOCR2DJVUSED(1) 						hocr2djvused manual						   HOCR2DJVUSED(1)

NAME

       hocr2djvused - hOCR to djvused script converter

SYNOPSIS

       hocr2djvused [option...]

DESCRIPTION

       hocr2djvused reads an hOCR[1] file (as produced by OCRopus[2], Cuneiform[3] or Tesseract[4]) from standard input and converts it to a
       djvused script.
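       A minimal sketch of a typical workflow for a single-page DjVu file follows. It assumes that hocr2djvused writes the script to standard
       output and that djvused(1) from DjVuLibre is installed; the script is saved to a temporary file and applied with djvused's -f (read
       commands from a file) and -s (save the document) options. The function name add_text_layer is hypothetical, and how the generated
       script selects pages in multi-page documents should be checked against real output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: add the text layer described by an hOCR file to a
# single-page DjVu document. Assumes hocr2djvused and djvused are on PATH
# and that hocr2djvused writes its djvused script to standard output.
add_text_layer() {
    [ "$#" -ge 2 ] || return 2   # need at least: HOCR_FILE DJVU_FILE
    hocr_file=$1                 # hOCR input (e.g. produced by Tesseract)
    djvu_file=$2                 # DjVu document, modified in place
    shift 2                      # remaining arguments go to hocr2djvused

    script=$(mktemp) || return 1
    # hocr2djvused reads hOCR from standard input.
    if hocr2djvused "$@" < "$hocr_file" > "$script"; then
        # -f: execute commands from the script file; -s: save the result.
        djvused -f "$script" -s "$djvu_file"
        status=$?
    else
        status=1
    fi
    rm -f "$script"
    return "$status"
}
```

       For instance, add_text_layer page.hocr page.djvu --details=lines would record line locations only.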

OPTIONS

   Text segmentation options
       -t lines, --details=lines
	   Record the location of every line. Do not record the locations of individual words or characters.

       -t words, --details=words
	   Record the location of every line and every word. Do not record the locations of individual characters.

	   This is the default.

       -t chars, --details=chars
	   Record the location of every line, every word and every character.

       --word-segmentation=simple
	   Treat each non-empty sequence of non-whitespace characters as a single word.

	   This is the default, despite being linguistically incorrect.

       --word-segmentation=uax29
	   Use the Unicode Text Segmentation[5] algorithm to break lines into words.

	   This option breaks the assumption, made by some DjVu tools, that words are separated by spaces; therefore it is not recommended.
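       The simple segmentation rule can be illustrated with a few lines of shell. This is only a sketch of the rule as stated above, not
       hocr2djvused's own code, and the function name simple_words is hypothetical. Note how punctuation stays attached to the adjacent word.

```shell
#!/bin/sh
# Sketch of --word-segmentation=simple: each maximal run of non-whitespace
# characters is one word. Illustration only, not hocr2djvused's code.
simple_words() {
    # Map every whitespace character to a newline and squeeze repeats,
    # then drop the empty line left by any leading whitespace.
    printf '%s\n' "$1" | tr -s '[:space:]' '\n' | sed '/^$/d'
}

simple_words '  Hello,   world!  '
# prints:
# Hello,
# world!
```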

   Other options
       --rotation=n
	   Assume that DjVu pages are rotated by n degrees.

       --page-size=widthxheight
	   Specify that the page size is width x height pixels.

	   This option is required for hOCR generated by Cuneiform versions older than 0.8, and is superfluous otherwise.

       --html5
	   Use an HTML5 parser[6], which is more robust but slower than the default parser.

       --version
	   Output version information and exit.

       -h, --help
	   Display help and exit.
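       When --page-size is needed, the dimensions can be taken from the DjVu file itself. The sketch below assumes that the size command of
       djvused(1) prints the selected page's dimensions in the form width=W height=H (check this against your DjVuLibre version); the helper
       page_size_option is hypothetical and only reformats that text as a --page-size option.

```shell
#!/bin/sh
# Hypothetical helper: turn "width=W height=H" (the form assumed for the
# output of djvused's "size" command) into a --page-size=WxH option.
page_size_option() {
    width='' height=''
    for field in $1; do             # split the text on whitespace
        case $field in
            width=*)  width=${field#width=} ;;
            height=*) height=${field#height=} ;;
        esac                        # other fields are ignored
    done
    # Fail if either dimension is missing.
    [ -n "$width" ] && [ -n "$height" ] || return 1
    printf '%s\n' "--page-size=${width}x${height}"
}
```

       Usage might then look like page_size_option "$(djvused -e 'select 1; size' book.djvu)", with the result passed to hocr2djvused.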

SEE ALSO

       ocrodjvu(1), djvused(1)

AUTHOR

       Jakub Wilk <jwilk@jwilk.net>
	   Author.

NOTES

	1. hOCR
	   http://docs.google.com/View?docid=dfxcv4vc_67g844kf

	2. OCRopus
	   http://ocropus.googlecode.com/

	3. Cuneiform
	   http://launchpad.net/cuneiform-linux

	4. Tesseract
	   http://tesseract-ocr.googlecode.com/

	5. Unicode Text Segmentation
	   http://unicode.org/reports/tr29/

	6. HTML5 parser
	   http://www.whatwg.org/specs/web-apps/current-work/#html-parser

hocr2djvused 0.7.9						    03/10/2012							   HOCR2DJVUSED(1)

HOCR2PDF(1)							 ExactImage Manual						       HOCR2PDF(1)

NAME

       hocr2pdf - hOCR to PDF converter of the ExactImage toolkit

SYNOPSIS

       hocr2pdf [option...] {-i | --input} input-file  {-o | --output} output-file

       hocr2pdf {-h | --help}

DESCRIPTION

       ExactImage is a fast C++ image processing library. Unlike many other library frameworks, it operates natively in several color spaces and
       bit depths, resulting in low memory and computational requirements.

       hocr2pdf creates well-laid-out, searchable PDF files from hOCR (annotated HTML) input obtained from an OCR system.

OPTIONS

       -i file, --input file
	   Read the image from the specified file. Note that the hOCR input is read from standard input.

       -o file, --output file
	   Save the output PDF to the specified file.

       -n, --no-image
	   Do not place the image over the text. By default, the text layer is hidden behind the image.

       -s, --sloppy-text
	   Place text sloppily: group words and do not draw individual glyphs.

       -r n, --resolution n
	   Override the resolution of the input image, setting it to n dpi. If the input file does not specify a resolution, the default is 300 dpi.

       -h, --help
	   Display help text and exit.
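       Assuming hocr2pdf sizes each PDF page from the image's pixel dimensions and its resolution (this manual does not say so explicitly), the
       resolution fixes the physical page size: an image w pixels wide at n dpi is w/n inches wide, and since a PDF point is 1/72 inch, that is
       w x 72 / n points. The helper below, whose name is hypothetical, only does this arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: convert a pixel length to PDF points at a given
# resolution (default 300 dpi, matching hocr2pdf's documented default).
# inches = pixels / dpi, and 1 PDF point = 1/72 inch.
pixels_to_points() {
    LC_ALL=C awk -v px="$1" -v dpi="${2:-300}" \
        'BEGIN { printf "%.1f\n", px * 72 / dpi }'
}

pixels_to_points 2480       # 2480 px at 300 dpi -> 595.2 (about A4 width)
pixels_to_points 2480 150   # same image declared as 150 dpi -> 1190.4
```

       Declaring a lower resolution with -r therefore yields a physically larger page for the same image.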

EXAMPLE

	   $ hocr2pdf -i scan.tiff -o test.pdf < cuneiform-out.hocr
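       The example above converts a single page. A minimal sketch for a directory of scans follows; it assumes a hypothetical naming scheme in
       which each image NAME.tiff has its hOCR in NAME.hocr, writes NAME.pdf next to them, and uses only the -i and -o options documented above.
       The function name hocr2pdf_pages is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run hocr2pdf once per page image in a directory.
# Assumes each NAME.tiff has its hOCR in NAME.hocr and writes NAME.pdf.
hocr2pdf_pages() {
    dir=${1:-.}
    for image in "$dir"/*.tiff; do
        [ -e "$image" ] || continue      # no match: the glob stays literal
        base=${image%.tiff}
        if [ ! -f "$base.hocr" ]; then
            echo "skipping $image: $base.hocr not found" >&2
            continue
        fi
        # hOCR is read from standard input; -i names the image, -o the PDF.
        hocr2pdf -i "$image" -o "$base.pdf" < "$base.hocr" || return 1
    done
    return 0
}
```

       Pages without a matching hOCR file are reported on standard error and skipped; the first failing hocr2pdf run stops the loop.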

SEE ALSO

       exactimage(7)

AUTHORS

       Jakub Wilk <jwilk@debian.org>
	   Wrote this manual page for the Debian system.

       http://www.exactcode.de/site/open_source/exactimage/
	   This manual page incorporates texts found on the ExactImage homepage.

COPYRIGHT

       This manual page was written for the Debian system (and may be used by others).

       Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or (at
       your option) any later version published by the Free Software Foundation.

       On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL-2.

hocr2pdf							    09/09/2013							       HOCR2PDF(1)