hocr2djvused(1) [debian man page]

HOCR2DJVUSED(1) 						hocr2djvused manual						   HOCR2DJVUSED(1)

NAME
hocr2djvused - hOCR to djvused script converter

SYNOPSIS
hocr2djvused [option...]

DESCRIPTION
hocr2djvused reads an hOCR[1] file (as produced by OCRopus[2], Cuneiform[3] or Tesseract[4]) from the standard input and converts it to a djvused script.

OPTIONS
Text segmentation options

-t lines, --details=lines
    Record location of every line. Don't record locations of individual words or characters.

-t words, --details=words
    Record location of every line and every word. Don't record locations of individual characters. This is the default.

-t chars, --details=chars
    Record location of every line, every word and every character.

--word-segmentation=simple
    Consider each non-empty sequence of non-whitespace characters a single word. This is the default, despite being linguistically incorrect.

--word-segmentation=uax29
    Use the Unicode Text Segmentation[5] algorithm to break lines into words. This option breaks the assumption of some DjVu tools that words are separated by spaces, and is therefore not recommended.

Other options

--rotation=n
    Assume that DjVu pages are rotated by n degrees.

--page-size=widthxheight
    Specify that the page size is width pixels x height pixels. This option is required for hOCR generated by Cuneiform (< 0.8) and superfluous otherwise.

--html5
    Use an HTML5 parser[6], which is more robust but slower than the default parser.

--version
    Output version information and exit.

-h, --help
    Display help and exit.

SEE ALSO
ocrodjvu(1), djvused(1)

AUTHOR
Jakub Wilk <jwilk@jwilk.net>
    Author.

NOTES
1. hOCR
   http://docs.google.com/View?docid=dfxcv4vc_67g844kf

2. OCRopus
   http://ocropus.googlecode.com/

3. Cuneiform
   http://launchpad.net/cuneiform-linux

4. Tesseract
   http://tesseract-ocr.googlecode.com/

5. Unicode Text Segmentation
   http://unicode.org/reports/tr29/

6. HTML5 parser
   http://www.whatwg.org/specs/web-apps/current-work/#html-parser

hocr2djvused 0.7.9 03/10/2012 HOCR2DJVUSED(1)
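
As a usage sketch for hocr2djvused: the tool reads hOCR on standard input and writes a djvused script on standard output, so it fits naturally into a small shell helper. The function name hocr_to_djvu and the file names in the usage comment are hypothetical. The -f (read commands from a script file) and -s (save the modified file) options belong to djvused(1). The sketch assumes the hOCR input describes the pages already present in the DjVu file.

```shell
#!/bin/sh
# Hypothetical helper: add an OCR text layer from an hOCR file to a DjVu file.
#   $1 = hOCR input file, $2 = existing DjVu file to update in place
hocr_to_djvu() {
    hocr_file=$1
    djvu_file=$2
    script_file=$(mktemp) || return 1

    # hocr2djvused reads hOCR on stdin and writes a djvused script on stdout.
    # --details=words is the documented default, spelled out here for clarity.
    hocr2djvused --details=words < "$hocr_file" > "$script_file" || {
        rm -f "$script_file"
        return 1
    }

    # djvused -f runs the commands in the script; -s saves the changes.
    djvused -f "$script_file" -s "$djvu_file"
    status=$?

    rm -f "$script_file"
    return "$status"
}

# Usage (file names are placeholders):
#   hocr_to_djvu page.hocr book.djvu
```

Writing the script to a temporary file rather than piping it straight into djvused makes it possible to inspect the generated commands when something goes wrong.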

Text::Wrap(3pm) 					 Perl Programmers Reference Guide					   Text::Wrap(3pm)

NAME
Text::Wrap - line wrapping to form simple paragraphs

SYNOPSIS
Example 1

    use Text::Wrap;

    $initial_tab = "\t";    # Tab before first line
    $subsequent_tab = "";   # All other lines flush left

    print wrap($initial_tab, $subsequent_tab, @text);
    print fill($initial_tab, $subsequent_tab, @text);

    @lines = wrap($initial_tab, $subsequent_tab, @text);

    @paragraphs = fill($initial_tab, $subsequent_tab, @text);

Example 2

    use Text::Wrap qw(wrap $columns $huge);

    $columns = 132;         # Wrap at 132 characters
    $huge = 'die';
    $huge = 'wrap';
    $huge = 'overflow';

Example 3

    use Text::Wrap;

    $Text::Wrap::columns = 72;
    print wrap('', '', @text);

DESCRIPTION
"Text::Wrap::wrap()" is a very simple paragraph formatter. It formats a single paragraph at a time by breaking lines at word boundaries. Indentation is controlled for the first line ($initial_tab) and all subsequent lines ($subsequent_tab) independently. Please note: $initial_tab and $subsequent_tab are the literal strings that will be used: it is unlikely you would want to pass in a number.

"Text::Wrap::fill()" is a simple multi-paragraph formatter. It formats each paragraph separately and then joins them together when it's done. It will destroy any whitespace in the original text. It breaks text into paragraphs by looking for whitespace after a newline. In other respects it acts like wrap().

OVERRIDES
"Text::Wrap::wrap()" has a number of variables that control its behavior. Because other modules might be using "Text::Wrap::wrap()" it is suggested that you leave these variables alone! If you can't do that, then use "local($Text::Wrap::VARIABLE) = YOURVALUE" when you change the values so that the original value is restored. This "local()" trick will not work if you import the variable into your own namespace.

Lines are wrapped at $Text::Wrap::columns columns. $Text::Wrap::columns should be set to the full width of your output device. In fact, every resulting line will have length of no more than "$columns - 1".

It is possible to control which characters terminate words by modifying $Text::Wrap::break. Set this to a string such as '[\s:]' (to break before spaces or colons) or a pre-compiled regexp such as "qr/[\s']/" (to break before spaces or apostrophes). The default is simply '\s'; that is, words are terminated by spaces. (This means, among other things, that trailing punctuation such as full stops or commas stays with the word it is "attached" to.)

Beginner note: In example 2 above, $columns is imported into the local namespace and set locally. In example 3, $Text::Wrap::columns is set in its own namespace without importing it.

"Text::Wrap::wrap()" starts its work by expanding all the tabs in its input into spaces. The last thing it does is to turn spaces back into tabs. If you do not want tabs in your results, set $Text::Wrap::unexpand to a false value. Likewise, if you do not want to use 8-character tabstops, set $Text::Wrap::tabstop to the number of characters you do want for your tabstops.

If you want to separate your lines with something other than "\n", then set $Text::Wrap::separator to your preference.

When words that are longer than $columns are encountered, they are broken up. "wrap()" adds a "\n" at column $columns. This behavior can be overridden by setting $huge to 'die' or to 'overflow'. When set to 'die', large words will cause "die()" to be called. When set to 'overflow', large words will be left intact.

Historical notes: 'die' used to be the default value of $huge. Now, 'wrap' is the default value.

EXAMPLE
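
As a sketch of the $huge and "local()" behavior described under OVERRIDES, the following shell snippet runs the same short Perl program with $huge set to 'wrap' and then to 'overflow'. The function name wrap_demo, the width of 16 columns and the sample sentence are arbitrary choices for illustration; only wrap() and the $Text::Wrap::columns and $Text::Wrap::huge variables come from the module.

```shell
#!/bin/sh
# Hypothetical helper: wrap a sentence containing one over-long "word"
# using the $huge mode given as $1 ("wrap" or "overflow").
wrap_demo() {
    perl -MText::Wrap -e '
        # local() restores the previous values when the enclosing scope ends,
        # as OVERRIDES recommends for code that shares Text::Wrap with others.
        local $Text::Wrap::columns = 16;
        local $Text::Wrap::huge    = $ARGV[0];

        # First line flush left, later lines indented by two spaces.
        print wrap("", "  ", "See http://unicode.org/reports/tr29/ for details"), "\n";
    ' "$1"
}

for mode in wrap overflow; do
    echo "--- huge = $mode"
    wrap_demo "$mode"
done
```

With 'wrap', the long URL is split so that no output line exceeds "$columns - 1" characters; with 'overflow', it is kept on a single line that runs past that limit.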
    use Text::Wrap;
    print wrap("\t","","This is a bit of text that forms a normal book-style paragraph");

AUTHOR
David Muir Sharnoff <muir@idiom.com> with help from Tim Pierce and many many others.

perl v5.8.0 2002-06-01 Text::Wrap(3pm)