SHAPECLUSTERING(1) SHAPECLUSTERING(1)
NAME
shapeclustering - shape clustering training for Tesseract
SYNOPSIS
shapeclustering -D output_dir -U unicharset -O mfunicharset -F font_props -X xheights FILE...
DESCRIPTION
shapeclustering(1) takes extracted feature .tr files (generated by tesseract(1) run in a special mode from box files) and produces a
shapetable file and an enhanced unicharset. This program is still experimental, and is not required (yet) for training Tesseract.
OPTIONS
-U FILE
The unicharset generated by unicharset_extractor(1).
-D dir
Directory to write output files to.
-F font_properties_file
(Input) font properties file. Each line has the following form, where each field other than the font name is 0 or 1:
'font_name' 'italic' 'bold' 'fixed_pitch' 'serif' 'fraktur'
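As an illustration of the line format above, here is a minimal Python sketch that checks one font properties line. It assumes the fields are separated by whitespace and that the font name contains no spaces; the names FontProps and parse_font_props and the sample line are hypothetical and are not part of Tesseract.

```python
from typing import NamedTuple


class FontProps(NamedTuple):
    """One line of a font properties file (hypothetical helper type)."""
    name: str
    italic: bool
    bold: bool
    fixed_pitch: bool
    serif: bool
    fraktur: bool


def parse_font_props(line: str) -> FontProps:
    """Parse 'font_name italic bold fixed_pitch serif fraktur'.

    Assumption: fields are whitespace-separated and the font name
    itself contains no whitespace.
    """
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    name, *flags = fields
    # Every field other than the font name must be 0 or 1.
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError(f"flags must be 0 or 1: {line!r}")
    return FontProps(name, *(flag == "1" for flag in flags))


if __name__ == "__main__":
    # Sample line made up for this sketch.
    print(parse_font_props("timesitalic 1 0 0 1 0"))
```

Validating the file before training may make a malformed line easier to spot than a failure inside shapeclustering itself.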
-X xheights_file
(Input) x heights file. Each line has the following form, where xheight is calculated as the pixel x height of a character drawn at
32pt on 300 dpi. [ That is, if base x height + ascenders + descenders = 133, how much is x height? ]
'font_name' 'xheight'
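The figure 133 in the note above is consistent with a 32pt body at 300 dpi, since one point is 1/72 inch. The following Python sketch works through that arithmetic and parses one line of the file. It assumes whitespace-separated fields; the function names, the sample font name, and the 0.45 ratio are hypothetical values chosen only for illustration.

```python
def body_height_px(point_size: float = 32.0, dpi: float = 300.0) -> float:
    """Height in pixels of a type body of the given point size.

    One point is 1/72 inch, so 32pt at 300 dpi is 32 * 300 / 72 pixels.
    """
    return point_size * dpi / 72.0


def xheight_px(xheight_ratio: float) -> int:
    """Pixel x height for a font whose x height is the given fraction
    of the body height (the ratio is an assumed input, not something
    this sketch measures from a font)."""
    return round(xheight_ratio * body_height_px())


def parse_xheight_line(line: str) -> tuple[str, float]:
    """Parse 'font_name xheight' (assumed whitespace-separated)."""
    name, value = line.split()
    return name, float(value)


if __name__ == "__main__":
    print(round(body_height_px()))  # 133, matching the note above
    print(xheight_px(0.45))         # hypothetical ratio
    print(parse_xheight_line("examplefont 60"))
```

In practice the x height would be measured from rendered glyphs; the ratio here only stands in for that measurement.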
-O FILE
The output unicharset that will be given to combine_tessdata(1).
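To show how the options in the synopsis fit together, here is a small Python sketch that assembles a shapeclustering command line. It uses only the flags documented above; every file and directory name in it is hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex
from typing import Sequence


def shapeclustering_argv(
    output_dir: str,
    unicharset: str,
    out_unicharset: str,
    font_props: str,
    xheights: str,
    tr_files: Sequence[str],
) -> list[str]:
    """Build the argument list following the SYNOPSIS:
    shapeclustering -D output_dir -U unicharset -O mfunicharset
                    -F font_props -X xheights FILE...
    """
    if not tr_files:
        raise ValueError("at least one .tr file is required")
    return [
        "shapeclustering",
        "-D", output_dir,      # directory for output files
        "-U", unicharset,      # from unicharset_extractor(1)
        "-O", out_unicharset,  # later given to combine_tessdata(1)
        "-F", font_props,      # font properties file
        "-X", xheights,        # x heights file
        *tr_files,             # extracted feature files
    ]


if __name__ == "__main__":
    # All paths below are made up for this sketch.
    argv = shapeclustering_argv(
        "out", "unicharset", "eng.unicharset",
        "font_properties", "eng.xheights",
        ["eng.timesitalic.exp0.tr"],
    )
    print(shlex.join(argv))
```

The resulting list could be passed to subprocess.run(argv, check=True) on a system where shapeclustering is installed.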
SEE ALSO
tesseract(1), cntraining(1), unicharset_extractor(1), combine_tessdata(1), unicharset(5)
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
COPYING
Copyright (C) Google, 2011. Licensed under the Apache License, Version 2.0.
AUTHOR
The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995) and Google (2006-present).
02/09/2012 SHAPECLUSTERING(1)
ADDFTINFO(1) General Commands Manual ADDFTINFO(1)
NAME
addftinfo - add information to troff font files for use with groff
SYNOPSIS
addftinfo [ -v ] [ -param value... ] res unitwidth font
DESCRIPTION
addftinfo reads a troff font file and adds some additional font-metric information that is used by the groff system. The font file with
the information added is written on the standard output. The information added is guessed using some parametric information about the font
and assumptions about the traditional troff names for characters. The main information added is the heights and depths of characters. The
res and unitwidth arguments should be the same as the corresponding parameters in the DESC file; font is the name of the file describing
the font; if font ends with I the font will be assumed to be italic.
OPTIONS
-v
Prints the version number.
All other options change one of the parameters that are used to derive the heights and depths. Like the existing quantities in the font
file, each value is in inches/res for a font whose point size is unitwidth. param must be one of:
x-height
The height of lowercase letters without ascenders such as x.
fig-height
The height of figures (digits).
asc-height
The height of characters with ascenders, such as b, d or l.
body-height
The height of characters such as parentheses.
cap-height
The height of uppercase letters such as A.
comma-depth
The depth of a comma.
desc-depth
The depth of characters with descenders, such as p, q, or y.
body-depth
The depth of characters such as parentheses.
addftinfo makes no attempt to use the specified parameters to guess the unspecified parameters. If a parameter is not specified the
default will be used. The defaults are chosen to have reasonable values for a Times font.
SEE ALSO
groff_font(5), groff(1), groff_char(7)
Groff Version 1.18.1 27 June 2001 ADDFTINFO(1)