Ocr Post: 302307182

Special Forums UNIX and Linux Applications Ocr Post 302307182 by CRGreathouse on Tuesday 14th of April 2009 09:53:06 PM

I had downloaded Tesseract earlier, but it had a few problems:
* It wouldn't compile (./configure gave C++ errors).
* It doesn't work on PDF files, only TIFFs.
* It doesn't work on files with multiple columns.
* It doesn't deskew, despeckle, or do other cleanup needed to get sensible output.

I downloaded OCRopus to try to get around some of these limitations, but without being able to compile Tesseract, that's of course all for naught.

Code:

checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking for cl.exe... no
checking for g++... no
checking for C++ compiler default output file name... configure: error: C++ compiler cannot create executables
See `config.log' for more details.

config.log.txt (4.9 KB)
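
The `checking for g++... no` line in the configure output suggests that no C++ compiler was found on the PATH, which would explain the "cannot create executables" error. A minimal sketch of a pre-check to run before `./configure` follows; the helper name `first_available` and the list of compiler names are assumptions for illustration, not part of Tesseract's build system.

```shell
#!/bin/sh
# Print the first command from the argument list that exists on PATH.
# Returns non-zero if none of them is found.
# (first_available is a hypothetical helper, not a standard utility.)
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# The compiler names below are an assumption; adjust them for your system.
if cxx=$(first_available g++ c++ clang++); then
    echo "C++ compiler found: $cxx"
else
    echo "No C++ compiler on PATH; install one before running ./configure" >&2
fi
```

If the check reports no compiler, installing the distribution's C++ compiler package and re-running `./configure` is the likely next step; `config.log` should confirm whether the compiler was the actual cause.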

LEARN ABOUT DEBIAN

llexec

LLINES(1)						      General Commands Manual							 LLINES(1)

NAME

       llexec

SYNOPSIS

       llexec [OPTION]...
       llexec [OPTION]... -x report Database

DESCRIPTION

       Invokes the LifeLines genealogy program, executes one report (with the -x switch), then exits.

       For up-to-date documentation, please see the LifeLines reference manual, or visit the project site online at:
       http://lifelines.sourceforge.net

       llexec is a rustic version of the LifeLines software, basically without the curses interface. It is mainly designed for CGI use.
       For interactive use, please use the llines(1) program.

OPTIONS

       -x report
	      execute the report program from the file named "report"

       -F     Finnish option (only available if compiled with Finnish flag)

       -d     developmental/debug mode (signals are not caught)

       -f     force open a database (only for use if reader/writer count is wrong)

       -i     open database with immutable access (no protection against other access -- for use on read-only media)

       -k     always show key values (normally key is not shown if a REFN is shown)

       -r     open database with read-only access (protect against other writer access)

       -u COLS,ROWS
	      specify window size (e.g., -u120,34 specifies 120 columns by 34 rows)

       -w     open database with writeable access (protect against other reader or writer access)

       -?     display options summary

SEE ALSO

       llines(1), btedit(1), dbverify(1)

DOCUMENTATION

       The LifeLines documentation is generated as part of the installation, and may also be viewed at the project site:
       http://lifelines.sourceforge.net

Lifelines 3.0.28						     2003 May								 LLINES(1)