OCR

 
# 1
04-14-2009
Is there any open-source software that OCRs PDFs?
# 2
04-14-2009
Check whether this or this is what you want, or maybe this.
# 3
04-14-2009
I had downloaded Tesseract earlier, but it had a few problems:
* It wouldn't compile (./configure gave C++ errors).
* It doesn't work on PDF files, only TIFFs.
* It doesn't work on files with multiple columns.
* It doesn't deskew, despeckle, or do the other cleanup needed to get sensible output.
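
For the TIFF-only limitation, one possible workaround is to render each PDF page to a TIFF first and OCR those files. A minimal sketch using Ghostscript, assuming `gs` is installed; `pdf_to_tiff` and the `page` prefix are made-up names, 300 dpi grayscale is only a common starting point for scans, and it is untested whether the Tesseract build discussed here accepts the TIFF variant Ghostscript writes:

```shell
# Hypothetical helper: render every page of a PDF to its own TIFF file.
# Usage: pdf_to_tiff input.pdf [output-prefix]
pdf_to_tiff() {
    pdf=$1
    prefix=${2:-page}
    # -dNOPAUSE -dBATCH: run non-interactively and exit when done
    # -sDEVICE=tiffgray: 8-bit grayscale TIFF output
    # -r300: render at 300 dpi
    # %03d in the output name is replaced by the page number
    gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffgray -r300 \
       "-sOutputFile=${prefix}-%03d.tif" "$pdf"
}
```

Calling `pdf_to_tiff scan.pdf page` would write `page-001.tif`, `page-002.tif`, and so on. It does nothing about columns, skew, or speckle.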

I downloaded OCRopus to try to get around some of these limitations, but without being able to compile Tesseract, of course, that's all for naught.
Code:
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking for cl.exe... no
checking for g++... no
checking for C++ compiler default output file name... configure: error: C++ compiler cannot create executables
See `config.log' for more details.
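
The log above reports `checking for g++... no`, which suggests that configure found no C++ compiler on the PATH, rather than a fault in the Tesseract sources. A minimal sketch for checking that before rerunning `./configure`; `first_available` is a made-up helper name, and the candidate list (`g++`, `c++`, `clang++`) is an assumption about which compilers might be installed:

```shell
# Hypothetical helper: print the first of the given commands found on PATH.
# Returns non-zero if none of them is available.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if cxx=$(first_available g++ c++ clang++); then
    echo "C++ compiler found: $cxx"
else
    echo "No C++ compiler on PATH; install one (for example g++) and rerun ./configure"
fi
```

If a compiler is found and configure still fails, `config.log` should show the exact command that could not produce an executable.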
