Shell Programming and Scripting, post 302277217 by dorcas on Thursday 15th of January 2009 05:14:09 PM
Working with OCR text inside PDF files

I'm trying to automate the cleanup of OCR for a large number of scanned pages. Because of limitations in the access mechanism where these will end up, I need to create PDF files that include the background text for searching.

Going in, I have TIFF images that are too dirty to OCR, plus re-keyed text that matches them page for page. From reading here I can see plenty of ways to turn the TIFF files into PDF; what I can't find is a way to put this text into the PDF file. I'm guessing this calls for some reverse-engineering of whatever mapping scheme PDF uses for the coordinates of words or characters. Does anyone know of a tool for getting access to this text, for writing as well as reading? I'm looking at pdftk, but so far all I can get is a dump of the "metadata" fields, not the text with its position mapping...
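
For anyone wondering what the "mapping scheme" looks like: as far as I understand it, PDF positions text with operators inside each page's content stream, and searchable scans usually draw the text in rendering mode 3 (invisible) over the image. Below is a minimal Python sketch of that idea, not a finished tool. The function names, the sample coordinates, and the Helvetica/Latin-1 choice are my own assumptions for illustration, and it writes the text layer only; drawing the scanned TIFF on the same page is left out.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_stream(lines, font_size=10):
    """Build a page content stream from (x, y, text) tuples.

    Coordinates are in points with the origin at the bottom-left corner.
    """
    ops = ["BT", f"/F1 {font_size} Tf", "3 Tr"]  # 3 Tr = invisible text
    for x, y, text in lines:
        # Tm sets an absolute text matrix, so each line is placed independently.
        ops.append(f"1 0 0 1 {x} {y} Tm")
        ops.append(f"({escape_pdf_string(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


def build_pdf(lines, width=612, height=792):
    """Return the bytes of a one-page PDF holding only the hidden text layer."""
    # Assumption: the re-keyed text fits in Latin-1; other text needs more work.
    stream = hidden_text_stream(lines).encode("latin-1")
    page = (
        f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
        "/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>"
    ).encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        page,
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical positions; real ones would come from the page layout.
    sample = [(72, 700, "First line of re-keyed text"), (72, 686, "Second line")]
    with open("hidden_text.pdf", "wb") as handle:
        handle.write(build_pdf(sample))
```

Opening the resulting file should show a blank page whose text can still be selected and searched. The hard part for my case stays open: the re-keyed text has no coordinates, so the positions would have to be estimated per line or per page.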
 

PS_ADD_BOOKMARK(3)                              1                              PS_ADD_BOOKMARK(3)

ps_add_bookmark - Add bookmark to current page

SYNOPSIS
       int ps_add_bookmark (resource $psdoc, string $text, [int $parent], [int $open])

DESCRIPTION
       Adds a bookmark for the current page. Bookmarks usually appear in PDF viewers left of the page in a hierarchical tree. Clicking on a bookmark will jump to the given page. The note will not be visible if the document is printed or viewed but it will show up if the document is converted to pdf by either Acrobat Distiller(TM) or Ghostview.

PARAMETERS
       o $psdoc - Resource identifier of the postscript file as returned by ps_new(3).
       o $text - The text used for displaying the bookmark.
       o $parent - A bookmark previously created by this function which is used as the parent of the new bookmark.
       o $open - If $open is unequal to zero the bookmark will be shown open by the pdf viewer.

RETURN VALUES
       The returned value is a reference for the bookmark. It is only used if the bookmark shall be used as a parent. The value is greater than zero if the function succeeds. In case of an error zero will be returned.

SEE ALSO
       ps_add_launchlink(3), ps_add_pdflink(3), ps_add_weblink(3).

PHP Documentation Group                                                        PS_ADD_BOOKMARK(3)