03-28-2011
Great!
I knew the moment I posted that this issue might come up.
I am not sure whether you want to look into the pdftotext utility; it may do the job, but the overhead might be too much.
pdftotext
pdftotext(1) General Commands Manual pdftotext(1)
NAME
pdftotext - Portable Document Format (PDF) to text converter (version 3.00)
SYNOPSIS
pdftotext [options] [PDF-file [text-file]]
DESCRIPTION
Pdftotext converts Portable Document Format (PDF) files to plain text.
Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to
file.txt. If text-file is '-', the text is sent to stdout.
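The three invocation forms described above can be sketched as follows. This is a minimal illustration, not part of the original manual: report.pdf and out.txt are placeholder names, and default_text_name is a hypothetical helper that only mirrors the documented file.pdf to file.txt rule.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdftotext): prints the output name that
# pdftotext is expected to choose when no text-file argument is given.
default_text_name() {
    case "$1" in
        # Documented case: file.pdf becomes file.txt.
        *.pdf|*.PDF) printf '%s.txt\n' "${1%.*}" ;;
        # Assumption: other names simply get ".txt" appended; the manual
        # above does not describe this case.
        *)           printf '%s.txt\n' "$1" ;;
    esac
}

# Demo: only runs if pdftotext is installed and the placeholder file exists.
if command -v pdftotext >/dev/null 2>&1 && [ -f report.pdf ]; then
    pdftotext report.pdf                 # writes "$(default_text_name report.pdf)"
    pdftotext report.pdf out.txt         # writes to an explicit text file
    pdftotext report.pdf - | head -n 20  # '-' sends the text to stdout
fi
```

Sending the text to stdout with '-' is usually the convenient form for pipelines, since no intermediate file is left behind.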
OPTIONS
-f number
Specifies the first page to convert.
-l number
Specifies the last page to convert.
-r number
Specifies the resolution, in DPI. The default is 72 DPI.
-x number
Specifies the x-coordinate of the top-left corner of the crop area.
-y number
Specifies the y-coordinate of the top-left corner of the crop area.
-W number
Specifies the width of the crop area in pixels (default is 0).
-H number
Specifies the height of the crop area in pixels (default is 0).
-layout
Maintain (as best as possible) the original physical layout of the text. The default is to 'undo' physical layout (columns,
hyphenation, etc.) and output the text in reading order.
-raw Keep the text in content stream order. This is a hack which often "undoes" column formatting, etc. Use of raw mode is no longer
recommended.
-htmlmeta
Generate a simple HTML file, including the meta information. This simply wraps the text in <pre> and </pre> and prepends the meta
headers.
-bbox Generate an XHTML file containing bounding box information for each word in the file.
-enc encoding-name
Sets the encoding to use for text output. This defaults to "UTF-8".
-listenc
Lists the available encodings.
-eol unix | dos | mac
Sets the end-of-line convention to use for text output.
-nopgbrk
Don't insert page breaks (form feed characters) between pages.
-opw password
Specify the owner password for the PDF file. Providing this will bypass all security restrictions.
-upw password
Specify the user password for the PDF file.
-q Don't print any messages or errors.
-v Print copyright and version information.
-h Print usage information. (-help and --help are equivalent.)
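As a rough sketch of how several of the options above combine, the following wraps a page-range extraction in a small function. The function name extract_pages, the file manual.pdf, and the search word are illustrative assumptions; only the -f, -l, -layout and -nopgbrk options come from the list above.

```shell
#!/bin/sh
# extract_pages FIRST LAST PDF
# Hypothetical helper (not part of pdftotext): prints pages FIRST..LAST of PDF
# to stdout, keeping the physical layout and omitting form feeds between pages.
extract_pages() {
    pdftotext -f "$1" -l "$2" -layout -nopgbrk "$3" -
}

# Demo: only runs if pdftotext is installed and the placeholder file exists.
if command -v pdftotext >/dev/null 2>&1 && [ -f manual.pdf ]; then
    # Search pages 2 to 5 for a word, case-insensitively, with line numbers.
    extract_pages 2 5 manual.pdf | grep -n -i "install" || echo "no match"
fi
```

Restricting the page range keeps the conversion cost down when only part of a large document needs to be searched.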
BUGS
Some PDF files contain fonts whose encodings have been mangled beyond recognition. There is no way (short of OCR) to extract text from
these files.
EXIT CODES
The Xpdf tools use the following exit codes:
0 No error.
1 Error opening a PDF file.
2 Error opening an output file.
3 Error related to PDF permissions.
99 Other error.
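A script can branch on these exit codes. The sketch below maps each documented code to its message; describe_pdftotext_exit is a hypothetical helper, and input.pdf and output.txt are placeholder names.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdftotext): translates an exit code into
# the message given in the table above.
describe_pdftotext_exit() {
    case "$1" in
        0)  echo "No error." ;;
        1)  echo "Error opening a PDF file." ;;
        2)  echo "Error opening an output file." ;;
        3)  echo "Error related to PDF permissions." ;;
        99) echo "Other error." ;;
        *)  echo "Unexpected exit code: $1" ;;
    esac
}

# Demo: only runs if pdftotext is installed and the placeholder file exists.
if command -v pdftotext >/dev/null 2>&1 && [ -f input.pdf ]; then
    status=0
    # -q suppresses pdftotext's own messages so the script reports the result.
    pdftotext -q input.pdf output.txt || status=$?
    if [ "$status" -ne 0 ]; then
        echo "pdftotext failed: $(describe_pdftotext_exit "$status")" >&2
    fi
fi
```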
AUTHOR
The pdftotext software and documentation are copyright 1996-2004 Glyph & Cog, LLC.
SEE ALSO
pdffonts(1), pdfimages(1), pdfinfo(1), pdftocairo(1), pdftohtml(1), pdftoppm(1), pdftops(1)
22 January 2004 pdftotext(1)