Thanks for the update. Did you try Jim's suggestion here: ?
I did. I reported back that my version of Acrobat apparently does not have the accessibility tool. When I click on it, Acrobat shows that it's a "Pro" feature that I do not have. I have tried saving the document in Acrobat, though, and it saves under another filename without issue.
This would indicate that the first place to look would be at the fonts, since the man page says:
BUGS -- Some PDF files contain fonts whose encodings have been mangled beyond recognition. There is no way (short of OCR) to extract text from these files.
Did you check the file and list all the fonts and compare that list of fonts to a working PDF file (which converts to text properly)?
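One way to do that comparison is with pdffonts, which ships alongside pdftotext in Xpdf and Poppler. The following is only a rough sketch: the function name font_names and the PDFFONTS override variable are made up for illustration, and it assumes the usual pdffonts layout of two header lines followed by one line per font, with the font name in the first column (names containing spaces would be cut short).

```shell
#!/bin/sh
# Sketch: print the unique font names that pdffonts reports for one PDF.
# PDFFONTS is a hypothetical override; by default the real pdffonts is used.
font_names() {
    # Skip the two header lines, keep the first column (the font name),
    # and remove duplicates so two listings are easy to diff.
    "${PDFFONTS:-pdffonts}" "$1" | tail -n +3 | awk '{ print $1 }' | sort -u
}
```

Running font_names on the PDF that converts and on the one that does not, saving each result to a file, and then running diff on the two files should show which fonts appear only in the bad document.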
I was just looking at that and comparing the old version to the new version. The PDF_checker for the old version (which DOES convert) says that there are font errors...
I'm in a bit of deep water here because I'm an application programmer and rarely lift the hood on PDF structure. On this project, where I use 'pdftotext', I simply run the command line instructions, take my text file and move on. Once the utility doesn't work (for whatever reason), I'm at a loss. My guess, given the size of the PDF (425 KB for the bad one compared to about 17 KB for the ones that work properly), is that it's actually an image. Does the PDF_Checker information I posted earlier tell us that or no? Thanks again.
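A quick way to test the "it's actually an image" guess might be to count the fonts in the file: a PDF that is purely a scanned image normally contains none. This is a heuristic sketch, not a definitive check. The function name looks_image_only and the PDFFONTS override are hypothetical, and it again assumes pdffonts prints two header lines before the font entries.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) when pdffonts lists no fonts for the file,
# which hints that the pages are images rather than text.
# PDFFONTS is a hypothetical override; by default the real pdffonts is used.
looks_image_only() {
    # Count the non-empty lines after the two header lines.
    n=$("${PDFFONTS:-pdffonts}" "$1" 2>/dev/null |
        awk 'NR > 2 && NF { n++ } END { print n + 0 }')
    [ "$n" -eq 0 ]
}
```

For example, `looks_image_only bad.pdf && echo "no fonts found"` (bad.pdf being a placeholder name). If fonts are listed but the text still comes out wrong, the mangled-encoding case from the man page is the more likely explanation.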
I am trying to use the csplit utility on a file that contains records that have more than 2048 characters on a line. The resulting split file seems to ignore the rest of the line, and I lose the data.
Is there any way that csplit can handle record lengths greater than 2048?
Thanks (0 Replies)
Good day,
I've been trying to find a way to compile the Xpdf sources on our HP-UX server, but have been failing to do so because there is no GCC installed, and I don't have privileges to install GCC. I was looking for a functionality to convert PDF files to .txt, which is exactly like the... (2 Replies)
I'm running a simulation (programmed in C) which makes calls to gnuplot periodically to plot data I have stored.
First I open a pipe to gnuplot and set it to multiplot:
FILE * pipe = popen("gnuplot", "w");
fprintf(pipe, "set multiplot\n");
fflush(pipe);
(this pipe stays open until the... (0 Replies)
Hi,
I need documentation about the limitations on Linux partitions: how many primary and extended partitions I can create, and, for different types of storage, how large a capacity I can create.
Thanks. (3 Replies)
Hi,
I have used pdftotext with good results in the past, but today for some reason I keep getting the same error message.
My command is as follows:
And the error message is
I am using Vmware player with Ubuntu server, but I don't think that is causing this issue as I have been using... (2 Replies)
Hi,
I have noticed some performance issues on my RHEL5 server but the memory and CPU utilization on the box is fine.
I have a 1G full duplexed eth0 card and I am suspicious that this may be causing the problem. My eth0 settings are as follows:
Settings for eth0:
Supported ports: ... (12 Replies)
Hi,
I recently started working with Solaris, and what I noticed is that a lot of commands I used to regularly use don't work, like sed -i and grep -r. I have found workarounds for these problems, but it's a pain in the ass.
I'm just wondering why they decided not to include these handy... (4 Replies)
While recently reading an article on Linux basics before I embark on my personal installation project, I came across this passage:
IDE drives have three types of partition: primary, logical, and extended. The partition table is located in the master boot record (MBR) of a disk. The MBR is the... (12 Replies)
I have a directory containing a number of PDF files.
I want to convert all of the files to text, stored in a single text file.
The following creates multiple text files:
ls *.pdf | xargs -n1 pdftotext (1 Reply)
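One possible approach relies on the fact that pdftotext sends the text to standard output when the output filename is given as "-", so each conversion can be appended to the same file. This is a minimal sketch: the function name combine_pdfs and the PDF_TO_TEXT override variable are invented for illustration and are not part of pdftotext.

```shell
#!/bin/sh
# Sketch: convert every *.pdf in a directory and collect the text in one file.
# Usage: combine_pdfs DIRECTORY OUTPUT_FILE
# PDF_TO_TEXT is a hypothetical override; by default the real pdftotext is used.
combine_pdfs() {
    dir=$1
    out=$2
    : > "$out"                          # start with an empty output file
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue         # skip the unexpanded pattern if no PDFs exist
        # "-" as the output name makes pdftotext write to standard output.
        "${PDF_TO_TEXT:-pdftotext}" "$f" - >> "$out"
    done
}
```

Calling `combine_pdfs . all.txt` would process the current directory. Looping over the glob instead of piping ls into xargs also keeps filenames with spaces intact.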