Limitations of 'pdftotext' in Linux...


 
# 8  
11-21-2019
Quote:
Originally Posted by kenlenard
Here is the PDF in question. Thanks again. I will report back after some further testing.
Sorry, but for security reasons I cannot permit you to post a PDF attachment which may be corrupted.

Did you follow my recommendation to run it through a PDF checker?
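A dedicated PDF checker inspects fonts, images, encryption, damage and much more. As a much cruder first pass before uploading a file or feeding it to pdftotext, you can at least confirm the basic file structure. The Python sketch below is only an illustration and is not part of any checker: the function names `pdf_structure_problems` and `check_pdf_file` are hypothetical, and the code only tests that the data begins with a `%PDF-` header and has an `%%EOF` marker within the last 1024 bytes. Passing this test does not prove that the file is undamaged.

```python
from pathlib import Path


def pdf_structure_problems(data: bytes) -> list[str]:
    """Crude structural sanity check on raw PDF bytes (hypothetical helper).

    Only looks for the %PDF- header and a trailing %%EOF marker;
    it is NOT a substitute for a real PDF integrity checker.
    """
    problems = []
    # A PDF file is expected to begin with a header such as "%PDF-1.7".
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    # Incrementally updated PDFs may contain several %%EOF markers;
    # we only require one near the end of the file.
    if b"%%EOF" not in data[-1024:]:
        problems.append("missing %%EOF trailer")
    return problems


def check_pdf_file(path: str) -> list[str]:
    """Read a file from disk and run the crude check on its bytes."""
    return pdf_structure_problems(Path(path).read_bytes())
```

For example, `check_pdf_file("TTSNEW.pdf")` returns an empty list when both markers are present and a list of problem strings otherwise. Splitting the byte check from the file read keeps the logic easy to test without touching the disk.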
# 9  
11-21-2019
As I said in my earlier post, you will have to see whether you have a way to convert it to text, such as the "Edit Text and Images" menu option in Acrobat.

Andrew
# 10  
11-21-2019
Quote:
Originally Posted by apmcd47
As I said in my earlier post, you will have to see whether you have a way to convert it to text, such as the "Edit Text and Images" menu option in Acrobat.

Andrew
As I said in my earlier post... and I am the founder and lead admin here:

Did you follow my recommendation to run it through a PDF checker?

Note: Do not post PDF files here that our team has indicated may be corrupted, especially if you have not validated that the PDF is not corrupted. Thanks.

Moderator's Comments:
Mod Comment I'm closing this thread since you will not take the time to validate the PDF as requested, but instead, posted a potentially corrupt file. The last thing we want here is a corrupt or malformed PDF on this site.
# 11  
11-21-2019
Pdftotext issue...

Guys, wow... I did not know that I should not upload a PDF. I apologize. I did not expect that it would get the original thread closed. I was up looking at this issue late last night and did not run the PDF checker until this afternoon. I was not able to check the accessibility of the PDF using my version of Acrobat. I have the PDF checker results. Is that file permissible to upload, or is there something from the report that I can post here? The results suggest that the file is a normal PDF, but this utility is outside my experience.
# 12  
11-21-2019
It is important that you and everyone who posts here follow our instructions.

I asked you directly to run your PDF file (a file which you indicated had issues, and which our team members advised may have issues) through a PDF integrity checker, and you did not do that, and then uploaded it to our site.

You must follow moderator instructions, and especially admin instructions. This is a requirement, not optional.

So, to be frank, I do not understand your "Guys, wow..." gee-whiz reply. I am the creator and lead admin of this site, and I have been responsible for its integrity for nearly two decades. If I ask you to run your file through a PDF checker, you should do so; instead, you uploaded a potentially problematic file to this site. It would be less than responsible of me not to delete this file.

But moreover, you did not follow my instructions.

Deleting the PDF you uploaded and closing your thread was an "act of kindness" on my part, as I did not issue you an infraction or change your status to read-only for not following my instructions.

This site gets over 1 million visitors a month. Quite frankly, and this is not personal toward you or your good self, I do not have time for those who post here and do not follow my instructions as the admin for this site.

I hope this is clear. Please follow my requests and instructions.

Thank you.

Yes, you can post the results of your file integrity check (in code tags), which, in all frankness, you should have done before posting the file in the first place, per my request. Thanks.
# 13  
11-21-2019
Okay, here are the PDF_Checker results.

Code:
PDF Checker 1.5.0  Copyright 2018-2019 Datalogics, Inc. All Rights Reserved

Thu Nov 21 13:39:51 2019

JSON Profile: everything.json

Input Document: TTSNEW.pdf

File Size: 426 KB

<<=CHECKER_SUMMARY_START=>>
general:born-digital
images:color:resolution-too-low
sizeInBytes:435947
<<=CHECKER_SUMMARY_END=>>

Optimization Assessment
    Document is appropriately optimized

General Results
    Errors:
        None
    Information:
        Document was born digital.  It was produced from PDF authoring software and so it may contain text, images, tables, forms, and other objects.  These types of PDFs typically do not require OCR.
    Checks Completed:
        born-digital
        claims-pdfa-conformance
        claims-pdfe-conformance
        claims-pdfua-conformance
        claims-pdfvt-conformance
        claims-pdfx-conformance
        contains-owner-password
        contains-signature
        damaged
        image-only
        password-protected
        pdf-v2
        unable-to-open
        xfa-type

Userdata Results
    Errors:
        None
    Information:
        None
    Checks Completed:
        contains-annots
        contains-annots-not-for-printing
        contains-annots-not-for-viewing
        contains-annots-without-normal-appearances
        contains-embedded-files
        contains-metadata
        contains-optional-content
        contains-private-data
        contains-transparency

Fonts Results
    Errors:
        None
    Information:
        None
    Checks Completed:
        fontdescriptor-missing-capheight
        fontdescriptor-missing-fields
        uses-base14fonts-not-embedded
        uses-fonts-fully-embedded
        uses-fonts-not-embedded

Objects Results
    Errors:
        None
    Information:
        None
    Checks Completed:
        contains-javascript-actions
        contains-thumbnails

Cleanup Results
    Errors:
        None
    Information:
        None
    Checks Completed:
        suboptimal-compression

Image Results
    Errors:
        None
    Information:
        None
    Checks Completed:
        alternate-images

    Color Images
    Errors:
        None
    Information:
        Low resolution color image(s) present: 
            Total: (1 instance)
    Checks Completed:
        image-depth
        resolution-too-high
        resolution-too-low
        uses-jpeg2000-compression

    Grayscale Images
    Errors:
        None
    Information:
        None
    Checks Completed:
        resolution-too-high
        resolution-too-low
        uses-jpeg2000-compression

    Monochrome Images
    Errors:
        None
    Information:
        None
    Checks Completed:
        resolution-too-high
        resolution-too-low
        uses-jbig2-compression

My apologies again. I have been up working on a number of different emergencies this week until about 3am each night. This PDF issue is just one problem I am having at the moment and my attention is divided. I'm not trying to rile anyone up. Thank you again for looking at this.
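The machine-readable part of the report above is the block between the `<<=CHECKER_SUMMARY_START=>>` and `<<=CHECKER_SUMMARY_END=>>` markers. Assuming the report has been saved as plain text, a minimal Python sketch (the function name `checker_summary` is hypothetical) can pull those tags out, for example to confirm `general:born-digital`, which the checker itself explains means the PDF came from authoring software and typically does not require OCR.

```python
def checker_summary(report: str) -> list[str]:
    """Return the tag lines between the checker's summary markers (hypothetical helper)."""
    start = "<<=CHECKER_SUMMARY_START=>>"
    end = "<<=CHECKER_SUMMARY_END=>>"
    in_block = False
    tags = []
    for line in report.splitlines():
        line = line.strip()
        if line == start:
            in_block = True          # begin collecting after the start marker
        elif line == end:
            break                    # stop at the end marker
        elif in_block and line:
            tags.append(line)        # keep each non-empty tag line verbatim
    return tags
```

`"general:born-digital" in checker_summary(report_text)` is then a quick test. The tags are kept as plain strings rather than split into key/value pairs because some, such as `images:color:resolution-too-low`, contain more than one colon.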
# 14  
11-21-2019
Quote:
My apologies again. I have been up working on a number of different emergencies this week until about 3am each night. This PDF issue is just one problem I am having at the moment and my attention is divided. I'm not trying to rile anyone up. Thank you again for looking at this.
No worries. I completely understand the stress of working many IT coding issues at once and juggling many balls all up in the air at the same time.

I will reopen the original thread and merge this post into it.

OBTW, I am not "riled", upset, or angry in any way. I am like an "admin robot"... I just ensure this site is healthy, running fast and smooth, and protected from harm, and that our mission, rules and guidelines are followed.

SOAP BOX COMMENT: Sidebar (not specific to your post):

Sometimes I ask a question or ask for input to ensure that questions are clear, not only for me, but for future generations who visit the site and have similar questions. This site is not a "put a nickel in and get an answer out site", as some would like it to be. Our mission is to teach people to solve their own problems, not to do others' work for them, like the old saying (paraphrasing) which I am sure you have heard before:

"Give a person a fish and you feed them for a day. Teach that same person to fish, and you feed them for a lifetime."

In the age of the Internet and social media, people have become too dependent on others to do their problem solving (and thinking) for them. When I created this site decades ago, long before FB, reddit, stack*, medium, and more, our goals were always to have a very high "signal to noise" ratio, to never become a "put a nickel in and get an answer out site", and to encourage people to describe and solve their own problems with our help.

I will continue to encourage all users in that direction, even if we are the last site on the Internet to be this way.

END OF SOAP BOX COMMENT: Sidebar (not specific to your post):

Moderator's Comments:
Mod Comment Discussions merged and reopened.