how to find watermark in a pdf


 
# 1  
Old 10-23-2008

Hi All,

I have a few PDF invoices, some of which carry a watermark, and my task is to find out which ones have it without actually printing them.
Is there any way to do this in Unix? strings is not helping.
Any idea how I could read the binary file and grep for the watermark? The watermark contains the text "XYZ".

Regards...
# 2  
Old 10-23-2008
PDF files can be stored in a compressed format, in which case this may not help at all. Depending on the PDF engine, the X, Y and Z can also be written as separate strings. For uncompressed files, try:

Code:
for file in *.pdf
do
   grep -l '(XYZ)Tj$' "$file"
done

You could also try:
Code:
for file in *.pdf
do
   # Test grep's exit status directly; backticks would capture its (empty) output instead
   if grep -q '(X)Tj$' "$file" ; then
       if grep -q '(Y)Tj$' "$file" ; then
          if grep -q '(Z)Tj$' "$file" ; then
               echo "$file"
          fi
       fi
   fi
done

This last one is an EXTREMELY inefficient method (each file is scanned up to three times), but it matches the way watermarks are usually generated: one character at a time.
# 3  
Old 10-23-2008
Thanks for your answer. Could you explain how '(X)Tj$' works? A PDF is a binary file, isn't it? On my version, grep gives the following error:
grep: illegal option -- q
Usage: grep -hblcnsviw pattern file . . .

This is SunOS 5.10 (Solaris 10).

Regards,
Rahul
# 4  
Old 10-23-2008
Figures... the default /usr/bin/grep on Solaris 10 (SunOS 5.10) is not the POSIX version and has no -q; the POSIX-conforming one lives in /usr/xpg4/bin/grep.
grep -q is the 'silent' way to look for a pattern: it suppresses the output but returns an exit status ($?) that indicates whether the pattern was found. See if one of your options does that.

( stuff in here )Tj is how a show-text command, the PDF counterpart of PostScript's show operator, is written in uncompressed PDF files. It displays "stuff in here".

'(X)Tj$' is the pattern for the command that prints the single character 'X', with the $ anchoring Tj to the end of the line. Again, this applies only to uncompressed files.

If you open the PDF and the page content is unreadable binary data rather than text operators, the streams are compressed and this method will not work.
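
One rough way to tell in advance whether a file falls into the compressed case is to look for the /FlateDecode filter name, which stays readable in the raw file even when the stream data is not. This is only a sketch: the function name is_compressed_pdf is made up for illustration, other filters exist, and a file can use FlateDecode for fonts or images while its page text is still plain.

```shell
#!/bin/sh
# is_compressed_pdf FILE: succeeds if FILE mentions the FlateDecode filter.
# Heuristic only; the function name is hypothetical, not a standard tool.
is_compressed_pdf() {
    # Redirect instead of -q so this also runs with the Solaris /usr/bin/grep
    grep 'FlateDecode' "$1" >/dev/null 2>&1
}

for file in *.pdf
do
    [ -f "$file" ] || continue   # skip the literal '*.pdf' when nothing matches
    if is_compressed_pdf "$file"; then
        echo "$file: compressed streams, grepping for Tj will likely miss text"
    else
        echo "$file: no FlateDecode filter found, plain grep may work"
    fi
done
```

Files reported as compressed need to be decoded by a PDF-aware tool before any text search makes sense.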
# 5  
Old 10-23-2008
Instead of -q on Solaris you can just do:
Code:
grep value file >/dev/null

Just use the exit status from grep in your if. (0 = match found, 1 = no match).

So in your code it'd be like:
Code:
   if grep '(X)Tj$' "$file" >/dev/null ; then

And so on...
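
Putting it together, the whole three-letter check could look roughly like this. It is a sketch, not a drop-in script: has_pattern is a helper name invented here, and the patterns still only match uncompressed PDFs where each Tj ends a line.

```shell
#!/bin/sh
# has_pattern PATTERN FILE: exit status 0 if FILE contains PATTERN.
# Hypothetical helper; output is discarded so no -q option is needed.
has_pattern() {
    grep "$1" "$2" >/dev/null 2>&1
}

for file in *.pdf
do
    [ -f "$file" ] || continue   # skip the literal '*.pdf' when nothing matches
    # && stops at the first missing letter, so later greps are skipped
    if has_pattern '(X)Tj$' "$file" &&
       has_pattern '(Y)Tj$' "$file" &&
       has_pattern '(Z)Tj$' "$file"
    then
        echo "$file"
    fi
done
```

Chaining with && does the same job as the nested ifs, just flatter.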
# 6  
Old 10-23-2008
The real rationale behind the -q option is efficiency: grep can stop searching at the first hit. It's there to say 'Hey, this pattern is/is not in the file' with the least overhead.
# 7  
Old 10-23-2008
Most text-to-PDF converters, e.g. easyPDF, the QuickBooks PDF converter, etc., use stream objects and do not store the "text" within the file in a format that you can grep for.
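
For files like that, one option is to let a PDF-aware tool decode the streams and grep its output instead of the raw file. The sketch below assumes pdftotext (from xpdf or poppler) is installed, which nobody in this thread has confirmed for the Solaris box; has_watermark is a name made up for the example. It will not find a watermark that is an image, and it may miss one whose letters are placed individually, since the extracted text can come out with the letters separated.

```shell
#!/bin/sh
# Assumption: pdftotext (xpdf/poppler) is on the PATH.
# has_watermark FILE TEXT: exit status 0 if the extracted text contains TEXT.
# Hypothetical helper name, used only for this sketch.
has_watermark() {
    # "-" tells pdftotext to write the extracted text to standard output
    pdftotext "$1" - 2>/dev/null | grep "$2" >/dev/null
}

for file in *.pdf
do
    [ -f "$file" ] || continue   # skip the literal '*.pdf' when nothing matches
    if has_watermark "$file" 'XYZ'; then
        echo "$file"
    fi
done
```

It is worth running pdftotext by hand on one known watermarked invoice first to see how the watermark text actually comes out.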