The UNIX and Linux Forums  

  #1 (permalink)  
Old 10-23-2008
rahulkav rahulkav is offline
Registered User
  
 

Join Date: Aug 2008
Location: Reading, UK
Posts: 11
how to find watermark in a pdf

Hi All,

I have a few PDF files, some of which carry a watermark, and my task is to find out which invoices have the watermark without actually printing them.
Is there any way to do this in Unix? strings is not helping.
Any idea how I could read a binary file and grep for the watermark? The watermark contains the text "XYZ".

Regards...
  #2 (permalink)  
Old 10-23-2008
jim mcnamara jim mcnamara is offline Forum Staff  
  
 

Join Date: Feb 2004
Location: NM
Posts: 5,813
PDF files can be stored in compressed format, so this may not help at all, and depending on the PDF engine the X, Y and Z may be written as separate strings:


Code:
for file in *.pdf
do
   # quote the variable so file names containing spaces survive
   grep -l '(XYZ)Tj$' "$file"
done

You could also try:

Code:
for file in *.pdf
do
   # test grep's exit status directly: -q prints nothing,
   # so testing its captured output would never succeed
   if grep -q '(X)Tj$' "$file" &&
      grep -q '(Y)Tj$' "$file" &&
      grep -q '(Z)Tj$' "$file"
   then
       echo "$file"
   fi
done

This last one is an EXTREMELY inefficient method (each file may be read up to three times), but one character at a time is usually how watermarks are generated.
  #3 (permalink)  
Old 10-23-2008
rahulkav rahulkav is offline
Registered User
  
 

Join Date: Aug 2008
Location: Reading, UK
Posts: 11
Thanks for your answer. Could you explain how '(X)Tj$' works? A PDF is a binary file, isn't it? On my version it gives the following error:
grep: illegal option -- q
Usage: grep -hblcnsviw pattern file . . .

This is on SunOS 5.10.

Regards,
Rahul
  #4 (permalink)  
Old 10-23-2008
jim mcnamara jim mcnamara is offline Forum Staff  
  
 

Join Date: Feb 2004
Location: NM
Posts: 5,813
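Figures. The default /usr/bin/grep on Solaris 10 (SunOS 5.10) is not the POSIX grep; if /usr/xpg4/bin/grep is installed, that one is the POSIX-conforming version and accepts -q.
grep -q is the 'silent' way to look for a pattern: it suppresses the output but returns a status value ($?) that indicates whether the match succeeded. See if one of your options does that.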

( stuff in here )Tj is how a show-text operation (the counterpart of the PostScript show command) is written in an uncompressed PDF file. It displays "stuff in here".

'(X)Tj$' is the pattern for the operation that prints the single character 'X', anchored to the end of a line. Again, this applies only to the uncompressed format.

If you literally cannot read your PDF in a text editor because it is full of really weird characters, then its streams are compressed and this method will not work.
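
To make the pattern concrete, here is a small sketch that writes a made-up, uncompressed content-stream fragment to a temp file and greps it. The sample data, the file names and the helper name looks_compressed are illustrative assumptions, not something from the files discussed in this thread. /FlateDecode is the filter name PDFs normally use for zlib-compressed streams, so finding it is a hint (not proof) that the Tj operators will be invisible to grep.

```shell
#!/bin/sh
# Sketch only: sample data and helper name are illustrative assumptions.

tmpdir=${TMPDIR:-/tmp}/tjdemo.$$
mkdir -p "$tmpdir"

# A made-up fragment of an uncompressed page content stream.
cat > "$tmpdir/plain.txt" <<'EOF'
BT
/F1 48 Tf
100 400 Td
(XYZ)Tj
ET
EOF

# A made-up stream dictionary that declares zlib compression.
cat > "$tmpdir/packed.txt" <<'EOF'
<< /Length 57 /Filter /FlateDecode >>
stream
EOF

# In a basic regular expression the parentheses are literal, so this
# counts lines that end in "(XYZ)Tj".
grep -c '(XYZ)Tj$' "$tmpdir/plain.txt"

# Succeeds (exit status 0) if the file mentions the FlateDecode filter.
looks_compressed() {
    grep '/FlateDecode' "$1" >/dev/null 2>&1
}

if looks_compressed "$tmpdir/packed.txt"; then
    echo "packed.txt: streams look compressed, grepping for Tj is unlikely to work"
fi
```

Running looks_compressed on a real invoice before the Tj search can save a pointless pass over files whose text is not stored in the clear.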
  #5 (permalink)  
Old 10-23-2008
vimes vimes is offline
Registered User
  
 

Join Date: Oct 2008
Posts: 46
Instead of -q on Solaris you can just do:

Code:
grep value file >/dev/null

Just use the exit status from grep in your if (0 = match found, 1 = no match, greater than 1 = error).

So in your code it'd be like:

Code:
   if grep '(X)Tj$' "$file" >/dev/null ; then

And so on...
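
Putting the suggestions in this thread together, a loop that relies only on grep's exit status might look like the sketch below. The helper names has_pattern and has_split_watermark are made up for illustration, and the patterns still assume uncompressed PDFs whose Tj operators sit at the end of a line.

```shell
#!/bin/sh
# Sketch only: has_pattern and has_split_watermark are illustrative names.

# Succeeds (exit status 0) if pattern $1 occurs in file $2.
# Output is discarded instead of using -q, which older greps lack.
has_pattern() {
    grep "$1" "$2" >/dev/null 2>&1
}

# Succeeds if X, Y and Z each appear as a single-character Tj operand.
has_split_watermark() {
    has_pattern '(X)Tj$' "$1" &&
    has_pattern '(Y)Tj$' "$1" &&
    has_pattern '(Z)Tj$' "$1"
}

for file in *.pdf
do
    # Skip the literal "*.pdf" left behind when nothing matches the glob.
    [ -f "$file" ] || continue
    if has_pattern '(XYZ)Tj$' "$file" || has_split_watermark "$file"; then
        echo "$file"
    fi
done
```

The && chain stops at the first letter that is missing, so a file without the watermark is usually read only once.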
  #6 (permalink)  
Old 10-23-2008
jim mcnamara jim mcnamara is offline Forum Staff  
  
 

Join Date: Feb 2004
Location: NM
Posts: 5,813
The real rationale behind the -q option is to be more efficient - it stops searching on the first hit. It's there to say 'Hey this pattern is/is not in the file' with the least overhead.
  #7 (permalink)  
Old 10-23-2008
fpmurphy fpmurphy is offline Forum Staff  
Moderator
  
 

Join Date: Dec 2003
Location: Florida
Posts: 1,945
Most text-to-PDF converters, e.g. easyPDF, the QuickBooks PDF converter, etc., use stream objects and do not store the "text" within the file in a format that you can grep for.
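
When the text sits inside compressed streams, one possible workaround is to let a PDF text extractor decode them first and grep its output. The sketch below assumes the pdftotext utility (shipped with xpdf and poppler) is installed, which nobody in this thread has confirmed for the Solaris box in question, and assumes a POSIX sh. pdf_has_text is a made-up helper name. It will not find a watermark that is drawn as an image or as vector outlines rather than as text.

```shell
#!/bin/sh
# Sketch only: assumes the pdftotext utility (xpdf/poppler) is on PATH.
# pdf_has_text is an illustrative helper name, not part of any tool.

# Succeeds if the text extracted from PDF $2 matches the basic regex $1.
# "pdftotext file -" writes the extracted text to standard output.
pdf_has_text() {
    pdftotext "$2" - 2>/dev/null | grep "$1" >/dev/null
}

if command -v pdftotext >/dev/null 2>&1; then
    for file in *.pdf
    do
        # Skip the literal "*.pdf" left behind when nothing matches the glob.
        [ -f "$file" ] || continue
        if pdf_has_text 'XYZ' "$file"; then
            echo "$file"
        fi
    done
else
    echo "pdftotext not found; install xpdf or poppler first" >&2
fi
```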