Searching for a string in .PDF files inside .RAR & .ZIP archives.

# 1  
Old 03-28-2011
Searching for a string in .PDF files inside .RAR & .ZIP archives.

Hi,

I have a large number of .PDF files archived in .RAR and .ZIP files across various directories, and I would like to search for strings inside the PDFs.

I think this needs something that recursively reads directories, extracts each .RAR/.ZIP file in memory, reads each PDF in memory, searches it for the given string, and reports any match along with the .RAR/.ZIP filename and the PDF it was found in. Everything should then be discarded (to /dev/null) so that the extracted files are not left on the hard drive when the script is done, before it moves on to the next .RAR/.ZIP file, and so on until done.

Are there any shell scripting wizards who could assist me with this?

Thanks
# 2  
Old 03-28-2011
find with exec grep should work

This is my first post, so I hope I don't screw up.

I think this should work:

Code:
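# Unpack a copy of the archive in a scratch directory
mkdir testfolder
cp test.zip testfolder/
cd testfolder/
unzip test.zip
# Print the name of every extracted file that contains the string
# (-print has to come after the -exec ... \; clause, not inside grep's arguments)
find . -type f -exec grep -q teststring {} \; -print
# Clean up so nothing stays extracted on disk
cd ..
rm -rf testfolder/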

You would have to add a step to unpack the .rar files as well.
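The .rar step could look something like the sketch below. It assumes both unzip and unrar are installed, and extract_archive is just a name made up for illustration. It picks the tool from the file extension and unpacks into whatever directory it is given.

```shell
# Unpack one archive into a directory, choosing the tool by file extension.
# Assumes unzip and unrar are installed; extract_archive is a made-up name.
extract_archive() {
    archive=$1
    dest=$2
    case "$archive" in
        # unzip: -q quiet, -o overwrite without asking, -d target directory
        *.[zZ][iI][pP]) unzip -q -o "$archive" -d "$dest" ;;
        # unrar: x extract with paths, -inul no messages, -o+ overwrite;
        # the target directory must end with a slash
        *.[rR][aA][rR]) unrar x -inul -o+ "$archive" "$dest/" ;;
        *) echo "unsupported archive: $archive" >&2; return 1 ;;
    esac
}
```

With that in place, "unzip test.zip" above could become "extract_archive test.rar ." for a .rar file.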

Last edited by pludi; 03-28-2011 at 04:49 AM.. Reason: code tags, please
# 3  
Old 03-28-2011
Welcome pkabali! There are some very knowledgeable people on here.

I think what you are suggesting is good, but I am not sure how well find and grep can read the contents of a .PDF. I am still searching, but there is probably a CLI app that can read a .PDF and print it on the command line.

I found another script that goes in the direction of what I am looking for. I am just asking its author for permission to post it here.
# 4  
Old 03-28-2011
Great!

I knew the moment I posted that this issue might come up. You might want to look into the pdftotext utility, although the overhead might be too much.
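As a rough sketch of the pdftotext route: it assumes unzip, unrar and pdftotext are installed, and search_archive and search_tree are names made up for illustration. It extracts to a temporary directory on disk rather than in memory, then deletes that directory after each archive, so nothing stays extracted once it finishes.

```shell
# Search every PDF inside one .zip/.rar archive for a fixed string.
# Assumes unzip, unrar and pdftotext are installed;
# search_archive and search_tree are hypothetical names.
search_archive() {
    archive=$1
    pattern=$2
    tmp=$(mktemp -d) || return 1
    # Extract into the temporary directory; stdin is redirected so the
    # tools cannot swallow the caller's input.
    case "$archive" in
        *.[zZ][iI][pP]) unzip -q -o "$archive" -d "$tmp" </dev/null ;;
        *.[rR][aA][rR]) unrar x -inul -o+ "$archive" "$tmp/" </dev/null ;;
    esac
    # Convert each PDF to text on stdout ("-") and report the ones that match.
    find "$tmp" -type f -iname '*.pdf' | while IFS= read -r pdf; do
        if pdftotext "$pdf" - 2>/dev/null | grep -q -F -- "$pattern"; then
            name=${pdf#"$tmp"/}
            printf '%s: %s\n' "$archive" "$name"
        fi
    done
    # Remove the extracted files before moving on to the next archive.
    rm -rf "$tmp"
}

# Walk the tree given as $1 and search each archive for the string in $2.
search_tree() {
    find "$1" -type f \( -iname '*.zip' -o -iname '*.rar' \) |
    while IFS= read -r archive; do
        search_archive "$archive" "$2"
    done
}
```

Something like search_tree /path/to/archives "teststring" would then print one "archive: file.pdf" line per match. File names containing newlines would confuse the read loops, and the conversion step is where the overhead mentioned above would show up.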