03-28-2011
Searching for a string in .PDF files inside .RAR & .ZIP archives.
Hi,
I have a large number of .PDF files archived in .RAR and .ZIP files in various directories, and I would like to search for strings inside the PDF files.
I think you would need something that recursively walks the directories, extracts each .RAR/.ZIP file in memory, reads each PDF in memory, searches it for the given string, and reports which .RAR/.ZIP file and which PDF the string was found in. Everything should then be discarded (sent to /dev/null) so that nothing is left extracted on the hard drive after the script finishes, and the script moves on to the next .RAR/.ZIP file until it is done.
Are there any shell scripting wizards who could help me with this?
Thanks
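One possible approach to the procedure described above, sketched in Python rather than pure shell: walk the directory tree with `os.walk`, open each archive, read every PDF member into memory with `read()`, and hand the bytes to a text extractor. Assumptions: poppler's `pdftotext` is installed and on the PATH; `.rar` support relies on the third-party `rarfile` module (which in turn needs an `unrar` binary); the helper names `search_archives`, `open_archive` and `pdftotext_extractor` are hypothetical. Because `pdftotext` reads from a file path, each PDF is written briefly into a temporary directory that is deleted right after extraction, so nothing stays on disk.

```python
import os
import subprocess
import tempfile
import zipfile
from typing import Callable, Iterator, Tuple

# An extractor takes the raw bytes of one PDF and returns its text.
Extractor = Callable[[bytes], str]


def pdftotext_extractor(pdf_bytes: bytes) -> str:
    """Extract text with poppler's `pdftotext` (assumed to be installed).

    pdftotext wants a file path, so the bytes go into a temporary
    directory that is removed as soon as the `with` block exits.
    """
    with tempfile.TemporaryDirectory() as tmp:
        pdf_path = os.path.join(tmp, "member.pdf")
        with open(pdf_path, "wb") as fh:
            fh.write(pdf_bytes)
        # "-" as the output file makes pdftotext write the text to stdout.
        result = subprocess.run(["pdftotext", pdf_path, "-"],
                                capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def open_archive(path: str):
    """Return a zipfile-like object for a .zip or .rar archive."""
    if path.lower().endswith(".rar"):
        # Assumption: third-party `rarfile` is installed; it mimics the
        # zipfile API (namelist/read) and shells out to `unrar`.
        import rarfile
        return rarfile.RarFile(path)
    return zipfile.ZipFile(path)


def search_archives(root: str, needle: str,
                    extract: Extractor = pdftotext_extractor
                    ) -> Iterator[Tuple[str, str]]:
    """Yield (archive path, PDF member name) for each PDF containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith((".zip", ".rar")):
                continue
            archive_path = os.path.join(dirpath, name)
            with open_archive(archive_path) as archive:
                for member in archive.namelist():
                    # Case-insensitive match so ".PDF" members are included.
                    if not member.lower().endswith(".pdf"):
                        continue
                    # The member is read into memory, not extracted to disk.
                    text = extract(archive.read(member))
                    if needle in text:
                        yield archive_path, member
```

Usage: `for archive, pdf in search_archives("/path/to/archives", "some string"): print(archive, pdf)`. The extractor is a parameter so it can be swapped for a pure-Python PDF library or faked in tests; a damaged archive or PDF will raise an exception here, which a real script would catch and log before moving on to the next file.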