We have a problem.
We have some binary files of about 25 GB each. Each of these files contains many (millions of) embedded PDF files.
How can we extract them from such huge files? With small files I managed it with the command:
Each PDF file begins with %PDF-1.? and ends with %%EOF,
but this does not work on such big files, so we need another way to extract them.
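One possible approach is sketched below: memory-map the container and carve out each span between a `%PDF-` header and the last `%%EOF` that precedes the next header. It is only a sketch under stated assumptions: the embedded PDFs are stored contiguously and are not nested inside one another, the system is 64-bit (so a 25 GB file can be mapped), and the function name `extract_pdfs` and the output naming scheme are made up for illustration. The marker bytes can also occur by chance inside compressed streams, so the results should be validated.

```python
import mmap
from pathlib import Path

PDF_START = b"%PDF-"
PDF_END = b"%%EOF"


def extract_pdfs(container, out_dir):
    """Carve every %PDF- ... %%EOF span out of `container`; return the count.

    Hypothetical helper: assumes PDFs are contiguous and not nested.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(container, "rb") as fh, \
            mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = mm.find(PDF_START)
        while start != -1:
            # The next header (or the end of the file) bounds this document.
            nxt = mm.find(PDF_START, start + len(PDF_START))
            limit = len(mm) if nxt == -1 else nxt
            # Use the LAST %%EOF before that bound, because incrementally
            # updated PDFs contain more than one %%EOF marker.
            end = mm.rfind(PDF_END, start, limit)
            if end != -1:
                count += 1
                target = out / f"extracted_{count:08d}.pdf"
                with open(target, "wb") as dst:
                    dst.write(mm[start:end + len(PDF_END)])
            start = nxt
    return count
```

A call such as `extract_pdfs("dump.bin", "pdfs")` (both names are placeholders) would write `pdfs/extracted_00000001.pdf` and so on. With millions of results it is probably wise to spread the output over subdirectories rather than a single folder.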
Hi,
I have a huge file of bibliographic records in a standard format. I need a script to do a repeatable task as follows:
1. Create folders named after the strings that start with "item_*" in the input file
2. Create a file "contents" in each folder having "license.txt(tab... (5 Replies)
Hello All,
I need some assistance to extract a piece of information from a huge file.
The file is like this one :
database information
ccccccccccccccccc
ccccccccccccccccc
ccccccccccccccccc
ccccccccccccccccc
os information
cccccccccccccccccc
cccccccccccccccccc... (2 Replies)
Hi, All
I have a huge file of 450 GB. It is tab-delimited, in the format below:
x1 A 50020 1
x1 B 50021 8
x1 C 50022 9
x1 A 50023 10
x2 D 50024 5
x2 C 50025 7
x2 F 50026 8
x2 N 50027 1
:
:
Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is... (3 Replies)
Hi, all:
I've got two folders, say, "folder1" and "folder2".
Under each, there are thousands of files.
It's quite obvious that some files are missing from each. I would just like to find them. I believe this can be done with the "diff" command.
However, if I change the above question a... (1 Reply)
Hello Everyone,
I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this :
foreach my $t (@text)
{
open TEXT, $t or die "Cannot open $t for reading: $!\n";
while(my $line=<TEXT>){
... (4 Replies)
Hi,
I'm trying to search for a particular phrase in a large number of PDFs in a particular directory.
What I've done so far only prints out the line, but I haven't been able to display in which file the phrase appears.
find . -name '*.pdf' -exec pdftotext {} - \; | grep "search phrase"
... (2 Replies)
I'm trying to remove duplicate records from an unsorted input file larger than 50 GB and write the unique records to a new file.
I have already tried a variety of options posted in similar threads and forums, but no luck so far.
Any suggestions, please?
Thanks !! (9 Replies)
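For a task like this, one common single-pass technique is to remember a short fixed-size digest of each record instead of the record itself, which keeps the input order and bounds memory per unique record. The sketch below assumes one record per line and that the set of 16-byte digests fits in RAM; the function name `dedupe_lines` is hypothetical. If the number of unique records is too large for memory, an external sort (for example `sort -u`, which does not preserve order) is the usual fallback.

```python
import hashlib


def dedupe_lines(src, dst):
    """Copy `src` to `dst`, keeping only the first occurrence of each line.

    Hypothetical helper: stores a 16-byte BLAKE2b digest per unique line,
    so a hash collision (extremely unlikely) would drop a distinct record.
    """
    seen = set()
    kept = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            # Ignore the line terminator so "a\n" and a final "a" compare equal.
            record = line.rstrip(b"\r\n")
            key = hashlib.blake2b(record, digest_size=16).digest()
            if key not in seen:
                seen.add(key)
                fout.write(line)
                kept += 1
    return kept
```

The return value is the number of unique records written, which can be compared with the input line count as a quick sanity check.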
I have a huge list of files (about 300,000) which have a pattern like this.
.I 1
.U
87049087
.S
Am J Emerg
.M
Allied Health Personnel/*; Electric Countershock/*;
.T
Refibrillation managed by EMT-Ds:
.P
ARTICLE.
.W
Some patients converted from ventricular fibrillation to organized... (1 Reply)
I need a bash script that monitors folders for new PDF files and creates an XML file for an RSS feed with the newest files on the list. I have a script, but it reports errors.
#!/bin/bash
SYSDIR="/var/www/html/Intranet"
HTTPLINK="http://TYPE.IP.ADDRESS.HERE/pdfs"
FEEDTITLE="Najnoviji dokumenti na... (20 Replies)
Discussion started by: markus1981
cam::pdf::decrypt
CAM::PDF::Decrypt(3pm) User Contributed Perl Documentation CAM::PDF::Decrypt(3pm)
NAME
CAM::PDF::Decrypt - PDF security helper
LICENSE
See CAM::PDF.
SYNOPSIS
use CAM::PDF;
my $pdf = CAM::PDF->new($filename);
DESCRIPTION
This class is used invisibly by CAM::PDF whenever it detects that a document is encrypted. See new(), getPrefs() and setPrefs() in that
module.
FUNCTIONS
$pkg->new($pdf, $ownerpass, $userpass, $prompt)
Create and validate a new decryption object. If this fails, it will set $CAM::PDF::errstr and return undef.
$prompt is a boolean that says whether the user should be prompted for a password on the command line.
$self->decode_permissions($field)
Given a binary encoded permissions string from a PDF document, return the four individual boolean fields as an array:
print boolean
modify boolean
copy boolean
add boolean
$self->encode_permissions($print, $modify, $copy, $add)
Given four booleans, pack them into a single field in the PDF style that decode_permissions can understand. Returns that scalar.
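As an illustration of what packing four booleans into one field can look like, the sketch below uses the bit positions that the PDF standard security handler assigns to these four permissions (bit 3 print, bit 4 modify, bit 5 copy, bit 6 add or annotate). This is an assumption about the layout, not CAM::PDF's actual code: the Python function names merely mirror the Perl methods, the value is treated as a plain integer rather than a binary string, and the reserved bits that a real permissions value also carries are left out.

```python
# Bit values of the four permission flags in the PDF standard security
# handler (assumed layout; reserved bits are deliberately ignored here).
PRINT, MODIFY, COPY, ADD = 0x04, 0x08, 0x10, 0x20
FLAGS = (PRINT, MODIFY, COPY, ADD)


def decode_permissions(field):
    """Return (print, modify, copy, add) booleans from an integer field."""
    return tuple(bool(field & flag) for flag in FLAGS)


def encode_permissions(can_print, can_modify, can_copy, can_add):
    """Pack four booleans into one integer that decode_permissions reads."""
    field = 0
    for allowed, flag in zip((can_print, can_modify, can_copy, can_add), FLAGS):
        if allowed:
            field |= flag
    return field
```

The two functions are inverses over these four bits, so `decode_permissions(encode_permissions(True, False, True, False))` gives back `(True, False, True, False)`.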
$self->set_passwords($doc, $ownerpass, $userpass)
$self->set_passwords($doc, $ownerpass, $userpass, $permissions)
Change the PDF passwords to the specified values. When the PDF is output, it will be encrypted with the new passwords.
PERMISSIONS is an optional scalar of the form that decode_permissions can understand. If not specified, the existing values will be
retained.
Note: we only support writing using encryption version 1, even though we can read encryption version 2 as well.
$self->encrypt($doc, $string)
Encrypt the scalar using the passwords previously specified.
$self->decrypt($doc, $string)
Decrypt the scalar using the passwords previously specified.
AUTHOR
See CAM::PDF
perl v5.14.2 2012-07-08 CAM::PDF::Decrypt(3pm)