We have a problem.
We have some binary files of about 25 GB each. Embedded in these files are many (millions of) PDF files.
How can we extract them from such huge files? For small files I managed it with the command:
so each PDF file begins with %PDF-1.? and ends with %%EOF,
but that doesn't work on such big files. So we need another way to extract them.
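One possible approach, sketched here in Python rather than with the original command, is to memory-map the container file and scan it for the markers instead of loading it into memory. This is only a minimal sketch under some assumptions: a 64-bit system, PDFs stored uncompressed and back to back, and no PDF embedded inside another PDF. Because a PDF with incremental updates contains several %%EOF markers, each span runs from a `%PDF-` header to the last `%%EOF` before the next header. The names `find_pdf_spans`, `extract_pdfs` and the `extracted_*.pdf` output naming are hypothetical.

```python
import mmap
import os

HEADER = b"%PDF-"
TRAILER = b"%%EOF"


def find_pdf_spans(buf):
    """Yield (start, end) byte offsets of candidate PDFs in buf.

    Each span runs from a %PDF- header to the last %%EOF found before
    the next header (or before the end of the data). Headers with no
    %%EOF after them are skipped.
    """
    start = buf.find(HEADER)
    while start != -1:
        next_start = buf.find(HEADER, start + len(HEADER))
        limit = len(buf) if next_start == -1 else next_start
        eof = buf.rfind(TRAILER, start, limit)
        if eof != -1:
            yield start, eof + len(TRAILER)
        start = next_start


def extract_pdfs(container_path, out_dir):
    """Write every candidate PDF found in container_path to out_dir.

    Returns the number of files written. The file is memory-mapped
    read-only, so only the pages being scanned are held in RAM.
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(container_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            for start, end in find_pdf_spans(buf):
                count += 1
                out_path = os.path.join(out_dir, f"extracted_{count:08d}.pdf")
                with open(out_path, "wb") as dst:
                    dst.write(buf[start:end])
    return count
```

With millions of results, writing everything into a single directory may be slow on some file systems, so spreading the output over subdirectories could be worth considering.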
Hi,
I have a huge file of bibliographic records in some standard format. I need a script to do a repeatable task as follows:
1. Create folders named after the strings starting with "item_*" in the input file
2. Create a file "contents" in each folder having "license.txt(tab... (5 Replies)
Hello All,
I need some assistance to extract a piece of information from a huge file.
The file is like this one :
database information
ccccccccccccccccc
ccccccccccccccccc
ccccccccccccccccc
ccccccccccccccccc
os information
cccccccccccccccccc
cccccccccccccccccc... (2 Replies)
Hi, All
I have a huge file of 450 GB. Its tab-delimited format is as below
x1 A 50020 1
x1 B 50021 8
x1 C 50022 9
x1 A 50023 10
x2 D 50024 5
x2 C 50025 7
x2 F 50026 8
x2 N 50027 1
:
:
Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is... (3 Replies)
Hi, all:
I've got two folders, say, "folder1" and "folder2".
Under each, there are thousands of files.
It's quite obvious that some files are missing in each. I would just like to find them. I believe this can be done with the "diff" command.
However, if I change the above question a... (1 Reply)
Hello Everyone,
I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this :
foreach my $t (@text)
{
open TEXT, $t or die "Cannot open $t for reading: $!\n";
while(my $line=<TEXT>){
... (4 Replies)
Hi,
I'm trying to search for a particular phrase in a large number of PDFs in a particular directory.
What I've done so far only prints out the line, but I haven't been able to display in which file the phrase appears.
find . -name '*.pdf' -exec pdftotext {} - \; | grep "search phrase"
... (2 Replies)
I'm trying to remove duplicate data from an unsorted input file of size >50GB and write the unique records to a new file.
I have already tried a variety of options posted in similar threads/forums, but no luck so far.
Any suggestions please ?
Thanks !! (9 Replies)
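For the deduplication question above, one commonly used idea is to stream the file once and remember only a short hash of each record instead of the record itself. The following is a minimal sketch, assuming one record per line, that the set of 16-byte digests for all unique records fits in memory, and that the negligible chance of a hash collision is acceptable. The names `unique_lines` and `dedupe_file` are hypothetical.

```python
import hashlib


def unique_lines(lines):
    """Yield each line the first time it is seen, preserving input order.

    Only a 16-byte BLAKE2b digest per unique record is kept, not the
    record itself. The trailing newline is ignored when comparing, so a
    final line without a newline still matches earlier copies.
    """
    seen = set()
    for line in lines:
        digest = hashlib.blake2b(line.rstrip(b"\n"), digest_size=16).digest()
        if digest not in seen:
            seen.add(digest)
            yield line


def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping repeated lines."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(unique_lines(src))
```

If even the digests do not fit in memory, an external sort followed by a uniqueness pass is the usual fallback, at the cost of losing the original record order.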
I have a huge list of files (about 300,000) which have a pattern like this.
.I 1
.U
87049087
.S
Am J Emerg
.M
Allied Health Personnel/*; Electric Countershock/*;
.T
Refibrillation managed by EMT-Ds:
.P
ARTICLE.
.W
Some patients converted from ventricular fibrillation to organized... (1 Reply)
I need a bash script that monitors folders for new PDF files and creates an XML file for an RSS feed with the newest files on the list. I have a script, but it reports errors.
#!/bin/bash
SYSDIR="/var/www/html/Intranet"
HTTPLINK="http://TYPE.IP.ADDRESS.HERE/pdfs"
FEEDTITLE="Najnoviji dokumenti na... (20 Replies)
Discussion started by: markus1981
Archive::Any(3pm) User Contributed Perl Documentation Archive::Any(3pm)
NAME
Archive::Any - Single interface to deal with file archives.
SYNOPSIS
use Archive::Any;
my $archive = Archive::Any->new($archive_file);
my @files = $archive->files;
$archive->extract;
my $type = $archive->type;
$archive->is_impolite;
$archive->is_naughty;
DESCRIPTION
This module is a single interface for manipulating different archive formats. Tarballs, zip files, etc.
new
my $archive = Archive::Any->new($archive_file);
my $archive = Archive::Any->new($archive_file, $type);
$type is optional. It lets you force the file type in case Archive::Any can't figure it out.
extract
$archive->extract;
$archive->extract($directory);
Extracts the files in the archive to the given $directory. If no $directory is given, it will go into the current working directory.
files
my @file = $archive->files;
A list of files in the archive.
mime_type
my $mime_type = $archive->mime_type();
Returns the mime type of the archive.
is_impolite
my $is_impolite = $archive->is_impolite;
Checks to see if this archive is going to unpack into the current directory rather than create its own.
is_naughty
my $is_naughty = $archive->is_naughty;
Checks to see if this archive is going to unpack outside the current directory.
DEPRECATED
type
my $type = $archive->type;
Returns the type of archive. This method is provided for backwards compatibility in the Tar and Zip plugins and will be going away soon in favor of "mime_type".
PLUGINS
For detailed information on writing plugins to work with Archive::Any, please see the pod documentation for Archive::Any::Plugin.
AUTHOR
Clint Moore <cmoore@cpan.org>
AUTHOR EMERITUS
Michael G Schwern
SEE ALSO
Archive::Any::Plugin
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Archive::Any
You can also look for information at:
o AnnoCPAN: Annotated CPAN documentation
<http://annocpan.org/dist/Archive-Any>
o CPAN Ratings
<http://cpanratings.perl.org/d/Archive-Any>
o RT: CPAN's request tracker
<http://rt.cpan.org/NoAuth/Bugs.html?Dist=Archive-Any>
o Search CPAN
<http://search.cpan.org/dist/Archive-Any>
LICENSE
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See <http://www.perl.com/perl/misc/Artistic.html>
perl v5.10.0 2008-06-25 Archive::Any(3pm)