We have a problem.
We have some binary files of about 25 GB. Embedded in these files are many (millions of) PDF files.
How can we extract them from such huge files? With small files I managed it with the command:
Each PDF file begins with %PDF-1.? and ends with %%EOF,
but the command doesn't work on such big files. So we need another way to extract them.
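One possible way to handle files of this size is to memory-map the container instead of reading it into memory, and to carve out everything between each %PDF- header and the last %%EOF that precedes the next header. The following is only a sketch under some assumptions: the function name extract_pdfs and the doc_NNNNNNNN.pdf output names are made up for illustration, a 64-bit Python is assumed so a 25 GB mapping is possible, and it assumes the byte sequence %PDF- never occurs inside the body of an embedded PDF.

```python
import mmap
import os

HEADER = b"%PDF-"   # every PDF starts with this marker
TRAILER = b"%%EOF"  # every PDF ends with this marker

def extract_pdfs(container_path, out_dir):
    """Carve embedded PDFs out of a large binary file; return the number written."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(container_path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
        start = data.find(HEADER)
        while start != -1:
            # The current PDF cannot extend past the next header.
            next_start = data.find(HEADER, start + len(HEADER))
            limit = next_start if next_start != -1 else len(data)
            # Take the LAST trailer before that limit, because a PDF with
            # incremental updates contains more than one %%EOF.
            end = data.rfind(TRAILER, start, limit)
            if end != -1:
                count += 1
                out_path = os.path.join(out_dir, f"doc_{count:08d}.pdf")
                with open(out_path, "wb") as out:
                    out.write(data[start:end + len(TRAILER)])
            start = next_start
    return count
```

With millions of results it may be worth spreading the output over several subdirectories, which this sketch does not do.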
Hi,
I have a huge file of bibliographic records in some standard format. I need a script to do a repeatable task as follows:
1. Create folders named after the strings that start with "item_*" in the input file
2. Create a file "contents" in each folders having "license.txt(tab... (5 Replies)
Hello All,
I need some assistance to extract a piece of information from a huge file.
The file is like this one :
database information
ccccccccccccccccc
ccccccccccccccccc
ccccccccccccccccc
ccccccccccccccccc
os information
cccccccccccccccccc
cccccccccccccccccc... (2 Replies)
Hi, All
I have a huge file of 450 GB. Its tab-delimited format is as follows:
x1 A 50020 1
x1 B 50021 8
x1 C 50022 9
x1 A 50023 10
x2 D 50024 5
x2 C 50025 7
x2 F 50026 8
x2 N 50027 1
:
:
Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is... (3 Replies)
Hi, all:
I've got two folders, say, "folder1" and "folder2".
Under each, there are thousands of files.
It's quite obvious that there are some files missing in each. I just would like to find them. I believe this can be done by "diff" command.
However, if I change the above question a... (1 Reply)
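For the first part of this question (which files are missing on each side), a set difference over the file names is one simple option. This is a minimal sketch that compares names only, not contents, and ignores subdirectories; the function name missing_files is made up for illustration.

```python
import os

def missing_files(dir_a, dir_b):
    """Return (names only in dir_a, names only in dir_b), each sorted."""
    names_a = {entry.name for entry in os.scandir(dir_a) if entry.is_file()}
    names_b = {entry.name for entry in os.scandir(dir_b) if entry.is_file()}
    return sorted(names_a - names_b), sorted(names_b - names_a)
```

A call such as missing_files("folder1", "folder2") returns the two lists of names.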
Hello Everyone,
I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this :
foreach my $t (@text)
{
open TEXT, $t or die "Cannot open $t for reading: $!\n";
while(my $line=<TEXT>){
... (4 Replies)
Hi,
I'm trying to search for a particular phrase in a large number of PDFs in a particular directory.
What I've done so far only prints out the line, but I haven't been able to display in which file the phrase appears.
find . -name '*.pdf' -exec pdftotext {} - \; | grep "search phrase"
... (2 Replies)
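The file name is lost in the pipeline above because grep only sees the combined text of all PDFs. One way around it is to convert each file separately and report the path together with the matching line. The sketch below calls the same pdftotext FILE - invocation as the command above; the function names and the extract_text parameter are assumptions added for illustration, the latter so the search logic can be tried without pdftotext installed.

```python
import subprocess
from pathlib import Path

def pdftotext_extract(pdf_path):
    """Return the text of one PDF by running `pdftotext FILE -` (text to stdout)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True, text=True, errors="replace", check=False,
    )
    return result.stdout

def find_phrase(root, phrase, extract_text=pdftotext_extract):
    """Yield (path, line) for every line containing phrase in PDFs under root."""
    for pdf_path in sorted(Path(root).rglob("*.pdf")):
        for line in extract_text(pdf_path).splitlines():
            if phrase in line:
                yield pdf_path, line
```

Looping over find_phrase(".", "search phrase") and printing each pair shows both the file and the line.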
I'm trying to remove duplicate data from an unsorted input file larger than 50 GB and write the unique records to a new file.
I have already tried a variety of options posted in similar threads/forums, but no luck so far.
Any suggestions please ?
Thanks !! (9 Replies)
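One approach that avoids sorting is to stream the file once and remember only a short digest of each record already seen. This is a sketch under assumptions: a record is taken to be one line, the function name dedupe_lines is made up, and the set of 16-byte digests still has to fit in memory, so it only helps when the number of unique records is manageable.

```python
import hashlib

def dedupe_lines(src_path, dst_path):
    """Copy src to dst, keeping only the first occurrence of each line."""
    seen = set()  # digests of the lines written so far
    kept = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            # Hash the line without its line ending so a final line that
            # lacks a newline still matches earlier copies.
            digest = hashlib.blake2b(line.rstrip(b"\r\n"), digest_size=16).digest()
            if digest not in seen:
                seen.add(digest)
                dst.write(line)
                kept += 1
    return kept
```

The original order of the records is preserved, which a sort-based approach would not do.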
I have a huge list of files (about 300,000) which have a pattern like this.
.I 1
.U
87049087
.S
Am J Emerg
.M
Allied Health Personnel/*; Electric Countershock/*;
.T
Refibrillation managed by EMT-Ds:
.P
ARTICLE.
.W
Some patients converted from ventricular fibrillation to organized... (1 Reply)
I need a bash script that monitors folders for new PDF files and creates an XML file for an RSS feed with the newest files on the list. I have a script, but it reports errors.
#!/bin/bash
SYSDIR="/var/www/html/Intranet"
HTTPLINK="http://TYPE.IP.ADDRESS.HERE/pdfs"
FEEDTITLE="Najnoviji dokumenti na... (20 Replies)
Discussion started by: markus1981
pdftoipe
PDFTOIPE(1) General Commands Manual PDFTOIPE(1)
NAME
pdftoipe - Convert PDF files into editable Ipe format
SYNOPSIS
pdftoipe { options } PDF file [ XML file ]
DESCRIPTION
pdftoipe converts arbitrary PDF files to Ipe's XML format.
Note that pdftoipe is not related to Ipe's use of the PDF file format. PDF files generated by Ipe contain an extra stream with Ipe markup
information, which is necessary for Ipe to read the file again. If you wish to convert an Ipe-generated PDF-file to XML format, you should
use ipetoipe -xml! pdftoipe is meant to allow you to take arbitrary PDF files and make them editable in Ipe.
pdftoipe does a pretty good job on drawings, but doesn't handle text very well. Ipe's text model is based on LaTeX, which is just very
different from the text found in most PDF files.
-notext
Ignore all text in the PDF file, convert graphics only
-literal
Allow Latex markup in text objects. The default is to escape all characters special in Latex.
-math Use LaTeX math mode for all text in the PDF file
-merge int
Set the text merge level, an integer between 0 (the default) and 2. It determines how eagerly pdftoipe tries to combine consecutive
text in the PDF document into a single Ipe text object. At level 0, only characters consecutively rendered in PDF are combined. At
level 1, more text is combined. At level 2, all text is combined until a path or image is drawn.
-unicode int
Determine what should be done with non-ASCII characters in text. At level 0, all non-ASCII characters are represented as [U+XXX].
At level 1 (the default), some often used characters (such as bullets) are replaced by Latex equivalents, others are represented as
[U+XXX]. At level 2, characters that are not replaced by Latex equivalents are included in UTF-8. At level 3, all characters are
included as UTF-8.
At level 2 and 3, UTF-8 is set as the input encoding in the Latex preamble of the generated Ipe document.
Note that this only concerns characters for which the PDF file provides a mapping to Unicode. Characters from embedded fonts
without Unicode mapping (such as symbol fonts) are always represented as [S+XX].
-f int First page to convert
-l int Last page to convert
-opw string
Owner password for encrypted PDF files
-upw string
User password for encrypted PDF files
-q Quiet mode (don't print any messages or errors)
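When many files have to be converted from a script, it can help to assemble the argument list in one place. The helper below is a hypothetical sketch, not part of pdftoipe: it only builds a command line from some of the options listed above, in the order given in the SYNOPSIS, and rejects levels outside the ranges the manual describes.

```python
def pdftoipe_command(pdf_file, xml_file=None, *, notext=False, merge=None,
                     unicode_level=None, first_page=None, last_page=None):
    """Build an argument list for pdftoipe from the documented options."""
    if merge is not None and merge not in (0, 1, 2):
        raise ValueError("merge level must be 0, 1 or 2")
    if unicode_level is not None and unicode_level not in (0, 1, 2, 3):
        raise ValueError("unicode level must be 0, 1, 2 or 3")
    cmd = ["pdftoipe"]
    if notext:
        cmd.append("-notext")
    if merge is not None:
        cmd += ["-merge", str(merge)]
    if unicode_level is not None:
        cmd += ["-unicode", str(unicode_level)]
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    cmd.append(pdf_file)          # options come first, then the PDF file
    if xml_file is not None:
        cmd.append(xml_file)      # the XML file is optional
    return cmd
```

The returned list can be handed to subprocess.run(..., check=True) on a system where pdftoipe is installed.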
AUTHOR
Otfried Cheong
REPORTING BUGS
Please report bugs at http://ipe7.sourceforge.net/bugzilla.html
SEE ALSO
More information about Ipe can be found in The Ipe Manual, available online at http://ipe7.sourceforge.net/manual/manual.html
October 13, 2009 PDFTOIPE(1)