You could trim this down to avoid using so many temporary files.
The first sed command is somewhat more specific than just grep .pdf: instead of accepting any character (sic) followed by "pdf" anywhere in the file name, it looks specifically for a literal .pdf at the end of the line. Maybe that's not what you want; if so, take out the $.
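To make the difference concrete, here is a minimal sketch. The file names are made up for illustration and the exact commands from the original script are not reproduced here; it only contrasts an unanchored grep .pdf with a sed expression anchored to the end of the line.

```shell
# Hypothetical list of file names, one per line (not from the original thread).
names='report.pdf
notes.pdf.bak
xpdf-readme.txt
summary.txt'

# Unanchored grep: "." matches any character, so "xpdf" and ".pdf.bak" match too.
loose=$(printf '%s\n' "$names" | grep '.pdf')

# Anchored sed: a literal dot followed by "pdf", only at the end of the line.
strict=$(printf '%s\n' "$names" | sed -n '/\.pdf$/p')

echo "unanchored grep matched:"
printf '%s\n' "$loose"
echo "anchored sed matched:"
printf '%s\n' "$strict"
```

The anchored form keeps only names that really end in .pdf, which is usually what you want when filtering a file listing.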
Quote:
Originally Posted by Dave Stockdale
Also, you could run a single find here; that should reduce running time significantly if the directory tree is big.
(The wrapping with a backslash is insignificant; I just did that here to avoid getting a very wide forum posting.)
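As a rough sketch of the single-find idea (the directory layout and the name patterns below are assumptions for illustration, not taken from the original script), several patterns can be combined with -o so the tree is walked only once:

```shell
# Hypothetical directory tree, created only so the sketch can run on its own.
tree=$(mktemp -d)
mkdir -p "$tree/a/b"
touch "$tree/a/one.pdf" "$tree/a/b/two.pdf" "$tree/a/b/three.doc" "$tree/a/skip.txt"

# One pass over the tree instead of one find per pattern; the patterns are
# illustrative. The trailing backslash only continues the command on the
# next line.
found=$(find "$tree" -type f \
    \( -name '*.pdf' -o -name '*.doc' \) | sort)

printf '%s\n' "$found"

# Clean up the temporary tree.
rm -rf "$tree"
```

Each extra find rereads every directory, so on a big tree the combined form should save most of the repeated traversal time.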
PDF::API2::Basic::PDF::Pages(3pm) User Contributed Perl Documentation PDF::API2::Basic::PDF::Pages(3pm)
NAME
PDF::API2::Basic::PDF::Pages - a PDF pages hierarchical element. Inherits from PDF::API2::Basic::PDF::Dict
DESCRIPTION
A Pages object is the parent to other pages objects or to page objects themselves.
METHODS
PDF::API2::Basic::PDF::Pages->new($pdfs,$parent)
This creates a new Pages object. Notice that $parent here is not the file context for the object but the parent pages object for this
pages. If we are using this class to create a root node, then $parent should point to the file context, which is identified by not having a
Type of Pages. $pdfs is the file object (or objects) in which to create the new Pages object.
$p->out_obj($isnew)
Tells every file that this object is destined for that it should output the object when the file is written. If this object has no parent,
then it must be the root, so it is set as the root for the files in question and marked for output too. If $isnew is set, then new_obj is called
rather than out_obj, to create it as a new object in the file.
$p->find_page($pnum)
Returns the given page, using the page count values in the pages tree. Pages start at 0.
$p->add_page($page, $pnum)
Inserts the page before the given $pnum. $pnum can be negative to count from the end of the document; -1 is after the last page. Likewise $pnum
can be greater than the number of pages currently in the document, to append.
This method only guarantees to provide a reasonable pages tree if pages are appended or prepended to the document. Pages inserted in the
middle of the document may simply be inserted in the appropriate leaf in the pages tree without adding any new branches or leaves. To tidy
up such a mess, it is best to call $p->rebuild_tree to rebuild the pages tree into something efficient.
$root_pages = $p->rebuild_tree([@pglist])
Rebuilds the pages tree to make a nice balanced tree that conforms to Adobe recommendations. If passed a pglist then the tree is built for
that list of pages. No check is made of whether the pglist contains pages.
Returns the top of the tree for insertion in the root object.
@pglist = $p->get_pages
Returns a list of page objects in the document in page order.
$p->find_prop($key)
Searches up through the inheritance tree to find a property.
$p->add_font($pdf, $font)
Creates or edits the resource dictionary at this level in the hierarchy. If the font is already supported even through the hierarchy, then
it is not added.
$p->bbox($xmin, $ymin, $xmax, $ymax, [$param])
Specifies the bounding box for this and all child pages. If the values are identical to those inherited then no change is made. $param
specifies the attribute name so that other 'bounding box'es can be set with this method.
$p->proc_set(@entries)
Ensures that the current resource contains all the entries in the proc_sets listed. If necessary it creates a local resource dictionary to
achieve this.
$p->get_top
Returns the top of the pages tree.
perl v5.14.2 2011-03-10 PDF::API2::Basic::PDF::Pages(3pm)