Post 302902270 by danielldf on Monday 19th of May 2014 04:55:46 PM
PDF Script to extract PDF Links MOD in Need

Here is a script that extracts all PDF links from a single page. Any ideas on how to make it read a list of pages instead of a single page, and extract all the PDF links from each of them?

Code:
#!/bin/bash

# NAME:         pdflinkextractor
# AUTHOR:       Glutanimate (http://askubuntu.com/users/81372/), 2013
# LICENSE:      GNU GPL v2
# DEPENDENCIES: wget lynx
# DESCRIPTION:  extracts PDF links from websites and dumps them to the stdout and as a textfile
#               only works for links pointing to files with the ".pdf" extension
#
# USAGE:        pdflinkextractor "www.website.com"

WEBSITE="$1"

echo "Getting link list..."

lynx -cache=0 -dump -listonly "$WEBSITE" | grep ".*\.pdf$" | awk '{print $2}' | tee pdflinks.txt

# OPTIONAL
#
# DOWNLOAD PDF FILES
#
#echo "Downloading..."    
#wget -P pdflinkextractor_files/ -i pdflinks.txt
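
A minimal sketch of one way to do this: keep the same lynx/grep/awk pipeline, but run it once per URL read from a text file. Everything beyond that pipeline is an assumption for illustration: the list file holds one URL per line, blank lines and lines starting with "#" are skipped, and the function names extract_pdf_links and extract_from_list are made up here, not part of the original script.

```shell
#!/bin/bash
# Sketch: run the single-page extraction once per URL listed in a file.
# Assumptions (not from the original script): one URL per line in the list
# file; blank lines and "#" comment lines are ignored; the helper names
# below are hypothetical.

# Print every link on one page that ends in ".pdf"
# (same lynx | grep | awk pipeline as the single-page script).
extract_pdf_links() {
    lynx -cache=0 -dump -listonly "$1" | grep '\.pdf$' | awk '{print $2}'
}

# Read URLs from the list file on file descriptor 3, so that commands
# inside the loop cannot swallow the remaining lines from stdin.
extract_from_list() {
    local list_file="$1" url
    # "|| [ -n "$url" ]" also handles a last line without a trailing newline.
    while IFS= read -r url <&3 || [ -n "$url" ]; do
        case "$url" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        extract_pdf_links "$url"
    done 3< "$list_file"
}

# Run only when a list file is given as the first argument.
# sort -u drops links that appear on more than one page.
if [ "$#" -ge 1 ]; then
    echo "Getting link list..." >&2
    extract_from_list "$1" | sort -u | tee pdflinks.txt
fi
```

It could be called as `./pdflinkextractor_list pages.txt`, where both the script name and `pages.txt` are placeholder names. The resulting pdflinks.txt has the same format as before, so the optional `wget -i pdflinks.txt` step should work on it unchanged.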

 
