PDF Script to extract PDF Links MOD in Need
Posted by danielldf on Monday 19th of May 2014 04:55:46 PM (post 302902270)

Here we have a script that extracts all PDF links from a single page. Any ideas on how to make it read a list of pages instead of a single page, and extract all the PDF links from each of them?

Code:
#!/bin/bash

# NAME:         pdflinkextractor
# AUTHOR:       Glutanimate (http://askubuntu.com/users/81372/), 2013
# LICENSE:      GNU GPL v2
# DEPENDENCIES: wget lynx
# DESCRIPTION:  extracts PDF links from websites and dumps them to the stdout and as a textfile
#               only works for links pointing to files with the ".pdf" extension
#
# USAGE:        pdflinkextractor "www.website.com"

WEBSITE="$1"

echo "Getting link list..."

lynx -cache=0 -dump -listonly "$WEBSITE" | grep ".*\.pdf$" | awk '{print $2}' | tee pdflinks.txt

# OPTIONAL
#
# DOWNLOAD PDF FILES
#
#echo "Downloading..."    
#wget -P pdflinkextractor_files/ -i pdflinks.txt
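
One possible way to handle a list of pages is to run the same lynx pipeline once per URL read from a text file. The following is only a sketch under a few assumptions: the input file holds one page URL per line (blank lines and lines starting with # are skipped), and the function names filter_pdf_links and extract_from_list are made up for this example. It also folds the original grep and awk steps into a single awk filter and removes duplicate links with sort -u, which the original script does not do.

```shell
#!/bin/bash
# Sketch: apply the original lynx pipeline to every page URL listed in a file.
# Function names and the one-URL-per-line input format are assumptions.

# Read `lynx -dump -listonly` output on stdin and print only the URLs
# (second field) that end in ".pdf".
filter_pdf_links() {
    awk '$2 ~ /\.pdf$/ {print $2}'
}

# Read page URLs from the file named in $1, one per line, and print the
# PDF links found on each page.
extract_from_list() {
    local list="$1" page
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r page || [ -n "$page" ]; do
        # Skip blank lines and comment lines.
        case "$page" in
            ''|'#'*) continue ;;
        esac
        # Keep lynx away from the loop's stdin, which is the URL list.
        lynx -cache=0 -dump -listonly "$page" < /dev/null | filter_pdf_links
    done < "$list"
}

# Run only when a list file is given as the first argument.
if [ "$#" -ge 1 ]; then
    echo "Getting link list..." >&2
    extract_from_list "$1" | sort -u | tee pdflinks.txt
fi
```

Saved under a name of your choice and made executable, it would be called with the list file as its only argument, for example ./pdflinkextractor_list pages.txt (both names are placeholders). The combined, de-duplicated links end up in pdflinks.txt, so the optional wget step from the original script can be used unchanged.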

 
