Post 302902270 by danielldf on Monday 19th of May 2014 04:55:46 PM
PDF Script to extract PDF Links MOD in Need

Here is a script that extracts all PDF links from a single page. Any ideas on how to make it read a list of pages instead of a single page, and extract all the PDF links from each of them?

Code:
#!/bin/bash

# NAME:         pdflinkextractor
# AUTHOR:       Glutanimate (http://askubuntu.com/users/81372/), 2013
# LICENSE:      GNU GPL v2
# DEPENDENCIES: wget lynx
# DESCRIPTION:  extracts PDF links from websites and dumps them to the stdout and as a textfile
#               only works for links pointing to files with the ".pdf" extension
#
# USAGE:        pdflinkextractor "www.website.com"

WEBSITE="$1"

echo "Getting link list..."

lynx -cache=0 -dump -listonly "$WEBSITE" | grep ".*\.pdf$" | awk '{print $2}' | tee pdflinks.txt

# OPTIONAL
#
# DOWNLOAD PDF FILES
#
#echo "Downloading..."    
#wget -P pdflinkextractor_files/ -i pdflinks.txt
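
One possible way to handle a list of pages is to put the URLs in a text file, one per line, and loop over that file with the same lynx call as above. The following is only a sketch under a few assumptions: the function names filter_pdf_links and extract_from_list are made up for illustration, the list file is passed as the first argument, and blank lines or lines starting with "#" in the list are skipped. It has not been tested against any particular site.

```shell
#!/bin/bash
# Sketch: extract PDF links from every page named in a list file.
# Function names and the list-file format are assumptions, not part of
# the original pdflinkextractor script.

# Read a "lynx -dump -listonly" listing on stdin and print only the
# URLs (second field) that end in ".pdf".
filter_pdf_links() {
    awk '$2 ~ /\.pdf$/ {print $2}'
}

# Read page URLs from the file given as $1, one per line, and print
# the de-duplicated PDF links found on all of them.
extract_from_list() {
    while IFS= read -r page || [ -n "$page" ]; do
        # Skip empty lines and comment lines in the list file
        case "$page" in ''|'#'*) continue ;; esac
        # </dev/null keeps lynx from touching the loop's input
        lynx -cache=0 -dump -listonly "$page" < /dev/null | filter_pdf_links
    done < "$1" | sort -u
}

# Only run when a list file was given on the command line
if [ "$#" -gt 0 ]; then
    echo "Getting link list..." >&2
    extract_from_list "$1" | tee pdflinks.txt
fi
```

Saved as, say, pdflinkextractor_list (a hypothetical name), it would be called as: pdflinkextractor_list pages.txt, where pages.txt holds one URL per line. The optional wget step from the original script should still work on the resulting pdflinks.txt.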
