wget pdf downloading problem


 
# 1  
Old 01-05-2011

Hi. I am trying to make a mirror of this free online journal:
Code:
http://www.informaworld.com/smpp/title~content=t716100758~db=all

Under the individual issues, the link location for the "Full Text PDF" does not have ".pdf" as an extension -- so when I use wget it misses the file. However clicking manually directs to the .pdf file.

Can someone suggest how I may overcome this problem, and make a mirror of the full journal (starting from the index page above)?

I've spent several days trying to figure this out, but hopefully I can manage it with a little help.

Thanks for your time.
# 2  
Old 01-05-2011
Can you show the full command you are trying ? Are you using "wget -r" ?
# 3  
Old 01-05-2011
Hi. Yes, the commands I have tried are:
Code:
wget -r -l 2 url
wget -r -l 3 url

and also with the URL in single and in double quotes.
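Spelled out with the actual URL, quoted so the shell leaves the ~ and = characters alone, the command line looks like this (the command is assembled into a string and echoed here purely so the final form is visible; normally I just run wget directly):

```shell
# The URL is single-quoted so the shell does not touch ~ or =.
url='http://www.informaworld.com/smpp/title~content=t716100758~db=all'

# Assemble the exact command line for inspection.
cmd="wget -r -l 3 '$url'"
echo "$cmd"
```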

Essentially I want a direct mirror of the site -- so that I can click from the index page above, through to the html pages for the different "issues"; and then to the pdf files. Possibly I will need to expand the "volumes" before doing this?

I've been using this page as a guide:
Code:
http://linuxtuts.blogspot.com/2008/03/tutorials-on-wget.html

In fact I haven't managed to copy the html files yet; so obviously my understanding is wrong somewhere!
# 4  
Old 01-05-2011
There is no easy way to do what you want to do using wget. Looking at the source for that page would have shown you what is going on.

For example, consider the document entitled "Drought-tolerant plant growth promoting Bacillus ... ". The corresponding PDF file is "930332435.pdf". To retrieve that document you would have to parse this HTML code
Code:
<a target="_new" href="./ftinterface~db=all~content=a930332435~fulltext=713240930" title="Click to view the PDF fulltext"

to extract the content tag, i.e. a930332435, and build a new URL which wget could then use to retrieve the document.
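That step can be sketched in shell, assuming the links always match the content=aNNNN pattern shown above. Note that the rebuilt URL at the end is a guess at the site's scheme based on the href in the snippet, not something I have verified:

```shell
# One line of the issue page's HTML, as shown above.
html='<a target="_new" href="./ftinterface~db=all~content=a930332435~fulltext=713240930" title="Click to view the PDF fulltext"'

# Pull out the content id (e.g. a930332435) with a sed back-reference.
id=$(printf '%s' "$html" | sed -n 's/.*content=\(a[0-9]*\)~.*/\1/p')
echo "$id"

# Build a candidate absolute URL for wget to fetch. This path is an
# assumption; the real target would have to be confirmed by following
# the site's redirect.
pdf_url="http://www.informaworld.com/smpp/ftinterface~db=all~content=${id}~fulltext=713240930"
```

In practice you would loop this over every matching line of each issue page and feed the resulting URLs back to wget.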

A good technique to prevent website scraping!
# 5  
Old 01-05-2011
...and to prevent search-engine hits, too.
# 6  
Old 01-05-2011
Thanks for the help jpmurphy. It seems more difficult than I imagined, and sadly my knowledge is not up to the task!

I've found that I can get the PDF files by opening the individual issues and using the DownThemAll add-on for Firefox. The only problem is that they are not then in a nice "clickable" archive, and it is very time-consuming for the larger collections I would also like to mirror.

But never mind, I have a solution to the problem at least.

Thanks again for your time and help.
 