I am using wget to crawl a website using the following command:
What I have found is that after two days of crawling, some links are still not downloaded. For example, if a page has 10 links in it as anchors, some of those links are downloaded but most are not. I want wget to keep crawling by extracting the links from each page and then following those links in turn. Here's a clearer picture:
I start with page1.html, which has 10 hyperlinks. I extract all of those hyperlinks (saving page1.html locally, of course), then follow them one by one, and keep downloading pages based on the hyperlinks found in each new page. I want to limit myself to one external site, otherwise I'll run out of disk space. Is there any way of doing this? I hope that makes sense.
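If the goal is simply to have wget keep following the links it extracts while staying within a known set of hosts, the recursive options may already cover it. A minimal sketch, assuming hypothetical hosts example.com (the starting site) and partner.example.org (the one allowed external site):

wget -r -l inf \
     -H --domains=example.com,partner.example.org \
     --wait=1 --random-wait \
     -p -k \
     http://example.com/page1.html

Here -r -l inf makes the crawl recursive with no depth limit, -H allows leaving the starting host but --domains restricts the crawl to the two listed domains, --wait/--random-wait throttle the requests, -p fetches page requisites, and -k converts links for local browsing. A --quota option could also cap total download size if disk space is the main worry.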
Hi,
I would like to download a file from an https website. I don't have the file name, as it changes every day.
I am using the following command:
wget --no-check-certificate -r -np --user=ABC --password=DEF -O temp.txt https://<website/directory>
I am getting the following error in my... (9 Replies)
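One thing worth noting with that command: combining -r with -O sends every retrieved page into the single file temp.txt, so a file whose name changes daily may never appear under its own name. A minimal sketch without -O, assuming the wanted file can be matched by an accept pattern (the '*.txt' pattern is only an illustration, and the placeholder URL is kept from the post):

# Mirror one directory level, keep the server's file names,
# and accept only files matching a pattern (pattern is an assumption).
wget --no-check-certificate -r -np -l1 \
     --user=ABC --password=DEF \
     -A '*.txt' \
     'https://<website/directory>/'

With -A (--accept) wget downloads only the matching files and deletes the rest after the recursion, so whatever that day's file is named, it lands in the mirrored directory under its real name.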
I need help with a simple (I hope) script that would take a website location from stdin, check all the links that site contains for some regular expression, and then save the link names and the expressions found in a file. Any help would be really helpful.
Considering I'm... (5 Replies)
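A rough sketch of one way to do this in plain sh, assuming the links are simple absolute hrefs (relative links would need to be resolved against the site URL first); REGEX and matches.txt are placeholder names, not from the original post:

#!/bin/sh
# Read a site URL from stdin, list the links on that page,
# then grep each linked page for a pattern and log link + match.
REGEX='some_pattern'
OUTFILE='matches.txt'

read -r site

# Pull the page and extract href targets (very rough HTML parsing).
wget -q -O - "$site" |
  grep -o 'href="[^"]*"' |
  sed 's/href="//; s/"$//' |
  while read -r link; do
    # Fetch each link and record any line matching the pattern.
    wget -q -O - "$link" | grep -E "$REGEX" |
      while read -r hit; do
        printf '%s: %s\n' "$link" "$hit" >> "$OUTFILE"
      done
  done

Usage would be something like: echo 'http://example.com/' | sh checklinks.sh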
Hello,
When using wget with the -k option to convert links to relative URLs, I am finding that not all the links get converted in a recursive download, and when downloading a single file, none of them do. I am assuming that this is because wget will only convert those URLs for files it has... (1 Reply)
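That assumption matches the documented behaviour: -k rewrites a link to a relative local path only when the target file was actually downloaded; anything not retrieved is left pointing at the remote site. So pairing -k with a recursive, requisite-complete download gives it something to convert. A minimal sketch, where the host and depth are assumptions:

# -p pulls page requisites, -k converts links after the download finishes,
# -K keeps a .orig backup of each converted file.
wget -r -l 2 -p -k -K http://example.com/start.html

Any link whose target lies deeper than the chosen -l depth (or outside the accepted set) will still not be converted to a local path, which is consistent with what the post describes.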