Wget help


 
# 1  
Old 03-24-2014
Wget help

How can I download only *.zip and *.rar files from a website (a directory index) that has multiple subdirectories under its root directory?

I need wget to crawl every directory and download only the zip and rar files. Is there any way I can do that?
# 2  
Old 03-24-2014
Depending on your OS, did you try something like:

Code:
 wget -m --accept=zip,rar http://example.com/

From the wget man page:

Code:
Recursive Accept/Reject Options
       -A acclist --accept acclist
       -R rejlist --reject rejlist
           Specify comma-separated lists of file name suffixes or patterns to accept or reject.
           Note that if any of the wildcard characters, *, ?, [ or ], appear in an element of
           acclist or rejlist, it will be treated as a pattern, rather than a suffix.
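
For example, a minimal sketch (example.com is a placeholder, and it assumes the server exposes browsable directory listings). Because each list element contains the wildcard *, it is treated as a pattern rather than a suffix:

Code:
 wget -r -np -nd --accept='*.zip,*.rar' http://example.com/

Here -np (--no-parent) keeps the crawl below the starting URL, and -nd (--no-directories) saves the files into the current directory instead of recreating the remote tree.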

# 3  
Old 03-24-2014
I tried it your way, but it doesn't seem to work the way I want. It creates the directory paths. All I want is to download only the .zip and .rar files without the directory paths.

I'm running CentOS 6.4

---------- Post updated at 01:48 PM ---------- Previous update was at 01:16 PM ----------

Code:
wget -r -U Mozilla -t 1 -nd -A zip "http://www.xxx.xxx" -e robots=off

isn't working (I found it by googling).

Output:

Code:
--2014-03-24 18:47:40--  http://ccc/i/?C=N;O=D
Connecting to www.xxxx.xxxx|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html?C=N;O=D’

    [      <=>                                                                                                           ] 135,018      130K/s   in 1.0s

2014-03-24 18:47:42 (130 KB/s) - ‘index.html?C=N;O=D’

Removing index.html?C=N;O=D since it should be rejected.

# 4  
Old 03-24-2014
How about crawling the whole server and retrieving all findings as a list only?
In the next step, parse the list for your file types (zip, rar) and then download those findings to the current directory.

e.g. something like:
Code:
# wget writes its crawl log to stderr, so redirect it and grep the URLs out of it
wget -r --spider example.com 2>&1 | grep -oE 'https?://[^ ]+\.(zip|rar)' > retrieved.txt
while read -r FOUND; do
  wget -nc "$FOUND"
done < retrieved.txt

# 5  
Old 03-24-2014
It isn't working like that either. Try it yourself. The page returns:

301 Moved Permanently
# 6  
Old 03-24-2014
Quote:
Originally Posted by galford
I tried it your way, but it doesn't seem to work the way I want. It creates the directory paths. All I want is to download only the .zip and .rar files without the directory paths.
I don't see an issue here... just download them with the directory paths and then use a simple script built around find -exec mv to move all the files to a single directory, if that is what you want to do (see the sketch below).

It's not an issue if wget creates multiple directories and paths because you can move those files very easily after downloading with wget.
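
A minimal sketch of such a cleanup script, assuming the wget mirror was saved under ./example.com and ./archives is where the flat copy should go (both paths are placeholders):

Code:
#!/bin/bash
# Collect every .zip and .rar from the wget directory tree into one flat directory.
SRC=./example.com    # placeholder: wherever wget created its directory tree
DEST=./archives      # placeholder: target directory for the flat copy

mkdir -p "$DEST"
# mv -n skips (rather than overwrites) any file whose name already exists in $DEST
find "$SRC" -type f \( -name '*.zip' -o -name '*.rar' \) -exec mv -n {} "$DEST" \;

One caveat: if two remote directories contain a file with the same name, mv -n keeps the first copy and silently skips the rest.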
# 7  
Old 03-24-2014
Quote:
Originally Posted by sea
How about crawling the whole server and retrieving all findings as a list only?
In the next step, parse the list for your file types (zip, rar) and then download those findings to the current directory.

e.g. something like:
Code:
# wget writes its crawl log to stderr, so redirect it and grep the URLs out of it
wget -r --spider example.com 2>&1 | grep -oE 'https?://[^ ]+\.(zip|rar)' > retrieved.txt
while read -r FOUND; do
  wget -nc "$FOUND"
done < retrieved.txt


I tried your way for jpg files and I'm getting this:

Code:
Spider mode enabled. Check if remote file exists.
--2014-03-24 19:31:01--  http://www.cutetracking.com/images/Dating/spice/3.jpg
Reusing existing connection to www.cutetracking.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 44925 (44K) [image/jpeg]
Remote file exists but does not contain any link -- not retrieving.

 