how to limit files downloaded by wget


# 1  
Old 01-25-2010
[SOLVED] how to limit files downloaded by wget

I am trying to download a page and retrieve only wav and mp3 files via wget.

The website is:

Alarm Sounds | Free Sound Effects | Alarm Sound Clips | Sound Bites

My command is:

Code:
wget -rl 2 -e robots=off -A wav,mp3 http://soundbible.com/tags-alarm.html

Without the
Code:
-A wav,mp3

option I do get the mp3/wav files, but of course a whole lot more besides.

So what am I doing wrong?

Thanks,
Narnie

---------- Post updated at 09:55 PM ---------- Previous update was at 09:19 PM ----------

I believe I have figured it out.

The -A wav,mp3 option expects filenames ending in a .wav or .mp3 extension.

wget, however, is not saving the files with those extensions, but under strange names like this one:

force-sounds.php?id=wav%2FWorld War 2 Rifle-SoundBible.com-1227426354.wav&clip=wav

which defeats the -A wav,mp3 suffix check.

The problem is, even when I tried -A *wav*,*mp3* (which wget treats as patterns rather than suffixes), it still wouldn't download them.

At least the cause is now known, but there appears to be no workaround other than downloading everything and then deleting what isn't needed.
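For the record, that download-everything-then-prune approach can be sketched roughly like this (the soundbible.com directory name is just what wget -r creates by default, and the *wav*/*mp3* patterns are chosen because, as noted above, the saved names don't end in a clean extension):

```shell
# Mirror the page two levels deep with no accept filter at all.
wget -r -l 2 -e robots=off http://soundbible.com/tags-alarm.html

# Delete every regular file whose name contains neither "wav" nor "mp3",
# then clean up any directories left empty by the deletions.
find soundbible.com -type f ! -name '*wav*' ! -name '*mp3*' -delete
find soundbible.com -type d -empty -delete
```

This keeps the oddly named force-sounds.php?id=...wav&clip=wav files too, since "wav" appears somewhere in the name; tighten the patterns if that is not wanted.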

Yours,
Narnie

Last edited by Narnie; 02-09-2010 at 08:02 PM. Reason: Solved
# 2  
Old 01-26-2010
It looks like it's because they're embedded in JavaScript/Flash. I hate when sites do that. Try these three commands:

Code:
wget http://soundbible.com/tags-alarm.html
awk 'BEGIN{ RS="&"}{gsub(/.*theFile=/,"");print}' < tags-alarm.html | grep mp3 | uniq > list
wget -i list

Explanation:
After wget gets the page, the links in the raw HTML look like this:
Code:
<embed src="wimpy_button.swf" flashvars="theFile=http://soundbible.com/mp3/Turn Off The Air-SoundBible.com-100970025.mp3&wimpyReg=MiU....(a lot more gibberish)

I got the awk line from someone here and it's very useful. When it sees "theFile=", it extracts everything up to the next "&". There is some extra gibberish for some reason, but piping to "grep mp3" gets rid of it, and since every mp3 link appears twice, piping to uniq deduplicates them. The output goes to list, and "wget -i list" fetches every link in that file.

That gets 59 links, I hope that's all of them. I didn't see any wav files so I just did it for mp3s.
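If awk isn't handy, a roughly equivalent extraction can be done with GNU grep's -o option (the theFile= marker and the saved tags-alarm.html filename are taken from the commands above; sort -u replaces uniq so the duplicates need not be adjacent):

```shell
# -o prints only the matched text: each theFile=...mp3 value, up to but
# not including the next & or ". sed strips the theFile= prefix and
# sort -u removes the duplicate links before wget fetches them all.
grep -o 'theFile=[^&"]*\.mp3' tags-alarm.html | sed 's/^theFile=//' | sort -u > list
wget -i list
```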
# 3  
Old 01-28-2010
Quote:
Originally Posted by fubaya
It looks like it's because they're embedded in javascript/flash. I hate when sites do that. Try these three commands:

Code:
wget http://soundbible.com/tags-alarm.html
awk 'BEGIN{ RS="&"}{gsub(/.*theFile=/,"");print}' < tags-alarm.html | grep mp3 | uniq > list
wget -i list

Explanation:
After wget gets the page, the links in the raw HTML look like this:
Code:
<embed src="wimpy_button.swf" flashvars="theFile=http://soundbible.com/mp3/Turn Off The Air-SoundBible.com-100970025.mp3&wimpyReg=MiU....(a lot more gibberish)

I got the awk line from someone here and it's very useful. When it sees "theFile=", it extracts everything up to the next "&". There is some extra gibberish for some reason, but piping to "grep mp3" gets rid of it, and since every mp3 link appears twice, piping to uniq deduplicates them. The output goes to list, and "wget -i list" fetches every link in that file.

That gets 59 links, I hope that's all of them. I didn't see any wav files so I just did it for mp3s.
Thanks. I figured it was due to the embedded nature. Thanks for the awk line. That is very helpful.

Yours,
Narnie