how to limit files downloaded by wget
Post 302389799 by fubaya on Monday 25th of January 2010 11:55:42 PM
It looks like it's because they're embedded in JavaScript/Flash. I hate it when sites do that. Try these three commands:

Code:
wget http://soundbible.com/tags-alarm.html
awk 'BEGIN{ RS="&"}{gsub(/.*theFile=/,"");print}' < tags-alarm.html | grep mp3 | uniq > list
wget -i list

Explanation:
After wget gets the page, the links in the raw HTML look like this:
Code:
<embed src="wimpy_button.swf" flashvars="theFile=http://soundbible.com/mp3/Turn Off The Air-SoundBible.com-100970025.mp3&wimpyReg=MiU....(a lot more gibberish)

I got the awk line from someone here and it's very useful. Setting RS="&" makes awk treat "&" as the record separator, and for each record gsub() strips everything up to and including "theFile=", leaving just the URL. Some extra gibberish still comes through, but piping to "grep mp3" gets rid of it, and since every mp3 link appears twice in a row, piping to uniq drops the adjacent duplicates. The output goes to list, and "wget -i list" downloads every link in the list file.
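If your grep supports -o (GNU grep does; that's an assumption about your system), a rough equivalent is the sketch below. It uses sort -u instead of uniq, which also catches duplicates that don't sit on adjacent lines:

Code:
# pull out every theFile=... value up to the next "&", then strip the prefix
grep -o 'theFile=[^&]*' tags-alarm.html | sed 's/^theFile=//' | grep mp3 | sort -u > list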

That gets 59 links; I hope that's all of them. I didn't see any wav files, so I only did it for mp3s.
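If wav files ever show up on the site, the same pipeline should cover both types with one change to the filter. This is an untested sketch of that variant:

Code:
# match .mp3 or .wav links instead of just mp3
awk 'BEGIN{ RS="&" }{ gsub(/.*theFile=/,""); print }' < tags-alarm.html | grep -E '\.(mp3|wav)' | uniq > list
wget -i list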
 

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Question regarding ftp. Files downloaded are of size zero.

I need to download some files from a remote server using ftp. I have ftp'd into the site. I then do an mget * to retrieve all of the data files. Everything seems to proceed normally and I am given feedback that the files were downloaded. Now if I go into the DOS Shell or Windows explorer, it list... (5 Replies)
Discussion started by: ralphisnow

2. Shell Programming and Scripting

Any limit on files

I am doing an ftp of around 1010 files and I am using mput for this. For some reason it's only transferring 10 or 20 files and the rest are not getting transferred. There is some socket error in the log. Is there an issue if we have more than 50 or so files for mput? Here is the o/p in the log... (2 Replies)
Discussion started by: dsravan

3. Shell Programming and Scripting

Help with WGET and renaming downloaded files :(

Hi everybody, I would greatly appreciate some expertise in this matter. I am trying to find an efficient way to batch download files from a website and rename each file with the URL it originated from (from the CLI). (i.e. instead of xyz.zip, the output file would be http://www.abc.com/xyz.zip) A... (10 Replies)
Discussion started by: o0110o

4. Web Development

php files are downloaded

Hello, I have set up a Cherokee web server and PHP 5.2 in an OpenSolaris zone. The problem is that all .php files are downloaded from the web server instead of served when I use the IP address rather than the DNS name in the web browser. Example: test.mydomain.com <-- php works 192.168.0.10/index.php <--... (3 Replies)
Discussion started by: kreno

5. Shell Programming and Scripting

Extract urls from index.html downloaded using wget

Hi, I basically need to get a list of all the tarballs located at a URI. I am currently doing a wget on the URI to get the index.html page. Now this index page contains the list of URIs that I want to use in my bash script. Can someone please guide me? I am new to Linux and shell scripting. ... (5 Replies)
Discussion started by: mnanavati

6. Shell Programming and Scripting

Specific image to be downloaded with wget

Hello All, I have gone through Google and came to know that we can download images from a site using wget. Now I have been asked to check whether an image is populated in a site or not. If yes, please send that image to an address as an attachment. Say for example, the site is Wiki -... (6 Replies)
Discussion started by: sathyaonnuix

7. Shell Programming and Scripting

Files download using wget

Hi, I need to implement the below logic to download files daily from a URL. * Need to check if it is yesterday's file (YYYY-MM-DD.dat) * If present, then download from the URL (sample_url/2013-01-28.dat) * Need to implement wait logic if not present * If it is still not able to find the file... (1 Reply)
Discussion started by: rakesh5300

8. Shell Programming and Scripting

BASH scripting - Preventing messed-up wget downloads

Hello. How can I detect within a script that the downloaded file does not have the correct size? linux:~ # wget --limit-rate=20k --ignore-length -O /Software_Downloaded/MULTIMEDIA_ADDON/skype-4.1.0.20-suse.i586.rpm ... (6 Replies)
Discussion started by: jcdole

9. Shell Programming and Scripting

For loop till the files downloaded

Need assistance in writing a for loop script or any looping method. Below is the code where I can get all the files from the URL. There are about 80 files in the URL, and every day the files get updated. The script I want must keep looping until it gets all 80 files. It matches the count... (5 Replies)
Discussion started by: ajayram_arya

10. Shell Programming and Scripting

Deleting multiple files off an ftp server once they have been downloaded

Hello, I have a server that I have to ftp files off, and they all start with SGRD followed by 6 numbers. SGRD000001 SGRD000002 SGRD000003 The script I have will run every 10 mins to pick up files, as new ones will be coming in all the time, and what I want to do is delete the files I have... (7 Replies)
Discussion started by: sph90457
httpindex(1)                General Commands Manual               httpindex(1)

NAME
       httpindex - HTTP front-end for SWISH++ indexer

SYNOPSIS
       wget [ options ] URL... 2>&1 | httpindex [ options ]

DESCRIPTION
       httpindex is a front-end for index++(1) to index files copied from
       remote servers using wget(1). The files (in a copy of the remote
       directory structure) can be kept, deleted, or replaced with their
       descriptions after indexing.

OPTIONS
   wget Options
       The wget(1) options that are required are: -A, -nv, -r, and -x; the
       ones that are highly recommended are: -l, -nh, -t, and -w. (See the
       EXAMPLE.)

   httpindex Options
       httpindex accepts the same short options as index++(1) except for -H,
       -I, -l, -r, -S, and -V. The following options are unique to httpindex:

       -d     Replace the text of local copies of retrieved files with their
              descriptions after they have been indexed. This is useful for
              displaying file descriptions in search results without having
              to keep complete copies of the remote files, thus saving
              filesystem space. (See the extract_description() function in
              WWW(3) for details about how descriptions are extracted.)

       -D     Delete the local copies of retrieved files after they have
              been indexed. This prevents your local filesystem from filling
              up with copies of remote files.

EXAMPLE
       To index all HTML and text files on a remote web server, keeping
       descriptions locally:

              wget -A html,txt -linf -t2 -rxnv -nh -w2 http://www.foo.com 2>&1 | httpindex -d -e'html:*.html,text:*.txt'

       Note that you need to redirect wget(1)'s output from standard error
       to standard output in order to pipe it to httpindex.

EXIT STATUS
       Exits with a value of zero only if indexing completed successfully;
       non-zero otherwise.

CAVEATS
       In addition to those for index++(1), httpindex does not correctly
       handle the use of multiple -e, -E, -m, or -M options (because the
       Perl script uses the standard GetOpt::Std package for processing
       command-line options, which doesn't). The last of any of those
       options "wins." The work-around is to give multiple values, separated
       by commas, to a single one of those options. For example, instead of:

              httpindex -e'html:*.html' -e'text:*.txt'

       do this:

              httpindex -e'html:*.html,text:*.txt'

SEE ALSO
       index++(1), wget(1), WWW(3)

AUTHOR
       Paul J. Lucas <pauljlucas@mac.com>

SWISH++                         August 2, 2005                    httpindex(1)
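As a companion to the EXAMPLE above, here is a sketch using the documented -D option in place of -d, so the local copies are deleted once they have been indexed (www.foo.com is the man page's placeholder host):

Code:
# index remote HTML and text files, then delete the local copies
wget -A html,txt -linf -t2 -rxnv -nh -w2 http://www.foo.com 2>&1 | httpindex -D -e'html:*.html,text:*.txt'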