download an html file via wget and pass it to mysql and update a database


httpindex

httpindex(1)						      General Commands Manual						      httpindex(1)

NAME

       httpindex - HTTP front-end for SWISH++ indexer

SYNOPSIS

       wget [ options ] URL...	2>&1 | httpindex [ options ]

DESCRIPTION

       httpindex is a front-end for index++(1) to index files copied from remote servers using wget(1).  The files (in a copy of the remote
       directory structure) can be kept, deleted, or replaced with their descriptions after indexing.

OPTIONS

   wget Options
       The wget(1) options that are required are: -A, -nv, -r, and -x; the ones that are highly recommended are: -l, -nh, -t, and  -w.	 (See  the
       EXAMPLE.)

   httpindex Options
       httpindex accepts the same short options as index++(1) except for -H, -I, -l, -r, -S, and -V.

       The following options are unique to httpindex:

       -d     Replace the text of local copies of retrieved files with their descriptions after they have been indexed.  This is useful to display
	      file descriptions in search results without having to have complete copies of the remote files thus saving filesystem  space.   (See
	      the extract_description() function in WWW(3) for details about how descriptions are extracted.)

       -D     Delete  the  local copies of retrieved files after they have been indexed.  This prevents your local filesystem from filling up with
	      copies of remote files.

EXAMPLE

       To index all HTML and text files on a remote web server keeping descriptions locally:

	    wget -A html,txt -linf -t2 -rxnv -nh -w2 http://www.foo.com 2>&1 |
	    httpindex -d -e'html:*.html,text:*.txt'

       Note that you need to redirect wget(1)'s output from standard error to standard output in order to pipe it to httpindex.
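
       The effect of the redirect can be checked without touching the network.  The sketch below is only an illustration: fake_wget is a
       hypothetical stand-in that, like wget(1), writes its log to standard error, and wc(1) plays the role of the reader at the end of the pipe.

```shell
#!/bin/sh
# fake_wget is a hypothetical stand-in for wget: it writes one made-up
# log line to standard error, which is where wget sends its log.
fake_wget() {
    echo "made-up log line for one retrieved file" >&2
}

# Without 2>&1 the log line never enters the pipe, so the reader sees
# zero lines. (Standard error is discarded here only to keep the demo quiet.)
without_redirect=$(fake_wget 2>/dev/null | wc -l | tr -d ' ')

# With 2>&1 standard error is merged into standard output before the
# pipe, so the reader receives the log line.
with_redirect=$(fake_wget 2>&1 | wc -l | tr -d ' ')

echo "lines piped without 2>&1: $without_redirect"
echo "lines piped with 2>&1:    $with_redirect"
```

       In the real pipeline httpindex takes the place of wc(1) and receives nothing to work with unless the redirect is present.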

EXIT STATUS

       Exits with a value of zero only if indexing completed successfully; non-zero otherwise.
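
       Because httpindex is the last command in the pipeline, a POSIX shell reports its exit status as the status of the whole pipeline, so
       a script can test the pipeline directly.  A minimal sketch follows; fake_wget, fake_httpindex, and run_index are hypothetical names
       used only so the example runs without SWISH++ installed.

```shell
#!/bin/sh
# Hypothetical stand-ins so the sketch runs anywhere:
# fake_wget logs to standard error, as wget does.
fake_wget() {
    echo "made-up log line" >&2
}

# fake_httpindex consumes its input and exits with the status passed
# as its first argument (0 = indexing succeeded, non-zero = it failed).
fake_httpindex() {
    cat >/dev/null
    return "$1"
}

# The status of a pipeline is the status of its last command, so the
# "if" below branches on the indexer's result, not on the downloader's.
run_index() {
    if fake_wget 2>&1 | fake_httpindex "$1"; then
        echo "indexed"
    else
        echo "failed"
    fi
}

ok=$(run_index 0)
bad=$(run_index 1)
echo "status 0 -> $ok"
echo "status 1 -> $bad"
```

       Note that this check says nothing about whether wget(1) itself succeeded; only the indexer's status is visible this way.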

CAVEATS

       In addition to the caveats for index++(1), httpindex does not correctly handle multiple -e, -E, -m, or -M options, because the Perl script
       processes command-line options with the standard Getopt::Std package, which does not support repeated options.  The last of any of
       those options "wins."

       The work-around is to give a single instance of the option multiple values separated by commas.  For example, if you want
       to do:

	    httpindex -e'html:*.html' -e'text:*.txt'

       do this instead:

	    httpindex -e'html:*.html,text:*.txt'
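
       When the patterns are collected one at a time in a script, they can be joined into a single comma-separated value before httpindex
       is called.  The following is a sketch only; join_patterns is a hypothetical helper, not part of SWISH++.

```shell
#!/bin/sh
# join_patterns is a hypothetical helper: it joins its arguments with
# commas. "$*" joins the positional parameters using the first character
# of IFS; the subshell keeps the IFS change from leaking to the caller.
join_patterns() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Quote each pattern so the shell does not expand the asterisks.
patterns=$(join_patterns 'html:*.html' 'text:*.txt')

# Print the single-option form of the command rather than running it,
# so the sketch works without httpindex installed.
echo "httpindex -e'$patterns'"
```

       The same helper serves for -E, -m, and -M, since the caveat applies to all four options.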

SEE ALSO

       index++(1), wget(1), WWW(3)

AUTHOR

       Paul J. Lucas <pauljlucas@mac.com>

SWISH++ 							  August 2, 2005						      httpindex(1)