06-16-2014
Quote:
Originally Posted by Corona688
If it's saving index.html, you forgot the --spider.
You can feed wget a list of URLs with awk '{...}' | wget -i - ...
I added --spider, but it still says that.
So do I run wget with the --spider option first,
and then again with the list fed into it,
like awk | wget?
Or is that all just one command?
thanks!
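On the "one command or two" question: the quoted suggestion reads as a single pipeline, with awk writing the URL list and wget reading it from standard input. A minimal sketch follows, assuming a hypothetical input with one path per line in the first column; `BASE`, `build_urls`, and `check_urls` are illustrative names, not something from the thread.

```shell
#!/bin/sh
# Assumed placeholder base URL; replace with the real site.
BASE="http://www.example.com"

# build_urls prints one full URL per non-empty input line,
# using the first whitespace-separated column as the path.
# It reads the files given as arguments, or stdin if there are none.
build_urls() {
    awk -v base="$BASE" 'NF { print base "/" $1 }' "$@"
}

# check_urls is the whole job as ONE pipeline:
# awk emits the list, and wget reads it from stdin (-i -).
# --spider makes wget check each URL without saving any file.
check_urls() {
    build_urls "$@" | wget --spider -nv -i -
}
```

Calling `check_urls paths.txt` (with `paths.txt` being your own list) would then run both stages together; there is no separate first pass with `--spider` alone.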
Gedcom::WebServices(3pm) User Contributed Perl Documentation Gedcom::WebServices(3pm)
NAME
Gedcom::WebServices - Basic web service routines for Gedcom.pm
Version 1.16 - 24th April 2009
SYNOPSIS
wget -qO - http://www.example.com/ws/plain/my_family/i9/name
DESCRIPTION
This module provides web service access to a GEDCOM file in conjunction with mod_perl. Using it, a request for information can be made in
the form of a URL specifying the GEDCOM file to be used, which information is required, and the format in which the information is to be
delivered. This information is then returned in the specified format.
There are currently three supported formats:
o plain - no markup
o XML
o JSON
URLs
The format of the URLs used to access the web services is:
$BASEURL/$FORMAT/$GEDCOM/$XREF/requested/information
$BASEURL/$FORMAT/$GEDCOM?search=search_criteria
BASEURL
The base URL to access the web services.
FORMAT
The format in which to return the results.
GEDCOM
The name of the GEDCOM file to use (the extension .ged is assumed).
XREF
The xref of the record about which information is required. XREFs can be obtained initially from a search, and subsequently from
certain queries.
requested/information
The information requested. This is in the same format as that taken by the get_value method.
search_criteria
An individual to search for. This is in the same format as that taken by the get_individual method.
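The two URL shapes described above can be assembled mechanically from their parts. The following is only a sketch: `ws_record_url` and `ws_search_url` are hypothetical helper names, not part of Gedcom::WebServices, and no URL escaping is attempted.

```shell
#!/bin/sh
# ws_record_url BASEURL FORMAT GEDCOM XREF [requested/information]
# Prints $BASEURL/$FORMAT/$GEDCOM/$XREF, with the requested
# information appended when a fifth argument is given.
ws_record_url() {
    ws_url="$1/$2/$3/$4"
    if [ -n "${5:-}" ]; then
        ws_url="$ws_url/$5"
    fi
    printf '%s\n' "$ws_url"
}

# ws_search_url BASEURL FORMAT GEDCOM search_criteria
# Prints $BASEURL/$FORMAT/$GEDCOM?search=search_criteria
ws_search_url() {
    printf '%s/%s/%s?search=%s\n' "$1" "$2" "$3" "$4"
}
```

For instance, `ws_record_url http://www.example.com/ws plain my_family i9 name` yields the URL used in the SYNOPSIS.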
EXAMPLES
$ wget -qO - 'http://pjcj.sytes.net:8585/ws/plain/royal92?search=elizabeth_ii'
/ws/plain/royal92/I52
$ wget -qO - http://pjcj.sytes.net:8585/ws/plain/royal92/I52
0 @I52@ INDI
1 NAME Elizabeth_II Alexandra Mary/Windsor/
1 TITL Queen of England
1 SEX F
1 BIRT
2 DATE 21 APR 1926
2 PLAC 17 Bruton St.,London,W1,England
1 FAMS @F14@
1 FAMC @F12@
$ wget -qO - http://pjcj.sytes.net:8585/ws/plain/royal92/I52/name
Elizabeth_II Alexandra Mary /Windsor/
$ wget -qO - http://pjcj.sytes.net:8585/ws/plain/royal92/I52/birth/date
21 APR 1926
$ wget -qO - http://pjcj.sytes.net:8585/ws/plain/royal92/I52/children
/ws/plain/royal92/I58
/ws/plain/royal92/I59
/ws/plain/royal92/I60
/ws/plain/royal92/I61
$ wget -qO - http://pjcj.sytes.net:8585/ws/json/royal92/I52/name
{"name":"Elizabeth_II Alexandra Mary /Windsor/"}
$ wget -qO - http://pjcj.sytes.net:8585/ws/xml/royal92/I52/name
<NAME>Elizabeth_II Alexandra Mary /Windsor/</NAME>
$ wget -qO - http://pjcj.sytes.net:8585/ws/xml/royal92/I52
<INDI ID="I52">
<NAME>Elizabeth_II Alexandra Mary/Windsor/</NAME>
<TITL>Queen of England</TITL>
<SEX>F</SEX>
<BIRT>
<DATE>21 APR 1926</DATE>
<PLAC>17 Bruton St.,London,W1,England</PLAC>
</BIRT>
<FAMS REF="F14"/>
<FAMC REF="F12"/>
</INDI>
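The search and name requests shown above can be chained, since the search returns a path that later requests extend. This sketch makes several assumptions: the service is mounted under `/ws` as in the examples, a search prints one record path per line, and `ws_lookup_names` and `WS_FETCH` are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Command used to fetch a URL to stdout; defaults to the wget
# invocation from the examples, and can be overridden by the caller.
: "${WS_FETCH:=wget -qO -}"

# ws_lookup_names ORIGIN GEDCOM search_criteria
# ORIGIN is scheme://host[:port] with no trailing slash.
# Runs a plain-format search, then requests /name for each
# record path the search prints.
ws_lookup_names() {
    $WS_FETCH "$1/ws/plain/$2?search=$3" |
    while IFS= read -r ws_path; do
        # Skip blank lines in the search output.
        [ -n "$ws_path" ] || continue
        # </dev/null keeps the fetch from consuming the path list.
        $WS_FETCH "$1$ws_path/name" </dev/null
    done
}
```

Assuming the example server is reachable, `ws_lookup_names http://pjcj.sytes.net:8585 royal92 elizabeth_ii` would combine the first and third requests above into one call.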
CONFIGURATION
Add a section similar to the following to your mod_perl config:
PerlWarn On
PerlTaintCheck On
PerlPassEnv GEDCOM_TEST
<IfDefine GEDCOM_TEST>
    <Perl>
        $Gedcom::TEST = 1;
    </Perl>
</IfDefine>
<Perl>
    use Apache::Status;
    $ENV{PATH} = "/bin:/usr/bin";
    delete @ENV{"IFS", "CDPATH", "ENV", "BASH_ENV"};
    $Gedcom::DATA = $Gedcom::ROOT; # location of data stored on server
    use lib "$Gedcom::ROOT/blib/lib";
    use Gedcom::WebServices;
    my $handlers =
    [ qw
        (
            plain
            xml
            json
        )
    ];
    eval Gedcom::WebServices::_set_handlers($handlers);
    # use Apache::PerlSections; print STDERR Apache::PerlSections->dump;
</Perl>
PerlTransHandler Gedcom::WebServices::_parse_uri
BUGS
Very probably.
See the BUGS file. And the TODO file.
VERSION
Version 1.16 - 24th April 2009
LICENCE
Copyright 2005-2009, Paul Johnson (paul@pjcj.net)
This software is free. It is licensed under the same terms as Perl itself.
The latest version of this software should be available from my homepage: http://www.pjcj.net
perl v5.14.2 2012-04-12 Gedcom::WebServices(3pm)