06-16-2014
Thanks! That does seem to work, but your commands are more advanced than what I know.
For example, I don't really understand what exactly the awk is doing, but it does seem to be getting the directories.
The first line saves the files into index.html,
and the second one prints out a lot of ftp:// lines.
How do I feed those into wget?
thanks!
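One possible way to do this, as a sketch only: wget can read the URLs to fetch from standard input when it is given `-i -`, so the ftp:// lines can be piped straight into it. The earlier commands are not shown here, so `list_urls` below is a hypothetical stand-in that just prints sample lines, and `ftp.example.com` is a made-up host. The actual download is left commented out.

```shell
# Hypothetical stand-in for the second command from the earlier reply;
# it only needs to print one ftp:// URL per line (host name is made up).
list_urls() {
    printf '%s\n' \
        'ftp://ftp.example.com/pub/dir1/' \
        'some other output line' \
        'ftp://ftp.example.com/pub/dir2/'
}

# Keep only the lines that start with ftp://
only_ftp_urls() {
    grep '^ftp://'
}

# Show what would be handed to wget.
list_urls | only_ftp_urls

# To actually download, pipe the list into wget instead.
# -i - reads URLs from standard input, -r recurses, -np stays below each start directory:
# list_urls | only_ftp_urls | wget -r -np -i -
```

If the list is long, saving it to a file first and running `wget -i thatfile` makes it easier to check before downloading.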
httpindex(1) General Commands Manual httpindex(1)
NAME
httpindex - HTTP front-end for SWISH++ indexer
SYNOPSIS
wget [ options ] URL... 2>&1 | httpindex [ options ]
DESCRIPTION
httpindex is a front-end for index++(1) to index files copied from remote servers using wget(1). The files (in a copy of the remote
directory structure) can be kept, deleted, or replaced with their descriptions after indexing.
OPTIONS
wget Options
The wget(1) options that are required are: -A, -nv, -r, and -x; the ones that are highly recommended are: -l, -nh, -t, and -w. (See the
EXAMPLE.)
httpindex Options
httpindex accepts the same short options as index++(1) except for -H, -I, -l, -r, -S, and -V.
The following options are unique to httpindex:
-d Replace the text of local copies of retrieved files with their descriptions after they have been indexed. This is useful to display
file descriptions in search results without having to have complete copies of the remote files thus saving filesystem space. (See
the extract_description() function in WWW(3) for details about how descriptions are extracted.)
-D Delete the local copies of retrieved files after they have been indexed. This prevents your local filesystem from filling up with
copies of remote files.
EXAMPLE
To index all HTML and text files on a remote web server keeping descriptions locally:
wget -A html,txt -linf -t2 -rxnv -nh -w2 http://www.foo.com 2>&1 |
httpindex -d -e'html:*.html,text:*.txt'
Note that you need to redirect wget(1)'s output from standard error to standard output in order to pipe it to httpindex.
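A small illustration of why the redirect matters, assuming nothing about wget itself: `fake_wget` is a hypothetical stand-in that writes one line to standard output and one to standard error. A pipe carries only standard output unless standard error is merged into it with 2>&1.

```shell
# Hypothetical stand-in for wget: one line on stdout, one on stderr.
fake_wget() {
    echo 'page body on stdout'
    echo 'progress line on stderr' >&2
}

# Without the redirect, the stderr line never enters the pipe.
# (grep -c exits non-zero on zero matches, hence the "|| true".)
without=$( { fake_wget | grep -c 'stderr'; } 2>/dev/null || true )

# With 2>&1, stderr is merged into stdout before the pipe.
with=$(fake_wget 2>&1 | grep -c 'stderr')

echo "without redirect: $without"
echo "with redirect: $with"
```

Since the example above has to redirect wget's standard error for httpindex to see its messages, the second form is the one that matches the pipeline shown.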
EXIT STATUS
Exits with a value of zero only if indexing completed successfully; non-zero otherwise.
CAVEATS
In addition to those for index++(1), httpindex does not correctly handle the use of multiple -e, -E, -m, or -M options (because the Perl
script processes command-line options with the standard Getopt::Std package, which does not support repeated options). The last of any of those options ``wins.''
The work-around is to pass multiple values, separated by commas, to a single instance of the option. For example, if you want
to do:
httpindex -e'html:*.html' -e'text:*.txt'
do this instead:
httpindex -e'html:*.html,text:*.txt'
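When the patterns come from a script rather than being typed by hand, the comma-separated value can be built first. The following is a minimal sketch; `join_patterns` is a hypothetical helper, not part of httpindex or SWISH++.

```shell
# join_patterns: join all arguments with commas into a single string.
# The function body runs in a subshell, so changing IFS does not leak out.
join_patterns() (
    IFS=,
    printf '%s\n' "$*"
)

patterns=$(join_patterns 'html:*.html' 'text:*.txt')
echo "$patterns"

# Intended use (not run here):
# wget ... 2>&1 | httpindex -e"$patterns"
```

Quoting "$patterns" keeps the shell from expanding the * characters before httpindex sees them.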
SEE ALSO
index++(1), wget(1), WWW(3)
AUTHOR
Paul J. Lucas <pauljlucas@mac.com>
SWISH++ August 2, 2005 httpindex(1)