Wget - how to ignore files in immediate directory?

# 8 (Corona688, 06-16-2014)
Show exactly what you are doing.
# 9 (06-16-2014)
Quote:
Originally posted by Corona688
If it's saving index.html, you forgot the --spider.

You can feed wget a list of URLs with awk '{...}' | wget -i - ...

I added the --spider, but it still says that.

So do I run wget with the --spider line first, and then run it again with the awk output feeding into it, like awk | wget?
Or is that all just one command?
Thanks!
# 10 (Corona688, 06-16-2014)
Show exactly what you are doing, word for word, letter for letter, keystroke for keystroke.
# 11 (06-16-2014)
First I run this:
Code:
wget --spider --no-remove-listing ftp://user:pass@hostname/directory/

I think that worked; it saved:
=> “.listing”

Then:
Code:
awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@hostname/directory/" $9 }' .listing

This gives the list of directories.

But if I run this:
Code:
awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@hostname/directory/" $9 }' .listing  |  wget -r -N  -nH --cut-dirs=1

it says:

awk: cmd. line:1: fatal: cannot open file `.listing' for reading (No such file or directory)
wget: missing URL

I'm not sure what the syntax for the feed is; I'm checking it now!
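To check what the awk filter keeps before wiring it to wget, here is a minimal sketch that runs the same program against a made-up `.listing`. It wraps the filter in a hypothetical function `list_dir_urls` and passes the base URL with `awk -v` instead of hard-coding it. The sample lines assume a Unix `ls -l` style listing with CRLF line endings; real FTP servers may format their listings differently, and field `$9` breaks on names that contain spaces.

```shell
#!/bin/sh
# Hypothetical helper wrapping the awk filter from the thread.
# $1 = listing file, $2 = base URL to prepend (should end in "/").
list_dir_urls() {
  # Strip the CR, keep lines starting with "d" (directories),
  # skip "." and "..", and print base URL + name (field 9).
  awk -v base="$2" '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print base $9 }' "$1"
}

# Build a sample listing in a scratch directory. These lines are an
# assumed Unix-style format with CRLF endings, not real server output.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\r\n' \
  'drwxr-xr-x   2 ftp ftp 4096 Jun 16 10:00 .' \
  'drwxr-xr-x   5 ftp ftp 4096 Jun 16 10:00 ..' \
  'drwxr-xr-x   3 ftp ftp 4096 Jun 16 10:00 pub' \
  'drwxr-xr-x   4 ftp ftp 4096 Jun 16 10:00 docs' \
  '-rw-r--r--   1 ftp ftp  512 Jun 16 10:00 readme.txt' > "$tmpdir/.listing"

list_dir_urls "$tmpdir/.listing" ftp://user:pass@hostname/directory/
```

It prints one URL each for `pub` and `docs`, skipping `.`, `..` and the plain file `readme.txt`.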
# 12 (Corona688, 06-16-2014)
It means exactly what it says: .listing is not there. Probably you didn't run the first command, or you ran it in a different directory.

You also forgot the -i - on the last command. I'd also suggest -x, so it saves files into folders based on the URL.
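Putting the corrections from this reply together, the two steps can run as one script. This is a sketch rather than a tested recipe: it assumes GNU Wget, keeps the thread's placeholder URL, and wraps everything in a hypothetical function `mirror_subdirs` so nothing runs until it is called. It deletes any old `.listing` first and stops if the spider step did not leave a new one, which is the awk "cannot open file" failure from the previous post.

```shell
#!/bin/sh
# Sketch only: mirror every subdirectory of one FTP directory.
# Assumes GNU Wget; mirror_subdirs is a hypothetical name.
mirror_subdirs() {
  base=$1   # placeholder such as ftp://user:pass@hostname/directory/

  # Remove any stale listing so the check below means something.
  rm -f .listing

  # Step 1: fetch only the directory listing. --no-remove-listing
  # leaves the raw .listing file in the current directory.
  wget --spider --no-remove-listing "$base"

  # Step 2: awk looks for .listing in the *current* directory; stop
  # early instead of handing wget an empty URL list.
  if [ ! -f .listing ]; then
    echo "mirror_subdirs: no .listing in $(pwd)" >&2
    return 1
  fi

  # Step 3: one URL per directory, fed to wget on stdin via "-i -".
  # Add -x if you want files sorted into folders based on the URL.
  awk -v base="$base" '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print base $9 }' .listing |
    wget -i - -r -N -nH --cut-dirs=1
}

# Usage (run from the directory where the files should land):
#   mirror_subdirs ftp://user:pass@hostname/directory/
```

Running both steps inside one function also guarantees they share the same working directory, which is exactly what went wrong when `.listing` could not be found.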
# 13 (06-16-2014)
I'd like to save all the directories into the same directory, so that's OK.

It says missing URL, but if the URL is being fed into it from awk, what goes after the -I?
Thanks!

Code:
awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@host/directory1/" $9 }' .listing  |  wget -I -r -N  -nH --cut-dirs=1

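Why `missing URL`? An option that takes an argument (wget's `-I` and `-i` both do) swallows the next word even when it starts with `-`. In `wget -I -r ...`, `-r` becomes the directory list for `-I`, so it is no longer a flag, and no URL or URL file is left. The sketch below reproduces the effect with the shell's own `getopts`; `parse_opts` is a hypothetical stand-in, not wget's actual parser.

```shell
#!/bin/sh
# Hypothetical stand-in parser: -i takes an argument, -r and -N do not.
# This uses the shell's getopts, not wget's own option parser.
parse_opts() {
  input='' recursive=no
  OPTIND=1                      # reset so the function can be called twice
  while getopts 'i:rN' opt; do
    case $opt in
      i) input=$OPTARG ;;       # the next word is taken, even if it starts with "-"
      r) recursive=yes ;;
      N) ;;                     # accepted but ignored in this sketch
    esac
  done
  printf 'input=%s recursive=%s\n' "$input" "$recursive"
}

parse_opts -i -r -N      # prints: input=-r recursive=no
parse_opts -i - -r -N    # prints: input=- recursive=yes
```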
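# 14 (Corona688, 06-16-2014)
Put exactly what I said: -i -

The - tells it to read the list of URLs from stdin. Note that it is a lowercase -i (--input-file); uppercase -I is --include-directories, which takes the next word as a directory list, so -I -r leaves wget with no URL at all.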
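The `-` in `-i -` follows a common Unix convention: where a program expects a file name, `-` stands for standard input, which is what lets the awk output flow straight into wget. A minimal sketch of a script honouring the same convention; `count_urls` is a hypothetical helper that only counts lines rather than downloading anything.

```shell
#!/bin/sh
# Hypothetical helper: count the URLs (non-empty lines) in a list.
# Like wget -i, it treats the file name "-" as "read standard input".
count_urls() {
  if [ "$1" = "-" ]; then
    grep -c .            # no file operand: grep reads stdin
  else
    grep -c . "$1"       # otherwise read the named file
  fi
}

printf 'ftp://hostname/directory/a\nftp://hostname/directory/b\n' | count_urls -   # prints 2
```

The same function also accepts an ordinary file name, just as `wget -i` does.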

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

AIX find ignore directory

I am using AIX. I would like to ignore the /u directory. I tried this, but it is not working. find / -type f -type d \( -path /u \) -prune -o -name '*rpm*' 2>/dev/null /u/appx/ls.rpm /u/arch/vim.rpm (4 Replies)
Discussion started by: cokedude

2. Shell Programming and Scripting

How to change wget download directory?

I have a cron job that mirrors a site periodically: wget -r -nc --passive-ftp ftp://user:pass@123.456.789.0 I want to download this into a directory called /files, but when I do this, it always creates a new directory called "123.456.789.0" (the hostname); it puts it into /files/123.456.789.0 but... (3 Replies)
Discussion started by: vanessafan99

3. Shell Programming and Scripting

Find command with ignore directory

Dear All, I am using the find command: find /my_rep/*/RKYPROOF/*/*/WDM/HOME_INT/PWD_DATA -name rk*myguidelines*.pdf -print The problem I am facing here is find /my_rep/*/: the directory after my_rep could be mice001, mice002 and mice001_PO, mice002_PO. I want to ignore the mice***_PO directory... (3 Replies)
Discussion started by: yadavricky

4. Shell Programming and Scripting

Wget to ignore an IP address

Hello Unix Geeks, I am in a situation to use wget for crawling a site where the site contains 5 IP addresses. Out of 5, 4 are accessible and 1 is having a problem due to firewall problems. In this case, my wget is getting stuck with that X.X.X.X and giving up. How can I ignore this IP and... (4 Replies)
Discussion started by: sathyaonnuix

5. Shell Programming and Scripting

Find: ignore directory completely

Hello, I know find can be prevented from recursing into directories with something like the following... find . -name .svn -prune -a type d But how can I completely prevent directories of a certain name (.svn) from being displayed at all, the top level and the children? I really... (2 Replies)
Discussion started by: nwb123

6. Shell Programming and Scripting

Getting ls to ignore ~ and # files

Is there a way to customize ls to ignore files ending with ~ and #? (those are Emacs backup and auto-save files). I found -B option, which only ignores ~ files (2 Replies)
Discussion started by: yaroslavvb

7. Shell Programming and Scripting

wget a directory structure question

Can you tell me how to download the directory tree just starting from "project1/" in this URL? "https://somesite.com/projects/t/project1/" This command does not seem to do what I want as it downloads also files from the upper hierarchy: wget --no-check-certificate --http-user=user... (4 Replies)
Discussion started by: majormark

8. UNIX for Advanced & Expert Users

Why is wget copying my directory tree with some files with "@"?

I'm using wget 1.11.4 on Cygwin 1.5.25. I'm trying to recursively download a directory tree, which is the root of a javadoc tree. This is approximately the command line I tried: wget -x -p -r http://<host>/.../apidoc When it finished, it seemed like it downloaded... (0 Replies)
Discussion started by: dkarr

9. Solaris

How to ignore incomplete files

On Solaris, suppose there is a directory 'dir'. Log files of size approx 1MB are continuously being deposited here by scp command. I have a script that scans this dir every 5 mins and moves away the log files that have been deposited so far. How do I design my script so that I pick up *only*... (6 Replies)
Discussion started by: sentak

10. Shell Programming and Scripting

How to ignore '.' files

I'm running Fedora Core 6 as an FTP server on a powerMac G4... I'm trying to create a script to remove files older than 3 days... I'm able to find all data older than 3 days but it finds hidden files such as /home/ftp/goossens/.canna /home/ftp/goossens/.kde... (4 Replies)
Discussion started by: James_UK