wget and xml issue


# 1  
wget and xml issue

Hi All,
I need to use wget to download all files with the "xml" extension from a specific URL, for instance https://www.example.com/xmlfiles/
I need to do this three times a day, downloading only the files added since the last download and/or files that have changed since the last download. To do this I have used the following command:
Code:
wget -r -nd -N -A xml --no-check-certificate https://user:password@www.example.com/xmlfiles/

I authenticate successfully with the server, but then I get:
403 FORBIDDEN

If I do the following:
Code:
wget -r -nd -N --no-check-certificate https://user:password@www.example.com/xmlfiles/file.xml

Then I can successfully download the file(s) from the URL.

Where is my mistake?
P.S. I cannot use FTP, only HTTPS.

Then I need to parse all the downloaded XML files, extracting the data into a CSV file. What would be the best way to do this?
Attached you can find one of my XML files, as well as the output CSV file (exported here as XLS because CSV attachments are not allowed) that I need to obtain after parsing the XML.
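To give a rough idea of the kind of extraction I mean: since the real attachment is simplified here, the element names <name> and <price> below are only placeholders, and this kind of line-based extraction only works for simple one-tag-per-line XML (a real XML parser is safer for anything nested).

```shell
# Placeholder sample standing in for the real attachment.
cat > sample.xml <<'EOF'
<items>
  <item>
    <name>widget</name>
    <price>9.99</price>
  </item>
  <item>
    <name>gadget</name>
    <price>4.50</price>
  </item>
</items>
EOF

# Pull out one column per element, then glue the columns together as CSV.
sed -n 's:.*<name>\(.*\)</name>.*:\1:p'   sample.xml > names.txt
sed -n 's:.*<price>\(.*\)</price>.*:\1:p' sample.xml > prices.txt
{ echo "name,price"; paste -d, names.txt prices.txt; } > out.csv
cat out.csv
# out.csv now contains:
#   name,price
#   widget,9.99
#   gadget,4.50
```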



Thank you in advance for your help and Merry Christmas to all.
Nino

Last edited by pludi; 12-22-2009 at 07:53 AM.. Reason: removed links and added code tags
# 2  
If you point a web browser at:
Code:
https://www.example.com/xmlfiles/

does it try to retrieve index.htm or index.html (whatever the web server's default is set to)?
If that is the case then I imagine wget(1) is doing the same and presumably that page does not exist?
# 4  
Hi there, thank you for your reply.
I have also tried -A .xml, but no luck. You are right, Tony: wget is looking for the index.html page, which in this case doesn't exist.
Any idea how to solve the problem?
Thank you again.
Greetings
# 5  
Either:

1. Try a wget of index.htm, in case it gives you a list of the files in the directory; you can then extract the names of the files you are interested in and wget each file in turn.

2. Just wget each file you are expecting to get, in turn.

3. Get the web server configuration amended so that requesting the directory returns a listing (as index.html or index.htm), then follow suggestion 1!
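A rough sketch of suggestion 1, assuming the server's listing is plain HTML with href="...xml" links. The listing below is only a stand-in for what wget would actually save as index.html, and the grep pattern may need tuning for the real page.

```shell
# Stand-in for the index.html that wget would save from the server.
cat > index.html <<'EOF'
<html><body>
<a href="a.xml">a.xml</a>
<a href="b.xml">b.xml</a>
<a href="notes.txt">notes.txt</a>
</body></html>
EOF

# Keep only the .xml links and strip the href="..." wrapper.
grep -o 'href="[^"]*\.xml"' index.html | sed 's/^href="//; s/"$//' > filelist.txt
cat filelist.txt    # a.xml and b.xml, but not notes.txt

# Then fetch each file in turn (commented out here, as it needs the real server):
# while read -r f; do
#   wget -N --no-check-certificate "https://user:password@www.example.com/xmlfiles/$f"
# done < filelist.txt
```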