wget and xml issue


# 1  
Old 12-22-2009

Hi All,
I need to use wget to download all files with the "xml" extension from a specific URL, for instance https://www.example.com/xmlfiles/
I need to do this three times a day, downloading only the files that are new or have changed since the last download. To do this I have used the following command:
Code:
wget -r -nd -N -A xml --no-check-certificate https://user:password@www.example.com/xmlfiles/

The server authenticates me successfully, but then I get:
403 FORBIDDEN

If I do the following:
Code:
wget -r -nd -N --no-check-certificate https://user:password@www.example.com/xmlfiles/file.xml

then I can successfully download the file from the URL.

Where is my mistake?
P.S. I cannot use FTP, only HTTPS.

Then I need to parse all the downloaded XML files and extract the data into a CSV file. What would be the best way?
Attached you can find one of my XML files, as well as the output CSV file (exported as XLS here because CSV attachments are not allowed) that I need to obtain after parsing the XML.



Thank you in advance for your help and Merry Christmas to all.
Nino

Last edited by pludi; 12-22-2009 at 07:53 AM.. Reason: removed links and added code tags
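On the second question (turning the downloaded XML into CSV), one possible approach is Python's standard library. The attachment is not reproduced in this thread, so the element names below (`records`, `record`, `id`, `name`), the sample data, and the helper names `xml_to_rows` and `rows_to_csv` are all hypothetical placeholders. This is only a sketch under those assumptions, to be adapted to the real file layout.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical layout: the real attachment is not shown in the thread,
# so <record>, <id> and <name> are placeholders to be replaced.
SAMPLE_XML = """<records>
  <record><id>1</id><name>alpha</name></record>
  <record><id>2</id><name>beta</name></record>
</records>"""

FIELDS = ["id", "name"]  # assumed column names, one per child element


def xml_to_rows(xml_text, record_tag="record", fields=FIELDS):
    """Return one list of field values per <record_tag> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.iter(record_tag):
        # findtext falls back to the default when a child element is missing
        rows.append([rec.findtext(f, default="") for f in fields])
    return rows


def rows_to_csv(rows, fields=FIELDS):
    """Serialise a header line plus the rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(xml_to_rows(SAMPLE_XML)), end="")
```

For files on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`, and the rows from every downloaded file could be appended to a single CSV.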
# 2  
Old 12-24-2009
If you point a web browser at:
Code:
https://www.example.com/xmlfiles/

does it try to retrieve index.htm or index.html (whatever the web server's default is set to)?
If so, I imagine wget(1) is doing the same, and presumably that page does not exist?
# 4  
Old 12-28-2009
Hi there, thank you for your reply.
I have also tried -A.xml, but no luck. You are right, Tony: wget is looking for the index.html page, which in this case doesn't exist.
Any idea how to solve the problem?
Thank you again.
Greetings
# 5  
Old 12-29-2009
Either:

1. Try a wget of index.htm, in case it returns a list of the files in the directory. You can then extract the names of the files you are interested in and wget each file in turn.

2. Just wget, in turn, each file you are expecting to get.

3. Get the web server configuration amended so that index.html or index.htm gives you a directory listing, and then follow suggestion 1.
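
If option 1 or 3 does yield an HTML listing, pulling the .xml links out of it might look roughly like the sketch below. The listing markup in `SAMPLE_LISTING` is invented for illustration, and `XmlLinkParser` and `xml_urls` are hypothetical helper names; a real server's index page may be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class XmlLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose target ends in .xml."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Match the extension case-insensitively; skip empty hrefs
            if name == "href" and value and value.lower().endswith(".xml"):
                self.links.append(value)


def xml_urls(listing_html, base_url):
    """Return absolute URLs for all .xml links found in the listing."""
    parser = XmlLinkParser()
    parser.feed(listing_html)
    return [urljoin(base_url, href) for href in parser.links]


# Made-up directory listing, standing in for whatever index.htm returns
SAMPLE_LISTING = (
    '<html><body>'
    '<a href="a.xml">a.xml</a> '
    '<a href="readme.txt">readme.txt</a> '
    '<a href="sub/b.XML">b.XML</a>'
    '</body></html>'
)

if __name__ == "__main__":
    for url in xml_urls(SAMPLE_LISTING, "https://www.example.com/xmlfiles/"):
        print(url)
```

Each printed URL could then be fetched with the single-file wget command that already works in post #1; keeping -N there should let wget skip files that have not changed since the last run.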