Full Discussion: wget and xml issue
Post 302382105 by capnino on Tuesday 22nd of December 2009 06:41:17 AM

Hi All,
I need to use wget to download all files with the "xml" extension from a specific URL, for instance https://www.example.com/xmlfiles/
I need to do this 3 times a day, downloading only the files that are new or have changed since the last download. To do this I have used the following command:
Code:
wget -r -nd -N -A xml --no-check-certificate https://user:password@www.example.com/xmlfiles/

I am successfully authenticated by the server, but then I get
403 FORBIDDEN

If I do the following:
Code:
wget -r -nd -N --no-check-certificate https://user:password@www.example.com/xmlfiles/file.xml

Then I can successfully download the file(s) from the URL.

Where is my mistake?
P.S. I cannot use FTP, only HTTPS.
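
One possible explanation, offered as an assumption rather than a diagnosis: with -r, wget first requests the directory URL itself to discover links, and a server that forbids directory listings would answer that request with 403 even though individual files remain readable. If that is the case here, a workaround is to give wget an explicit list of file names instead of recursing. The sketch below assumes a hypothetical text file with one file name per line, and hypothetical WGET_USER and WGET_PASS environment variables for the credentials; none of these names come from the original setup.

```shell
#!/bin/sh
# Sketch only: the base URL argument, the list file, and the
# WGET_USER/WGET_PASS variables are hypothetical, not part of the original setup.

# Turn file names read from stdin (one per line) into full URLs under $1.
build_urls() {
    base=${1%/}                      # drop a trailing slash, if any
    while IFS= read -r name; do
        [ -n "$name" ] || continue   # skip empty lines
        printf '%s/%s\n' "$base" "$name"
    done
}

# Fetch every file named in list file $2 from base URL $1.
# -N skips files whose remote timestamp is not newer than the local copy;
# "-i -" makes wget read the URLs from standard input.
fetch_xml() {
    build_urls "$1" < "$2" |
        wget -N -nd --no-check-certificate \
             --user="$WGET_USER" --password="$WGET_PASS" -i -
}
```

After sourcing the script, a call such as `fetch_xml https://www.example.com/xmlfiles/ names.txt` would fetch only new or changed files, provided the server sends a Last-Modified header. To run it 3 times a day, a cron entry that calls a wrapper script at three chosen hours would be one option.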

Then I need to parse all the downloaded XML files and extract the data into a CSV file. What would be the best way to do that?
Attached you can find one of my XML files, as well as the output CSV file (exported here as XLS because CSV attachments are not allowed) that I need to obtain after parsing the XML.



Thank you in advance for your help and Merry Christmas to all.
Nino

Last edited by pludi; 12-22-2009 at 07:53 AM.. Reason: removed links and added code tags
 
