Shell Programming and Scripting: Extract urls from index.html downloaded using wget
Post 302462292 by Habitual on Wednesday 13th of October 2010 09:26:45 PM
lynx -dump http://www.domain.com/index.html | grep -A999 "^References$" | tail -n +3 | awk '{print $2 }'
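
The one-liner relies on lynx -dump ending its output with a numbered "References" list of the page's links: grep keeps everything from that heading on, tail drops the heading and the blank line after it, and awk prints the second field of each entry, which is the URL. A minimal sketch of those stages, run against a hypothetical sample of dump output instead of a live fetch (sample_dump and extract_refs are names made up for illustration, and real lynx output depends on the page):

```shell
# Hypothetical stand-in for `lynx -dump URL`; real output varies by page.
sample_dump() {
cat <<'EOF'
   Welcome to [1]Example and [2]Docs.

References

   1. http://www.domain.com/a.html
   2. http://www.domain.com/docs/
EOF
}

# Same three stages as the posted pipeline, minus the network fetch.
extract_refs() {
  grep -A999 '^References$' |  # keep the "References" heading and up to 999 lines after it
    tail -n +3 |               # skip the heading and the blank line that follows it
    awk '{print $2}'           # field 1 is the link number, field 2 is the URL
}

sample_dump | extract_refs
```

Note that -A999 caps the list at 999 lines after the heading, so a page with more links than that would be truncated.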
 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract URLs from HTML code using sed

Hello, I am trying to extract URLs from Google search results, but I have a problem with sed filtering of the HTML code. What I want is just a list of the URLs that appear between ........<p><a href=" and the next following " in the HTML code. Here is my code; I use wget and pipelines for the filtering. wget works, but... (13 Replies)
Discussion started by: L0rd
13 Replies

2. Shell Programming and Scripting

how to limit files downloaded by wget

I am trying to download a page and retrieve only wav and mp3 files via wget. The website is: Alarm Sounds | Free Sound Effects | Alarm Sound Clips | Sound Bites. My command is: wget -rl 2 -e robots=off -A wav,mp3 http://soundbible.com/tags-alarm.html When not using the -A wav,mp3... (2 Replies)
Discussion started by: Narnie
2 Replies

3. Shell Programming and Scripting

Help with WGET and renaming downloaded files :(

Hi everybody, I would greatly appreciate some expertise in this matter. I am trying to find an efficient way to batch download files from a website and rename each file with the URL it originated from (from the CLI). (i.e., instead of xyz.zip, the output file would be http://www.abc.com/xyz.zip) A... (10 Replies)
Discussion started by: o0110o
10 Replies

4. Shell Programming and Scripting

Remove external urls from .html file

Hi everyone. I have an html file with lines like so: link href="localFolder/..."> link href="htp://..."> img src="localFolder/..."> img src="htp://..."> I want to remove the links with http in their href and the imgs with http in their src. I'm having trouble removing them because there... (4 Replies)
Discussion started by: CowCow339
4 Replies

5. Shell Programming and Scripting

extract fields from a downloaded html file

I have around 100 html files and in each html file I have 5-6 such paragraphs of a company and I need to extract the Name of the company from either the one after "title" or "/company" and then the number of employees and finally the location . <div class="search_result"> <div... (1 Reply)
Discussion started by: gubbu
1 Reply

6. Shell Programming and Scripting

How to remove urls from html files

Does anybody know how to remove all urls from html files? all urls are links with anchor texts in the form of <a href="http://www.anydomain.com">ANCHOR</a> they may start with www or not. Goal is to delete all urls and keep the ANCHOR text and if possible to change tags around anchor to... (2 Replies)
Discussion started by: georgi58
2 Replies

7. Shell Programming and Scripting

Specific image to be downloaded with wget

Hello All, I have searched on Google and learned that we can download images from a site using wget. Now I have been asked to check whether an image is populated in a site or not. If yes, please send that image to an address as an attachment.. Say for example, the site is Wiki -... (6 Replies)
Discussion started by: sathyaonnuix
6 Replies

8. UNIX for Dummies Questions & Answers

Wget -i URLs.txt problem

Hi Everyone, I have a problem with wget using an input file of URLs. When I execute this -> wget -i URLs.txt I get the login.php pages transferred but not the files I have in the URLs.txt file. I need to use the input file because it will have new products to download each week. I want my VA to... (3 Replies)
Discussion started by: Keith londrie
3 Replies

9. Shell Programming and Scripting

BASH scripting - Preventing wget messed downloaded files

Hello. How can I detect, within a script, that a downloaded file does not have the correct size? linux:~ # wget --limit-rate=20k --ignore-length -O /Software_Downloaded/MULTIMEDIA_ADDON/skype-4.1.0.20-suse.i586.rpm ... (6 Replies)
Discussion started by: jcdole
6 Replies
DGET(1)                                                                DGET(1)

NAME
       dget -- Download Debian source and binary packages

SYNOPSIS
       dget [options] URL ...
       dget [options] package[=version]

DESCRIPTION
       dget downloads Debian packages. In the first form, dget fetches the
       requested URLs. If this is a .dsc or .changes file, then dget acts as
       a source-package aware form of wget: it also fetches any files
       referenced in the .dsc/.changes file. The downloaded source is then
       checked with dscverify and, if successful, unpacked by dpkg-source.

       In the second form, dget downloads a binary package (i.e., a .deb
       file) from the Debian mirror configured in
       /etc/apt/sources.list(.d). Unlike apt-get install -d, it does not
       require root privileges, writes to the current directory, and does
       not download dependencies. If a version number is specified, this
       version of the package is requested.

       In both cases dget is capable of getting several packages and/or
       URLs at once.

       (Note that .udeb packages used by debian-installer are located in
       separate packages files from .deb packages. In order to use .udebs
       with dget, you will need to have configured apt to use a packages
       file for component/debian-installer).

       Before downloading files listed in .dsc and .changes files, and
       before downloading binary packages, dget checks to see whether any
       of these files already exist. If they do, then their md5sums are
       compared to avoid downloading them again unnecessarily. dget also
       looks for matching files in /var/cache/apt/archives and directories
       given by the --path option or specified in the configuration files
       (see below). Finally, if downloading (.orig).tar.gz or .diff.gz
       files fails, dget consults apt-get source --print-uris. Download
       backends used are curl and wget, looked for in that order.

       dget was written to make it easier to retrieve source packages from
       the web for sponsor uploads. For checking the package with debdiff,
       the last binary version is available via dget package, the last
       source version via apt-get source package.

OPTIONS
       -b, --backup
              Move files that would be overwritten to ./backup.

       -q, --quiet
              Suppress wget/curl non-error output.

       -d, --download-only
              Do not run dpkg-source -x on the downloaded source package.
              This can only be used with the first method of calling dget.

       -x, --extract
              Run dpkg-source -x on the downloaded source package to unpack
              it. This option is the default and can only be used with the
              first method of calling dget.

       -u, --allow-unauthenticated
              Do not attempt to verify the integrity of downloaded source
              packages using dscverify.

       --build
              Run dpkg-buildpackage -b -uc on the downloaded source package.

       --path DIR[:DIR ...]
              In addition to /var/cache/apt/archives, dget uses the
              colon-separated list given as argument to --path to find
              files with a matching md5sum. For example:
              "--path /srv/pbuilder/result:/home/cb/UploadQueue". If DIR is
              empty (i.e., "--path ''" is specified), then any previously
              listed directories or directories specified in the
              configuration files will be ignored. This option may be
              specified multiple times, and all of the directories listed
              will be searched; hence, the above example could have been
              written as:
              "--path /srv/pbuilder/result --path /home/cb/UploadQueue".

       --insecure
              Allow SSL connections to untrusted hosts.

       --no-cache
              Bypass server-side HTTP caches by sending a Pragma: no-cache
              header.

       -h, --help
              Show a help message.

       -V, --version
              Show version information.

CONFIGURATION VARIABLES
       The two configuration files /etc/devscripts.conf and ~/.devscripts
       are sourced by a shell in that order to set configuration variables.
       Command line options can be used to override configuration file
       settings. Environment variable settings are ignored for this
       purpose. The currently recognised variables are:

       DGET_PATH
              This can be set to a colon-separated list of directories in
              which to search for files in addition to the default
              /var/cache/apt/archives. It has the same effect as the --path
              command line option. It is not set by default.

       DGET_UNPACK
              Set to 'no' to disable extracting downloaded source packages.
              Default is 'yes'.

       DGET_VERIFY
              Set to 'no' to disable checking signatures of downloaded
              source packages. Default is 'yes'.

BUGS AND COMPATIBILITY
       dget package should be implemented in apt-get install -d.

       Before devscripts version 2.10.17, the default was not to extract
       the downloaded source. Set DGET_UNPACK=no to revert to the old
       behaviour.

AUTHOR
       This program is Copyright (C) 2005-08 by Christoph Berg
       <myon@debian.org>. Modifications are Copyright (C) 2005-06 by Julian
       Gilbey <jdg@debian.org>.

       This program is licensed under the terms of the GPL, either version
       2 of the License, or (at your option) any later version.

SEE ALSO
       apt-get(1), debcheckout(1), debdiff(1), dpkg-source(1), curl(1),
       wget(1).

Debian Utilities                  2013-12-23                          DGET(1)
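
Because the man page says the configuration files are sourced by a shell, the variables listed under CONFIGURATION VARIABLES can be set with plain shell assignments. A minimal sketch of what a ~/.devscripts fragment might look like; the directories are just the example paths from the --path description, and the chosen values are illustrative, not recommendations:

```shell
# Hypothetical ~/.devscripts fragment (sourced by a shell, so plain assignments).
# Extra directories to search for already-downloaded files, colon-separated;
# these are the example paths from the --path description, not required locations.
DGET_PATH=/srv/pbuilder/result:/home/cb/UploadQueue

# Download only; do not unpack the source package (the pre-2.10.17 default).
DGET_UNPACK=no

# Keep signature checking of downloaded source packages enabled (the default).
DGET_VERIFY=yes
```

Since command line options override the configuration files, passing -x for a single run should still unpack the source even with DGET_UNPACK=no set here.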
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.