Sponsored Content
Top Forums Shell Programming and Scripting Grabbing URL's from a website Post 302268543 by ogoy on Monday 15th of December 2008 10:46:54 PM
Old 12-15-2008
I've used the following with fairly good results:

links -dump "http://www.cnn.com" | egrep -o "http:.*"

I append the output > urlist.list however I am unsure how reliable this is. In other words yes, I do need help with parsing haha!

Thanks for the quick response guys Smilie

Ogoy
 

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

url calling and parameter passing to url in script

Hi all, I need to write a unix script in which need to call a url. Then need to pass parameters to that url. please help. Regards, gander_ss (1 Reply)
Discussion started by: gander_ss
1 Replies

2. Shell Programming and Scripting

url calling and parameter passing to url in script

Hi all, I need to write a unix script in which need to call a url. Then need to pass parameters to that url. please help. Regards, gander_ss (1 Reply)
Discussion started by: gander_ss
1 Replies

3. UNIX for Dummies Questions & Answers

Grabbing a value from an output file

I am executing a stored proc and sending the results in a log file. I then want to grab one result from the output parameters (bolded below, 2) so that I can store it in a variable which will then be called in another script. There are more details that get printed in the beginning of the log file,... (3 Replies)
Discussion started by: hern14
3 Replies

4. UNIX for Dummies Questions & Answers

ReDirecting a URL to another URL - Linux

Hello, I need to redirect an existing URL, how can i do that? There's a current web address to a GUI that I have to redirect to another webaddress. Does anyone know how to do this? This is on Unix boxes Linux. example: https://m45.testing.address.net/host.php make it so the... (3 Replies)
Discussion started by: SkySmart
3 Replies

5. Shell Programming and Scripting

Grabbing Certain Fields

ok, so a script i wrote spits out an output like the below: 2,JABABA,BV=114,CV=1,DF=-113,PCT=99.1228% as you can see, each field is separated by a comma. now, how can I get rid of the first field and ONLY show the rest of the fields. meaning, i want to get rid of the "2,", and... (3 Replies)
Discussion started by: SkySmart
3 Replies

6. Web Development

Regex to rewrite URL to another URL based on HTTP_HOST?

I am trying to find a way to test some code, but I need to rewrite a specific URL only from a specific HTTP_HOST The call goes out to http://SUB.DOMAIN.COM/showAssignment/7bde10b45efdd7a97629ef2fe01f7303/jsmodule/Nevow.Athena The ID in the middle is always random due to the cookie. I... (5 Replies)
Discussion started by: EXT3FSCK
5 Replies

7. UNIX for Dummies Questions & Answers

Awk: print all URL addresses between iframe tags without repeating an already printed URL

Here is what I have so far: find . -name "*php*" -or -name "*htm*" | xargs grep -i iframe | awk -F'"' '/<iframe*/{gsub(/.\*iframe>/,"\"");print $2}' Here is an example content of a PHP or HTM(HTML) file: <iframe src="http://ADDRESS_1/?click=5BBB08\" width=1 height=1... (18 Replies)
Discussion started by: striker4o
18 Replies

8. Shell Programming and Scripting

Grabbing data from a secured website

Hello there, I am a beginner in Perl, and I have a challenging project: I have to create a program that checks regularly on an online bank account for new operations, it should then feed a database keeping track of all the money going in and out. Of course the login details of this online... (4 Replies)
Discussion started by: freddie50
4 Replies

9. Shell Programming and Scripting

Reading URL using Mechanize and dump all the contents of the URL to a file

Hello, Am very new to perl , please help me here !! I need help in reading a URL from command line using PERL:: Mechanize and needs all the contents from the URL to get into a file. below is the script which i have written so far , #!/usr/bin/perl use LWP::UserAgent; use... (2 Replies)
Discussion started by: scott_cog
2 Replies

10. Shell Programming and Scripting

Grabbing variabes from logs

Hey guys, I am working on a script that needs to grab variables from a log file. The script will run morning 9 to 5pm and save variables for each run every hour, these results I will be aggregating at the end of the day. I am thinking I will be placing the date on each entry in the log every... (7 Replies)
Discussion started by: mo_VERTICASQL
7 Replies
LWP-REQUEST(1p) 					User Contributed Perl Documentation					   LWP-REQUEST(1p)

NAME
lwp-request, GET, POST, HEAD - Simple command line user agent SYNOPSIS
lwp-request [-afPuUsSedvhx] [-m method] [-b base URL] [-t timeout] [-i if-modified-since] [-c content-type] [-C credentials] [-p proxy-url] [-o format] url... DESCRIPTION
This program can be used to send requests to WWW servers and your local file system. The request content for POST and PUT methods is read from stdin. The content of the response is printed on stdout. Error messages are printed on stderr. The program returns a status value indicating the number of URLs that failed. The options are: -m <method> Set which method to use for the request. If this option is not used, then the method is derived from the name of the program. -f Force request through, even if the program believes that the method is illegal. The server might reject the request eventually. -b <uri> This URI will be used as the base URI for resolving all relative URIs given as argument. -t <timeout> Set the timeout value for the requests. The timeout is the amount of time that the program will wait for a response from the remote server before it fails. The default unit for the timeout value is seconds. You might append "m" or "h" to the timeout value to make it minutes or hours, respectively. The default timeout is '3m', i.e. 3 minutes. -i <time> Set the If-Modified-Since header in the request. If time is the name of a file, use the modification timestamp for this file. If time is not a file, it is parsed as a literal date. Take a look at HTTP::Date for recognized formats. -c <content-type> Set the Content-Type for the request. This option is only allowed for requests that take a content, i.e. POST and PUT. You can force methods to take content by using the "-f" option together with "-c". The default Content-Type for POST is "application/x-www-form-urlencoded". The default Content-type for the others is "text/plain". -p <proxy-url> Set the proxy to be used for the requests. The program also loads proxy settings from the environment. You can disable this with the "-P" option. -P Don't load proxy settings from environment. -H <header> Send this HTTP header with each request. You can specify several, e.g.: lwp-request -H 'Referer: http://other.url/' -H 'Host: somehost' http://this.url/ -C <username>:<password> Provide credentials for documents that are protected by Basic Authentication. If the document is protected and you did not specify the username and password with this option, then you will be prompted to provide these values. The following options controls what is displayed by the program: -u Print request method and absolute URL as requests are made. -U Print request headers in addition to request method and absolute URL. -s Print response status code. This option is always on for HEAD requests. -S Print response status chain. This shows redirect and authorization requests that are handled by the library. -e Print response headers. This option is always on for HEAD requests. -E Print response status chain with full response headers. -d Do not print the content of the response. -o <format> Process HTML content in various ways before printing it. If the content type of the response is not HTML, then this option has no effect. The legal format values are; text, ps, links, html and dump. If you specify the text format then the HTML will be formatted as plain latin1 text. If you specify the ps format then it will be formatted as Postscript. The links format will output all links found in the HTML document. Relative links will be expanded to absolute ones. The html format will reformat the HTML code and the dump format will just dump the HTML syntax tree. Note that the "HTML-Tree" distribution needs to be installed for this option to work. In addition the "HTML-Format" distribution needs to be installed for -o text or -o ps to work. -v Print the version number of the program and quit. -h Print usage message and quit. -a Set text(ascii) mode for content input and output. If this option is not used, content input and output is done in binary mode. Because this program is implemented using the LWP library, it will only support the protocols that LWP supports. SEE ALSO
lwp-mirror, LWP COPYRIGHT
Copyright 1995-1999 Gisle Aas. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. AUTHOR
Gisle Aas <gisle@aas.no> perl v5.14.2 2012-02-11 LWP-REQUEST(1p)
All times are GMT -4. The time now is 11:58 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy