How to extract url from html page?
Post 302463174 by kurumi on Saturday 16th of October 2010 05:08:53 AM
Quote:
Originally Posted by malcomex999
Code:
awk -F'href="|"  |">|</' '{for(i=2;i<=NF;i=i+4) print $i,$(i+2)}' infile

Code:
$ cat file
<a href="http://awebsite"  id="awebsite" class="first" someattribute="last" > website</a>
<a href="http://bwebsite"  id="bwebsite" class="first">websiteb</a>

$ awk -F'href="|"  |">|</' '{for(i=2;i<=NF;i=i+4) print $i,$(i+2)}' file
http://awebsite a>
http://bwebsite websiteb

$ ruby test.rb
-->http://awebsite,  website
-->http://bwebsite, websiteb
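
The awk one-liner appears to trip on the first line because that tag closes with `" >` (a space before the `>`), so the `">` separator never matches and the fields shift by one. The contents of test.rb are not shown in the post, so the following is only a minimal sketch of what such a script might look like, assuming double-quoted href values and no nested tags inside the anchor text. The names `ANCHOR`, `SAMPLE` and `extract_links` are made up for illustration.

```ruby
# Hypothetical stand-in for test.rb (the original script is not shown in the thread).
# Assumes href values are double-quoted and the anchor text contains no nested <a> tags.

# Capture 1: the href value. Capture 2: everything between the opening tag and </a>.
ANCHOR = /<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)<\/a>/im

# Returns an array of [url, text] pairs, one per anchor found in the html string.
def extract_links(html)
  html.scan(ANCHOR)
end

# The two sample lines from the post, embedded so the sketch runs on its own.
SAMPLE = <<~HTML
  <a href="http://awebsite"  id="awebsite" class="first" someattribute="last" > website</a>
  <a href="http://bwebsite"  id="bwebsite" class="first">websiteb</a>
HTML

extract_links(SAMPLE).each do |url, text|
  puts "-->#{url}, #{text}"
end
```

Because `[^>]*>` consumes whatever sits between the href and the end of the opening tag, extra attributes or a stray space before `>` do not matter here. For real-world HTML (single quotes, unquoted attributes, markup inside the link text) a proper HTML parser is likely the safer route than a regex.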

 
