Python Web Page Scraping Urls Creating A Dictionary Post: 302998746

8 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

making a web page

Hey im new to unix! I am tryin to create a web page in unix and have done it all but when i try and load it it says permission denied!?> i have chmod a+rx for folder and file to make sure but still permissions wont let me?! any ideas can anyone do a quick run through of how to make a web page...

2. UNIX for Dummies Questions & Answers

Accessing Web Page

Hello, I am new to unix, but wanted to know how can we fetch data from a web page (i.e. an HTML Page), my requirement is to read an html page and wanted to create a flat file (text file) based on the contents available in the mentioned HTML page. Thanks Imtiaz

3. UNIX for Dummies Questions & Answers

how do i make a web page

hey uhh this is my first post and i was wondering how do i make a web page for like a small business or something anything will help thanks

4. UNIX for Dummies Questions & Answers

Make a Web page

I'm 13 years of age and I am into computers. I am trying to learn how to make a webpage. I could use the help and I would greatly appriciate it.

5. Programming

fetching a web page in C

Hello, I'm a total newbie to HTTP commands, so I'm not sure how to do this. What I'd like is to write a C program to fetch the contents of a html page of a given address. Could someone help with this? Thanks in advance!

6. Shell Programming and Scripting

Creating a dictionary with domain name adjuncted

Hello, I have created a dictionary which has the following structure: DOMAINWORD=(equivalent in English)gloss(es) in Hindi each separated by a comma(equivalent in English)gloss(es) in Hindi each separated by a comma or a semi-colon An example will make this clear ...

7. Shell Programming and Scripting

Counting all words that start with a capital letter in a string using python dictionary

Hi, I have written the following python snippet to store the capital letter starting words into a dictionary as key and no of its appearances as a value in this dictionary against the key. #!/usr/bin/env python import sys import re hash = {} # initialize an empty dictinonary for line in...

8. Shell Programming and Scripting

Creating verbal structures from a dictionary and a template

My main aim here is to create a database of verbs in a language to Hindi. The output if it works well will be put up on a University site for researchers to use for Machine Translation. This because one of the main weaknesses of MT is in the area of verbs. Sorry for the long post but the problem...

LEARN ABOUT X11R4

uri

uri(n)						    Tcl Uniform Resource Identifier Management						    uri(n)

__________________________________________________________________________________________________________________________________________________

NAME

       uri - URI utilities

SYNOPSIS

       package require Tcl  8.2

       package require uri  ?1.2.1?

       uri::split url ?defaultscheme?

       uri::join ?key value?...

       uri::resolve base url

       uri::isrelative url

       uri::geturl url ?options...?

       uri::canonicalize uri

       uri::register schemeList script

_________________________________________________________________

DESCRIPTION

       This package contains two parts. First it provides regular expressions for a number of url/uri schemes. Second it provides a number of com-
       mands for manipulating urls/uris and fetching data specified by them. For the latter this package analyses the requested url/uri  and  then
       dispatches it to the appropriate package (http, ftp, ...) for actual fetching.

       The  package  currently	does  not conform to RFC 2396 (http://www.rfc-editor.org/rfc/rfc2396.txt), but quite likely should be. Patches and
       other help are welcome.

COMMANDS

       uri::split url ?defaultscheme?
	      uri::split takes an url, decodes it and then returns a list of key/value pairs suitable for array set containing the constituents of
	      the  url.  If  the  scheme is missing from the url it defaults to the value of defaultscheme if it was specified, or http else. Cur-
	      rently only the schemes http, ftp, mailto, urn, news, ldap and file are supported by the package itself.	See section  EXTENDING	on
	      how to expand that range.

	      The set of constituents of an url (= the set of keys in the returned dictionary) is dependent on the scheme of the url. The only key
	      which is therefore always present is scheme. For the following schemes the constituents and their keys are known:

	      ftp    user, pwd, host, port, path, type

	      http(s)
		     user, pwd, host, port, path, query, fragment. The fragment is optional.

	      file   path, host. The host is optional.

	      mailto user, host. The host is optional.

	      news   Either message-id or newsgroup-name.

       uri::join ?key value?...
	      uri::join takes a list of key/value pairs (generated by uri::split, for example) and returns the canonical url they represent.  Cur-
	      rently  only  the  schemes  http,  ftp,  mailto, urn, news, ldap and file are supported. See section EXTENDING on how to expand that
	      range.

       uri::resolve base url
	      uri::resolve resolves the specified url relative to base. In other words: A non-relative url is returned unchanged,  whereas  for  a
	      relative	url  the missing parts are taken from base and prepended to it. The result of this operation is returned. For an empty url
	      the result is base.

       uri::isrelative url
	      uri::isrelative determines whether the specified url is absolute or relative.

       uri::geturl url ?options...?
	      uri::geturl decodes the specified url and then dispatches the request to the package appropriate for the scheme found  in  the  url.
	      The  command  assumes  that the package to handle the given scheme either has the same name as the scheme itself (including possible
	      capitalization) followed by ::geturl, or, in case of this failing, has the same name as the scheme itself (including possible  capi-
	      talization).  It further assumes that whatever package was loaded provides a geturl-command in the namespace of the same name as the
	      package itself. This command is called with the given url and all given options.	Currently  geturl  does  not  handle  any  options
	      itself.

	      Note: file-urls are an exception to the rule described above. They are handled internally.

	      It  is  not  possible to specify results of the command. They depend on the geturl-command for the scheme the request was dispatched
	      to.

       uri::canonicalize uri
	      uri::canonicalize returns the canonical form of a URI.  The canonical form of a URI is one where relative path specifications, ie. .
	      and .., have been resolved.

       uri::register schemeList script
	      uri::register  registers	the  first element of schemeList as a new scheme and the remaining elements as aliases for this scheme. It
	      creates the namespace for the scheme and executes the script in the new namespace. The script has to  declare  variables	containing
	      the  regular  expressions  relevant to the scheme. At least the variable schemepart has to be declared as that one is used to extend
	      the variables keeping track of the registered schemes.

SCHEMES

       In addition to the commands mentioned above this package provides regular expression to recognize urls for a number of url schemes.

       For each supported scheme a namespace of the same name as the scheme itself is provided inside of the namespace uri containing the variable
       url  whose  contents  are  a  regular expression to recognize urls of that scheme. Additional variables may contain regular expressions for
       parts of urls for that scheme.

       The variable uri::schemes contains a list of all supported schemes. Currently these are ftp, ldap, file, http, gopher, mailto,  news,  wais
       and prospero.

EXTENDING

       Extending  the range of schemes supported by uri::split and uri::join is easy because both commands do not handle the request by themselves
       but dispatch it to another command in the uri namespace using the scheme of the url as criterion.

       uri::split and uri::join call Split[string totitle <scheme>] and  Join[string totitle <scheme>] respectively.

CREDITS

       Original code (regular expressions) by Andreas Kupries.	Modularisation by Steve Ball, also the split/join/resolve functionality.

BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly contain bugs and other problems.  Please report such in the category  uri	of
       the  Tcllib  SF	Trackers [http://sourceforge.net/tracker/?group_id=12883].  Please also report any ideas for enhancements you may have for
       either package and/or documentation.

KEYWORDS

       fetching information, file, ftp, gopher, http, ldap, mailto, news, prospero, rfc 2255, rfc 2396, uri, url, wais, www

uri								       1.2.1								    uri(n)