LW2(3perl)            User Contributed Perl Documentation            LW2(3perl)

NAME
    LW2 - Perl HTTP library version 2.5

SYNOPSIS
    use LW2;

    require 'LW2.pm';

DESCRIPTION
    Libwhisker is a Perl library useful for HTTP testing scripts. It contains
    a pure-Perl reimplementation of functionality found in the "LWP", "URI",
    "Digest::MD5", "Digest::MD4", "Data::Dumper", "Authen::NTLM",
    "HTML::Parser", "HTML::FormParser", "CGI::Upload", "MIME::Base64", and
    "GetOpt::Std" modules.

    Libwhisker is designed to be portable (a single Perl file), fast (general
    benchmarks show Libwhisker is faster than LWP), and flexible (great care
    was taken to ensure the library does exactly what you want it to do, even
    if that means breaking the protocol).

FUNCTIONS
    The following are the functions contained in Libwhisker:

    auth_brute_force
        Params: $auth_method, \%req, $user, @passwords [, $domain, $fail_code ]
        Return: $first_valid_password, undef if error/none found

        Perform an HTTP authentication brute force against a server (host and
        URI defined in %req). It will try every password in the password
        array for the given user. The first password (in conjunction with the
        given user) that doesn't return HTTP 401 is returned, and the brute
        force stops at that point. You should retry the request with the
        returned password and double-check that you got a useful HTTP return
        code indicating successful authentication (200, 302), and not
        something more abnormal (407, 500, etc.). $domain is optional and is
        only used for NTLM auth.

        Note: set up any proxy settings and proxy auth in %req before calling
        this function. You can brute-force proxy authentication by setting up
        the target proxy as proxy_host and proxy_port in %req, using an
        arbitrary host and URI (preferably one that is reachable upon
        successful proxy authorization), and setting $fail_code to 407. The
        $auth_method passed to this function should be a proxy-based one
        ('proxy-basic', 'proxy-ntlm', etc.).
        If your server returns something other than 401 upon auth failure,
        then set $fail_code to whatever is returned (and it needs to be
        something *different* than what is received on auth success, or this
        function won't be able to tell the difference).

    auth_unset
        Params: \%req
        Return: nothing (modifies %req)

        Modifies %req to disable all authentication (regular and proxy).
        Note: it only removes the values set by auth_set(). Manually-defined
        [Proxy-]Authorization headers will also be deleted (but you shouldn't
        be using the auth_* functions if you're manually handling your own
        auth...).

    auth_set
        Params: $auth_method, \%req, $user, $password [, $domain]
        Return: nothing (modifies %req)

        Modifies %req to use the indicated authentication info. $auth_method
        can be: 'basic', 'proxy-basic', 'ntlm', 'proxy-ntlm'. Note: this
        function may not necessarily set any headers after being called.
        Also, proxy-ntlm with SSL is not currently supported.

    cookie_new_jar
        Params: none
        Return: $jar

        Create a new cookie jar, for use with the other functions. Even
        though the jar is technically just a hash, you should still use this
        function in order to be future-compatible (should the jar format
        change).

    cookie_read
        Params: $jar, \%response [, \%request, $reject ]
        Return: $num_of_cookies_read

        Read in cookies from a %response hash, and put them in $jar. Notice:
        cookie_read uses internal magic done by http_do_request in order to
        read cookies regardless of 'Set-Cookie[2]' header appearance. If the
        optional %request hash is supplied, then it will be used to calculate
        default host and path values, in case the cookie doesn't specify them
        explicitly. If $reject is set to 1, then the %request hash values are
        used to calculate and reject cookies which are not appropriate for
        the path and domains of the given request.

    cookie_parse
        Params: $jar, $cookie [, $default_domain, $default_path, $reject ]
        Return: nothing

        Parses the cookie into its various parts and then sets the
        appropriate values in the cookie $jar.
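Stepping back to auth_brute_force for a moment: its stop-at-first-non-failure logic reduces to a simple loop. The following standalone sketch illustrates that pattern; the $try_auth callback is a hypothetical stand-in for an LW2 request, not part of the library.

```perl
# Sketch of the auth_brute_force loop (illustrative, not LW2's code):
# try each password, stop at the first reply that isn't the failure code.
sub brute_force_sketch {
    my ($try_auth, $user, $passwords, $fail_code) = @_;
    $fail_code ||= 401;                          # default failure code
    for my $pass (@$passwords) {
        my $code = $try_auth->($user, $pass);    # returns an HTTP status
        return $pass if $code != $fail_code;     # first non-failure wins
    }
    return undef;                                # none found
}
```

As the text warns, a non-failure reply is not proof of success; re-check the winning password for a 200 or 302.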
        If the cookie value is blank, it will delete it from the $jar. See
        the 'docs/cookies.txt' document for a full explanation of how
        Libwhisker parses cookies and what RFC aspects are supported.

        The optional $default_domain value is taken literally. Values with no
        leading dot (e.g. 'www.host.com') are considered to be strict
        hostnames and will only match the identical hostname. Values with
        leading dots (e.g. '.host.com') are treated as sub-domain matches for
        a single domain level. If the cookie does not indicate a domain, and
        a $default_domain is not provided, then the cookie is considered to
        match all domains/hosts.

        The optional $default_path is used when the cookie does not specify a
        path. $default_path must be absolute (start with '/'), or it will be
        ignored. If the cookie does not specify a path, and $default_path is
        not provided, then the default value '/' will be used.

        Set $reject to 1 if you wish to reject cookies based upon the
        provided $default_domain and $default_path. Note that $default_domain
        and $default_path must be specified for $reject to actually do
        something meaningful.

    cookie_write
        Params: $jar, \%request, $override
        Return: nothing

        Goes through the given $jar and sets the Cookie header in %req,
        pending the correct domain and path. If $override is true, then the
        secure, domain, and path restrictions of the cookies are ignored and
        all cookies are essentially included. Notice: cookie expiration is
        currently not implemented. URL restriction comparison is also
        case-insensitive.

    cookie_get
        Params: $jar, $name
        Return: @elements

        Fetch the named cookie from the $jar, and return its components.
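The leading-dot domain rule described for cookie_parse can be sketched in standalone Perl. This is illustrative only; LW2's internal implementation may differ.

```perl
# Sketch of the domain-matching rule (illustrative, not LW2's code):
# no leading dot  -> matches only the identical hostname;
# leading dot     -> matches exactly one extra sub-domain level;
# undefined       -> matches all domains/hosts.
sub domain_matches {
    my ($cookie_domain, $host) = @_;
    return 1 unless defined $cookie_domain;          # no domain: match all
    if ($cookie_domain =~ /^\./) {                   # leading dot
        my $suffix = quotemeta $cookie_domain;
        return $host =~ /^[^.]+$suffix$/i ? 1 : 0;   # one label + suffix
    }
    return lc($host) eq lc($cookie_domain) ? 1 : 0;  # strict hostname match
}
```

Comparisons are case-insensitive, matching the note under cookie_write above.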
        The returned items will be an array in the following order:

        value   cookie value; should always be a non-empty string
        domain  domain root for the cookie; can be undefined
        path    URL path for the cookie; should always be a non-empty string
        expire  undefined (deprecated, but exists for backwards-compatibility)
        secure  whether or not the cookie is limited to HTTPS; value is 0 or 1

    cookie_get_names
        Params: $jar
        Return: @names

        Fetch all the cookie names from the jar, which then let you
        cookie_get() them individually.

    cookie_get_valid_names
        Params: $jar, $domain, $url, $ssl
        Return: @names

        Fetch all the cookie names from the jar which are valid for the given
        $domain, $url, and $ssl values. $domain should be a string scalar of
        the target host domain ('www.example.com', etc.). $url should be the
        absolute URL for the page ('/index.html', '/cgi-bin/foo.cgi', etc.).
        $ssl should be 0 for non-secure cookies only, or 1 for all (secure
        and normal) cookies. The return value is an array of names compatible
        with cookie_get().

    cookie_set
        Params: $jar, $name, $value, $domain, $path, $expire, $secure
        Return: nothing

        Set the named cookie with the provided values into the $jar. $name is
        required to be a non-empty string. $value is required, and will
        delete the named cookie from the $jar if it is an empty string.
        $domain and $path can be strings or undefined. $expire is ignored
        (but exists for backwards-compatibility). $secure should be the
        numeric value 0 or 1.

    crawl_new
        Params: $START, $MAX_DEPTH, \%request_hash [, \%tracking_hash ]
        Return: $crawl_object

        The crawl_new() function initializes a crawl object (hash) to the
        default values, and then returns it for later use by crawl(). $START
        is the starting URL (in the form 'http://www.host.com/url'), and
        $MAX_DEPTH is the maximum number of levels to crawl (the START URL
        counts as 1, so a value of 2 will crawl the START URL and all URLs
        found on that page).
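The element order documented for cookie_get() suggests a jar layout like the following standalone sketch. The internal format shown here is an assumption; real code should go through cookie_set() and cookie_get() rather than touching the hash directly.

```perl
# Sketch of a jar layout consistent with cookie_set()/cookie_get():
# each name maps to [value, domain, path, expire, secure]. Assumed
# layout for illustration only -- not LW2's guaranteed internals.
my %jar;

sub set_cookie_sketch {
    my ($jar, $name, $value, $domain, $path, $expire, $secure) = @_;
    return if !defined $name || $name eq '';          # name must be non-empty
    if (!defined $value || $value eq '') {            # empty value deletes
        delete $jar->{$name};
        return;
    }
    $path = '/' unless defined $path;                 # default path
    $jar->{$name} = [ $value, $domain, $path, undef, $secure ? 1 : 0 ];
}

set_cookie_sketch(\%jar, 'sid', 'abc123', '.host.com', '/', undef, 1);
my ($value, $domain, $path, $expire, $secure) = @{ $jar{sid} };
```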
        The request_hash is a standard initialized request hash to be used
        for requests; you should set any authentication information or
        headers in this hash in order for the crawler to use them. The
        optional tracking_hash lets you supply a hash for use in tracking URL
        results (otherwise crawl_new() will allocate a new anonymous hash).

    crawl
        Params: $crawl_object [, $START, $MAX_DEPTH ]
        Return: $count [ undef on error ]

        The heart of the crawl package. Will perform an HTTP crawl on the
        specified host, starting at the START URI, proceeding up to
        MAX_DEPTH. $crawl_object needs to be the variable returned by
        crawl_new(). You can also indirectly call crawl() via the crawl
        object itself:

            $crawl_object->{crawl}->($START, $MAX_DEPTH)

        Returns the number of URLs actually crawled (not including those
        skipped).

    dump
        Params: $name, @array [, $name, \%hash, $name, $scalar ]
        Return: $code [ undef on error ]

        The dump function will take the given $name and data reference, and
        will create an ASCII Perl code representation suitable for eval'ing
        later to recreate the same structure. $name is the name of the
        variable that it will be saved as. Example:

            $output = LW2::dump('request', \%request);

        NOTE: dump() creates anonymous structures under the name given. For
        example, if you dump the hash %hin under the name 'hin', then when
        you eval the dumped code you will need to use %$hin, since $hin is
        now a *reference* to a hash.

    dump_writefile
        Params: $file, $name, @array [, $name, \%hash, $name, $scalar ]
        Return: 0 if success; 1 if error

        This calls dump() and saves the output to the specified $file. Note:
        LW2 does no checking on the validity of the file name, its creation,
        or anything of the sort. Files are opened in overwrite mode.

    encode_base64
        Params: $data [, $eol]
        Return: $b64_encoded_data

        This function does Base64 encoding. If the binary MIME::Base64
        module is available, it will use that; otherwise, it falls back to an
        internal Perl version.
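The reference caveat noted for dump() can be reproduced with the core Data::Dumper module, whose functionality LW2 reimplements (the exact output format of LW2::dump is assumed to be comparable):

```perl
use Data::Dumper;

# Dumping %hin under the name 'hin' yields code that assigns a hash
# *reference* to $hin, so after eval'ing you dereference it as %$hin.
my %hin = ( whisker => { method => 'GET' } );

my $code = Data::Dumper->Dump( [ \%hin ], ['hin'] );  # produces "$hin = { ... };"

our $hin;
eval $code;
die $@ if $@;

print $hin->{whisker}{method}, "\n";   # note: %$hin, not %hin
```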
        The Perl version carries the following copyright:

            Copyright 1995-1999 Gisle Aas <gisle@aas.no>

        NOTE: the $eol parameter will be inserted every 76 characters. This
        is used to format the data for output on an 80-character-wide
        terminal.

    decode_base64
        Params: $data
        Return: $b64_decoded_data

        A Perl implementation of Base64 decoding. The Perl code for this
        function was taken from an older MIME::Base64 Perl module, and bears
        the following copyright:

            Copyright 1995-1999 Gisle Aas <gisle@aas.no>

    encode_uri_hex
        Params: $data
        Return: $result

        This function encodes every character (except the '/' character)
        with normal URL hex encoding.

    encode_uri_randomhex
        Params: $data
        Return: $result

        This function randomly encodes characters (except the '/' character)
        with normal URL hex encoding.

    encode_uri_randomcase
        Params: $data
        Return: $result

        This function randomly changes the case of characters in the string.

    encode_unicode
        Params: $data
        Return: $result

        This function converts a normal string into Windows Unicode format
        (non-overlong, nothing fancy).

    decode_unicode
        Params: $unicode_string
        Return: $decoded_string

        This function attempts to decode a Unicode (UTF-8) string by
        converting it into a single-byte-character string. Overlong
        characters are converted to their standard characters in place;
        non-overlong (aka multi-byte) characters are substituted with 0xff;
        invalid encoding characters are left as-is.

        Note: this function is useful for dealing with the various Unicode
        exploits/vulnerabilities found in web servers; it is *not* good for
        doing actual UTF-8 parsing, since characters over a single byte are
        basically dropped/replaced with a placeholder.

    encode_anti_ids
        Params: \%request, $modes
        Return: nothing

        encode_anti_ids computes the proper anti-IDS encoding/tricks
        specified by $modes, and sets up %request in order to use those
        tricks.
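When the binary MIME::Base64 module is present, encode_base64() defers to it; this standalone example shows the 76-character $eol behavior using the core module directly:

```perl
use MIME::Base64 qw(encode_base64 decode_base64);

# The second argument to encode_base64() is appended after every 76
# characters of output, keeping lines within an 80-column terminal.
my $data    = 'A' x 100;
my $encoded = encode_base64($data, "\n");   # line break every 76 chars
my $decoded = decode_base64($encoded);

die "round-trip failed" unless $decoded eq $data;
```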
        Valid modes are (the mode numbers are the same as those found in
        whisker 1.4):

        1   Encode some of the characters via normal URL encoding
        2   Insert directory self-references (/./)
        3   Premature URL ending (make it appear the request line is done)
        4   Prepend a long random string in the form of "/string/../URL"
        5   Add a fake URL parameter
        6   Use a tab instead of a space as a request spacer
        7   Change the case of the URL (works against Windows and Novell)
        8   Change normal separators ('/') to the Windows version ('\')
        9   Session splicing [NOTE: not currently available]
        A   Use a carriage return (0x0d) as a request spacer
        B   Use binary value 0x0b as a request spacer

        You can set multiple modes by setting the string to contain all the
        modes desired; i.e. $modes="146" will use modes 1, 4, and 6.

FORMS FUNCTIONS
    The goal is to parse the variable, human-readable HTML into concrete
    structures usable by your program. The forms functions do a good job at
    making these structures, but I will admit: they are not exactly simple,
    and thus not a cinch to work with. But then again, representing something
    as complex as an HTML form is not a simple thing either. I think the
    results are acceptable for what's trying to be done. Anyways...

    Forms are stored in Perl hashes, with elements in the following format:

        $form{'element_name'} = @( [ 'type', 'value', @params ] )

    Thus every element in the hash is an array of anonymous arrays. The
    first array value contains the element type (which is 'select',
    'textarea', 'button', or an 'input' value of the form 'input-text',
    'input-hidden', 'input-radio', etc.). The second value is the value, if
    applicable (it could be undef if no value was specified). Note that
    select elements will always have an undef value--the actual values are
    in the subsequent option elements. The third value, if defined, is an
    anonymous array of additional tag parameters found in the element (like
    'onchange="blah"', 'size="20"', 'maxlength="40"', 'selected', etc.).
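A concrete instance of the structure just described may help; the element names and values below are invented for illustration, and the select/option grouping shown is an assumption about how the parser lays out subsequent option elements.

```perl
# Illustrative form hash: each element is an array of anonymous
# [type, value, \@params] arrays (names and layout are hypothetical).
my %form = (
    'username' => [
        [ 'input-text', 'admin', [ 'size="20"', 'maxlength="40"' ] ],
    ],
    'color' => [
        [ 'select', undef ],                 # selects carry no value...
        [ 'option', 'red' ],                 # ...their options do
        [ 'option', 'blue', [ 'selected' ] ],
    ],
);

my ($type, $value, $params) = @{ $form{username}[0] };
```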
The array does contain one special element, which is stored in the hash under a NULL character ("