hxwls(1) [debian man page]

HXWLS(1)							  HTML-XML-utils							  HXWLS(1)

NAME
hxwls - list links in an HTML file

SYNOPSIS
hxwls [ -l ] [ -t ] [ -r ] [ -h ] [ -b base ] [ file ]

DESCRIPTION
The hxwls command reads an HTML file (standard input by default) and prints out all links it finds. The output is written to stdout.

OPTIONS
The following options are supported:

-l
    Produce a long listing. Instead of just the URI, hxwls prints three columns: the element name, the value of the REL attribute, and the target URI.

-t
    Produce a tuple listing. hxwls prints four columns: the URI of the document itself, the element name, the value of the REL attribute, and the target URI.

-r
    Print relative URLs as they are, without converting them to absolute URLs.

-b base
    Use base as the initial base URL. If there is a <base> element in the document, it will override the -b option.

-h
    Output as HTML. The output will be listed in the form of <a> elements.

OPERANDS
The following operand is supported:

file
    The name or the URL of an HTML file. If absent, standard input is read instead.

DIAGNOSTICS
The following exit values are returned:

0
    Successful completion.

>0
    An error occurred in the parsing of the HTML file. hxwls will try to correct the error and produce output anyway.

SEE ALSO
asc2xml(1), hxnormalize(1), hxnum(1), xml2asc(1)

6.x 10 Jul 2011 HXWLS(1)
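
The interplay of -r, -b and a <base> element described under OPTIONS can be illustrated with a short Python sketch. This is not the hxwls implementation and it does not reproduce hxwls's output format. It assumes that only the href and src attributes of a few elements count as links, and that a <base> element simply replaces the initial base URL. The names LINK_ATTRS, LinkLister and list_links are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: which attribute holds the link for each element.
# The real hxwls may recognize more elements and attributes.
LINK_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src"}


class LinkLister(HTMLParser):
    """Collect (element, rel, target) rows, similar in spirit to 'hxwls -l'."""

    def __init__(self, base="", relative=False):
        super().__init__()
        self.base = base          # initial base URL, like the -b option
        self.relative = relative  # leave URLs unresolved, like the -r option
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            # A <base> element in the document overrides the initial base.
            self.base = attrs["href"]
            return
        name = LINK_ATTRS.get(tag)
        target = attrs.get(name) if name else None
        if target is None:
            return
        if not self.relative:
            target = urljoin(self.base, target)
        self.rows.append((tag, attrs.get("rel") or "", target))


def list_links(html, base="", relative=False):
    """Return the link rows found in an HTML string."""
    parser = LinkLister(base, relative)
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    page = '<link rel="stylesheet" href="s.css"><a href="../x.html">x</a>'
    for element, rel, target in list_links(page, base="http://example.org/docs/"):
        print(element, rel, target, sep="\t")
```

Passing relative=True returns the targets exactly as written in the document, which corresponds to the behavior documented for -r.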

HXTOC(1)							  HTML-XML-utils							  HXTOC(1)

NAME
hxtoc - insert a table of contents in an HTML file

SYNOPSIS
hxtoc [ -x ] [ -l low ] [ -h high ] [ file ] [ -t ] [ -d ] [ -c class ]

DESCRIPTION
The hxtoc command reads an HTML file and inserts missing ID attributes in all H1 to H6 elements between the levels -l and -h (unless the option -d is in effect, see below). It also inserts A elements with NAME attributes, so that old browsers will recognize the H1 to H6 headers as target anchors as well (unless the option -t is in effect). The output is written to stdout.

If there is a comment of the form <!--toc--> or a pair of comments <!--begin-toc--> ... <!--end-toc-->, then the comment, or the pair with everything in between, will be replaced by a table of contents, consisting of a list (UL) of links to all headers in the document.

The text of headers is copied to this table of contents, including any inline markup, except that DFN tags and SPAN tags with a CLASS of "index" are omitted (but the element's content is copied).

If a header has a CLASS attribute whose value (or one of whose values) is the keyword "no-toc", then that header will not appear in the table of contents.

OPTIONS
The following options are supported:

-x
    Use XML conventions: empty elements are written with a slash at the end: <IMG />

-l low
    Sets the lowest numbered header to appear in the table of contents. Default is 1 (i.e., H1).

-h high
    Sets the highest numbered header to appear in the table of contents. Default is 6 (i.e., H6).

-t
    Normally, hxtoc adds both ID attributes and empty A elements with a NAME attribute and CLASS="bctarget", so that older browsers that do not understand ID will still find the target. With this option, the A elements will not be generated.

-c class
    The generated UL elements in the table of contents will have a CLASS attribute with the value class. The default is "toc".

-d
    Tries to use DIV elements as targets instead of H1 to H6: if a header element H1 to H6 within the range -l to -h is found and it is the first child of a DIV element, then the table of contents will link to the DIV instead of to the header element. The DIV will be given an ID if it doesn't have one yet.

OPERANDS
The following operand is supported:

file
    The name of an HTML file. If absent, standard input is read instead.

DIAGNOSTICS
The following exit values are returned:

0
    Successful completion.

>0
    An error occurred in the parsing of the HTML file. hxtoc will try to correct the error and produce output anyway.

SEE ALSO
asc2xml(1), hxnormalize(1), hxnum(1), xml2asc(1)

BUGS
The error recovery for incorrect HTML is primitive.

6.x 10 Jul 2011 HXTOC(1)
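
The steps in DESCRIPTION (give headers an ID, skip "no-toc" headers, replace the <!--toc--> comment with a list of links) can be sketched in Python as follows. This is a simplified illustration, not how hxtoc is implemented. It assumes a flat list instead of nested UL elements, handles only the single <!--toc--> comment form, generates no A elements (roughly as with -t), and does not strip DFN or SPAN tags. The "toc-N" ID scheme and the names TocBuilder and add_toc are hypothetical.

```python
from html.parser import HTMLParser

HEADERS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TocBuilder(HTMLParser):
    """Re-emit a document, adding header IDs and filling in <!--toc-->."""

    def __init__(self, low=1, high=6, css_class="toc"):
        super().__init__(convert_charrefs=False)
        self.low, self.high, self.css_class = low, high, css_class
        self.out = []      # pieces of the rewritten document; None marks the TOC
        self.entries = []  # (id, inner markup) for each listed header
        self._open = None  # [tag, id, parts] while inside a listed header
        self._count = 0    # counter for the hypothetical "toc-N" IDs

    def _emit(self, text):
        self.out.append(text)
        if self._open is not None:
            self._open[2].append(text)  # copy inline markup into the entry

    def handle_starttag(self, tag, attrs):
        text = self.get_starttag_text()
        attrs = dict(attrs)
        in_range = tag in HEADERS and self.low <= int(tag[1]) <= self.high
        if self._open is not None or not in_range:
            self._emit(text)
            return
        ident = attrs.get("id")
        if not ident:
            # Insert the missing ID right after the element name.
            self._count += 1
            ident = f"toc-{self._count}"
            cut = len(tag) + 1
            text = text[:cut] + f' id="{ident}"' + text[cut:]
        self.out.append(text)
        if "no-toc" not in (attrs.get("class") or "").split():
            self._open = [tag, ident, []]

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            _, ident, parts = self._open
            self.entries.append((ident, "".join(parts).strip()))
            self._open = None
            self.out.append(f"</{tag}>")
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        # Defer the TOC until the whole document has been read.
        self.out.append(None if data.strip() == "toc" else f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def result(self):
        items = "".join(
            f'<li><a href="#{ident}">{text}</a></li>\n'
            for ident, text in self.entries
        )
        toc = f'<ul class="{self.css_class}">\n{items}</ul>'
        return "".join(toc if piece is None else piece for piece in self.out)


def add_toc(html, low=1, high=6, css_class="toc"):
    """Return html with header IDs added and <!--toc--> replaced."""
    builder = TocBuilder(low, high, css_class)
    builder.feed(html)
    builder.close()
    return builder.result()
```

The table of contents is filled in only after the whole input has been read, because headers that follow the <!--toc--> comment must appear in it too. The sketch does not check whether a generated ID collides with one already present in the document.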