Operating Environment: debian

Section: 1

HXEXTRACT(1)							  HTML-XML-utils						      HXEXTRACT(1)

NAME
hxextract - extract selected elements from an HTML or XML file
SYNOPSIS
hxextract [ -h | -? ] [ -x ] [ -s text ] [ -e text ] [ -b base ] element-or-class [ -c configfile | file-or-URL ]
DESCRIPTION
hxextract outputs all elements with a certain name and/or class. Input must be well-formed, since no HTML heuristics are applied.
OPTIONS
The following options are supported:

    -x              Use XML format conventions.
    -s text         Insert text at the start of the output.
    -e text         Insert text at the end of the output.
    -b base         URL base.
    -c configfile   Read @chapter lines from configfile (lines must be of the form "@chapter filename") and extract elements from each of those files.
    -h, -?          Print command usage.
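As an illustration of the -c, -s and -e options, the sketch below builds a configuration file listing two chapter files and extracts every element of class "example" from both, wrapped in a container element. The file names (chapter1.html, chapter2.html, chapters.conf) and the wrapper markup are assumptions made up for this example, and the argument order follows the SYNOPSIS.

```shell
# Two small, well-formed chapter files (hypothetical names and content).
cat > chapter1.html <<'EOF'
<html><head><title>Chapter 1</title></head>
<body><p class="example">Example from chapter 1.</p></body></html>
EOF

cat > chapter2.html <<'EOF'
<html><head><title>Chapter 2</title></head>
<body><p class="example">Example from chapter 2.</p></body></html>
EOF

# The config file holds one "@chapter filename" line per input file.
cat > chapters.conf <<'EOF'
@chapter chapter1.html
@chapter chapter2.html
EOF

# Run only if hxextract is installed; -s and -e add text before and
# after the extracted elements.
if command -v hxextract >/dev/null 2>&1; then
  hxextract -s '<div class="examples">' -e '</div>' .example -c chapters.conf
fi
```

With -c, no file-or-URL operand is given; the inputs come from the @chapter lines instead.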
OPERANDS
The following operands are supported:

    element-or-class   The name of an element to extract (e.g., "H2"), or the name of a class preceded by "." (e.g., ".example"), or a combination of both (e.g., "H2.example").
    file-or-URL        A file name or a URL. To read from standard input, use "-".
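The three forms of the element-or-class operand can be tried on a small input file. The sketch below is only an illustration: sample.html and its content are invented for the example, and the element name is written in the same case as in the document to avoid relying on any case-folding behaviour.

```shell
# A small, well-formed sample document (hypothetical file name).
cat > sample.html <<'EOF'
<html>
<head><title>Sample</title></head>
<body>
<h2 class="example">First heading</h2>
<p>Some text.</p>
<h2>Second heading</h2>
<p class="example">A classed paragraph.</p>
</body>
</html>
EOF

# Run only if hxextract is installed.
if command -v hxextract >/dev/null 2>&1; then
  # By element name: every h2 element.
  hxextract h2 sample.html

  # By class: every element with class "example".
  hxextract .example sample.html

  # By element name and class combined.
  hxextract h2.example sample.html

  # Same as the first call, but reading from standard input via "-".
  hxextract h2 - < sample.html
fi
```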
ENVIRONMENT
To use a proxy to retrieve remote files, set the environment variables http_proxy and ftp_proxy. E.g., http_proxy="http://localhost:8080/"
BUGS
Remote files (specified with a URL) are currently only supported for HTTP. Password-protected files or files that depend on HTTP "cookies" are not handled. (You can use tools such as curl(1) or wget(1) to retrieve such files.)
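One way to apply the workaround mentioned above is to let curl(1) fetch the document and pipe it into hxextract through the "-" operand. The function below is a sketch under assumptions: the name extract_protected, its argument order, and the use of HTTP basic authentication are illustrative choices, not part of hxextract. Because hxextract needs well-formed input, the sketch passes the download through hxnormalize -x first.

```shell
# Hypothetical helper: fetch a password-protected page with curl and
# extract the selected elements from it.
#   $1 = URL, $2 = "user:password", $3 = element-or-class
extract_protected() {
  # -f: fail on HTTP errors, -s: no progress meter, -S: still show errors
  curl -fsS --user "$2" "$1" |
    hxnormalize -x |      # make the markup well-formed
    hxextract "$3" -      # "-" reads from standard input
}
```

A call might look like `extract_protected "https://example.org/page.html" "name:secret" h2`, where the URL and credentials are placeholders.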
SEE ALSO
hxselect(1)

6.x								   10 Jul 2011							      HXEXTRACT(1)