Linux and UNIX Man Pages

Linux & Unix Commands - Search Man Pages

urifind(1p) [debian man page]

URIFIND(1p)						User Contributed Perl Documentation					       URIFIND(1p)

NAME
urifind - find URIs in a document and dump them to STDOUT. SYNOPSIS
$ urifind file DESCRIPTION
urifind is a simple script that finds URIs in one or more files (using "URI::Find"), and outputs them to to STDOUT. That's it. To find all the URIs in file1, use: $ urifind file1 To find the URIs in multiple files, simply list them as arguments: $ urifind file1 file2 file3 urifind will read from "STDIN" if no files are given or if a filename of "-" is specified: $ wget http://www.boston.com/ -O - | urifind When multiple files are listed, urifind prefixes each found URI with the file from which it came: $ urifind file1 file2 file1: http://www.boston.com/index.html file2: http://use.perl.org/ This can be turned on for single files with the "-p" ("prefix") switch: $urifind -p file3 file1: http://fsck.com/rt/ It can also be turned off for multiple files with the "-n" ("no prefix") switch: $ urifind -n file1 file2 http://www.boston.com/index.html http://use.perl.org/ By default, URIs will be displayed in the order found; to sort them ascii-betically, use the "-s" ("sort") option. To reverse sort them, use the "-r" ("reverse") flag ("-r" implies "-s"). $ urifind -s file1 file2 http://use.perl.org/ http://www.boston.com/index.html mailto:webmaster@boston.com $ urifind -r file1 file2 mailto:webmaster@boston.com http://www.boston.com/index.html http://use.perl.org/ Finally, urifind supports limiting the returned URIs by scheme or by arbitrary pattern, using the "-S" option (for schemes) and the "-P" option. Both "-S" and "-P" can be specified multiple times: $ urifind -S mailto file1 mailto:webmaster@boston.com $ urifind -S mailto -S http file1 mailto:webmaster@boston.com http://www.boston.com/index.html "-P" takes an arbitrary Perl regex. It might need to be protected from the shell: $ urifind -P 's?html?' file1 http://www.boston.com/index.html $ urifind -P '.org' -S http file4 http://www.gnu.org/software/wget/wget.html Add a "-d" to have urifind dump the refexen generated from "-S" and "-P" to "STDERR". "-D" does the same but exits immediately: $ urifind -P '.org' -S http -D $scheme = '^(http):' @pats = ('^(http):', '.org') To remove duplicates from the results, use the "-u" ("unique") switch. OPTION SUMMARY
-s Sort results. -r Reverse sort results (implies -s). -u Return unique results only. -n Don't include filename in output. -p Include filename in output (0 by default, but 1 if multiple files are included on the command line). -P $re Print only lines matching regex '$re' (may be specified multiple times). -S $scheme Only this scheme (may be specified multiple times). -h Help summary. -v Display version and exit. -d Dump compiled regexes for "-S" and "-P" to "STDERR". -D Same as "-d", but exit after dumping. AUTHOR
darren chamberlain <darren@cpan.org> COPYRIGHT
(C) 2003 darren chamberlain This library is free software; you may distribute it and/or modify it under the same terms as Perl itself. SEE ALSO
URI::Find perl v5.14.2 2012-04-08 URIFIND(1p)
Man Page