Extracting anchor text and its URL from HTML files in BASH
Hi All,
I have some HTML files, and I need to extract every anchor text from them along with its URL, and store the results in a separate text file, separated by a space. For example, take
<a href="/kid/stay_healthy/">Staying Healthy</a>
which has /kid/stay_healthy/ as the URL (path) and Staying Healthy as the anchor text. I want to extract both and store them in a text file, separated by a space, like this:
/kid/stay_healthy/ Staying Healthy
Each new path and anchor text then goes on its own line, and so on.
This is what I have tried so far; to be honest, I got this code from the internet:
The problem with the above code is that, first, it cannot extract the anchor text, and second, it only works on a single HTML file. To store the result in a separate file, I can just redirect the output to a text file using >.
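Since the code that was tried is not shown, here is a minimal sketch of one alternative. It swaps in Python's standard-library html.parser module instead of a grep/sed pipeline, because regular expressions can break on anchors that span lines or contain nested tags. It assumes the files are UTF-8 and prints one "URL anchor text" line per <a href> element, for every file named on the command line. The script name extract_anchors.py and the output file anchors.txt are hypothetical.

```python
#!/usr/bin/env python3
"""Print "URL anchor text" for every <a href=...> in the given HTML files."""
import sys
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from one HTML document."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; etc. are decoded
        self.pairs = []     # finished (href, text) tuples
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip <a name="..."> anchors without href
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse newlines and runs of spaces so each pair fits on one line.
            text = " ".join("".join(self._text).split())
            self.pairs.append((self._href, text))
            self._href = None


def extract_anchors(html):
    """Return a list of (href, anchor text) tuples found in an HTML string."""
    parser = AnchorExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


def main(paths):
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for href, text in extract_anchors(f.read()):
                print(f"{href} {text}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage, with the redirect mentioned above: python3 extract_anchors.py *.html > anchors.txt. Because URLs normally contain no spaces, the first space on each output line separates the URL from the anchor text, even when the anchor text itself has spaces.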
Hi Everyone,
I'm really new to all this, so I'm hoping someone can help. I have a directory with ~1000 lists from which I want to extract lines and write them to new files. For simplicity, let's say they are shopping lists and I want to write out the lines corresponding to apples to a new...
I'm unable to extract data from one text file into different text files. I am able to concatenate two text files into the same file.
I want a C program for it.
Hi,
I need to create a Bash shell script that inserts a text data file into an HTML table, and this table output has to be mailed. I am new to shell scripting and have only a minimal idea of it.
Please help.
Hello Everyone,
I am trying to write a shell script (or Perl script) that would do the following:
I have a file that contains the following lines:
File:
https://ims-svnus.com/dev/DB/trunk/feeds/templates/shell_script.txt -r860...
Hey guys, I'm looking for a way to URL-encode and HTML-encode a string in a Bash script I'm making, which encodes strings in various digests, etc.
I can't find anything about it anywhere else on the forums.
Any help is much appreciated; I'm still very new to Bash and programming.
I have a file like this:
Timestamp URL Text
1331635241000 http://example.com Peoples footage at www.test.com,http://example4.com
1331635231000 http://example1.net crack the nuts http://example6.com
1331635280000 http://example2.net ...
I have a file like this:
http://article.wn.com/view/2010/11/26/IV_drug_policy_feels_HIV_patients_Red_Cross/ http://aidsjournal.com/,www.cfpa.org.cn/page1/page2 , www.youtube.com
http://seattletimes.nwsource.com/html/jerrybrewer/2013517803_brewer25.html...
In the Bash script below, each .tar.bz2 file (usually two) is extracted and then the original .tar.bz2 is removed. However, although both are extracted, only one (presumably the first one extracted) is removed. I am not sure why. Thank you :).
tar.bz2 folders in /home/cmccabe/Desktop/NGS/API
...
Discussion started by: cmccabe
LEARN ABOUT MOJAVE
Pod::ParseLink(3pm) Perl Programmers Reference Guide Pod::ParseLink(3pm)
NAME
Pod::ParseLink - Parse an L<> formatting code in POD text
SYNOPSIS
use Pod::ParseLink;
my ($text, $inferred, $name, $section, $type) = parselink ($link);
DESCRIPTION
This module only provides a single function, parselink(), which takes the text of an L<> formatting code and parses it. It returns the
anchor text for the link (if any was given), the anchor text possibly inferred from the name and section, the name or URL, the section if
any, and the type of link. The type will be one of "url", "pod", or "man", indicating a URL, a link to a POD page, or a link to a Unix
manual page.
Parsing is implemented per perlpodspec. For backward compatibility, links where there is no section and name contains spaces, or links
where the entirety of the link (except for the anchor text if given) is enclosed in double-quotes are interpreted as links to a section
(L</section>).
The inferred anchor text is implemented per perlpodspec:
L<name> => L<name|name>
L</section> => L<"section"|/section>
L<name/section> => L<"section" in name|name/section>
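To make the three inference rules concrete, here is a small Python sketch; it is not Pod::ParseLink's own implementation. It splits the contents of an L<> code into anchor text, name, and section, and then applies the rules above. It deliberately ignores URLs, man-page links, formatting codes, and the backward-compatibility cases, and the function names parse_pod_link and infer_anchor are hypothetical.

```python
def parse_pod_link(link):
    """Split the inside of L<...> into (text, name, section).

    Simplified sketch: handles only the "text|name/section" shapes,
    not URLs, man pages, or formatting codes.
    """
    text = None
    if "|" in link:
        text, link = link.split("|", 1)
    name, slash, section = link.partition("/")
    if slash:
        # Leading/trailing whitespace and surrounding double quotes are dropped.
        section = section.strip().strip('"')
    else:
        section = None
    return text, (name.strip() or None), section


def infer_anchor(name, section):
    """Apply the perlpodspec inference rules listed above."""
    if name and section:
        return f'"{section}" in {name}'  # L<name/section>
    if section:
        return f'"{section}"'            # L</section>
    return name                          # L<name>


for link in ["perlfunc", "/DESCRIPTION", "perlfunc/open", "docs|perlfunc/open"]:
    text, name, section = parse_pod_link(link)
    # An explicit anchor text wins; otherwise fall back to the inferred one.
    print(link, "->", text or infer_anchor(name, section))
```

This prints perlfunc -> perlfunc, /DESCRIPTION -> "DESCRIPTION", perlfunc/open -> "open" in perlfunc, and docs|perlfunc/open -> docs, one per line.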
The name may contain embedded E<> and Z<> formatting codes, and the section, anchor text, and inferred anchor text may contain any
formatting codes. Any double quotes around the section are removed as part of the parsing, as is any leading or trailing whitespace.
If the text of the L<> escape is entirely enclosed in double quotes, it's interpreted as a link to a section for backward compatibility.
No attempt is made to resolve formatting codes. This must be done after calling parselink(), since E<> formatting codes can be used to
escape characters that would otherwise be significant to the parser, and resolving them before parsing would result in an incorrect parse of
a formatting code like:
L<verticalE<verbar>barE<sol>slash>
which should be interpreted as a link to the "vertical|bar/slash" POD page and not as a link to the "slash" section of the "bar" POD page
with an anchor text of "vertical". Note that not only the anchor text will need to have formatting codes expanded, but so will the target
of the link (to deal with E<> and Z<> formatting codes), and special handling of the section may be necessary depending on whether the
translator wants to consider markup in sections to be significant when resolving links. See perlpodspec for more information.
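The ordering problem described above can be reproduced with a short Python sketch. It is only an illustration, not how Pod::ParseLink is written: the resolver knows just the two escapes from the example, E<verbar> and E<sol>, and split_link and resolve_e are hypothetical helpers that split on the first | and then on the first /.

```python
import re

# Hypothetical table covering only the two escapes used in the example above.
ESCAPES = {"verbar": "|", "sol": "/"}


def resolve_e(text):
    """Expand E<name> escapes; unknown names are left untouched."""
    return re.sub(r"E<(\w+)>", lambda m: ESCAPES.get(m.group(1), m.group(0)), text)


def split_link(link):
    """Split L<> contents on the first '|', then the first '/' (simplified)."""
    text, bar, rest = link.partition("|")
    if not bar:
        text, rest = None, link
    name, slash, section = rest.partition("/")
    return text, name, (section if slash else None)


raw = "verticalE<verbar>barE<sol>slash"

# Right order: parse the raw text first, then expand escapes in each piece.
text, name, section = split_link(raw)
print(text, resolve_e(name), section)  # None vertical|bar/slash None

# Wrong order: expanding first exposes '|' and '/' to the splitter.
print(split_link(resolve_e(raw)))  # ('vertical', 'bar', 'slash')
```

The second result is exactly the misparse the text warns about: anchor text "vertical", page "bar", section "slash".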
SEE ALSO
Pod::Parser
The current version of this module is always available from its web site at <http://www.eyrie.org/~eagle/software/podlators/>.
AUTHOR
Russ Allbery <rra@stanford.edu>.
COPYRIGHT AND LICENSE
Copyright 2001, 2008, 2009 Russ Allbery <rra@stanford.edu>.
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
perl v5.18.2 2013-11-04 Pod::ParseLink(3pm)