Scan for anchor tags in Perl?


 
# 1  
Old 05-10-2013

Hello all,

I have some .html files on my hard drive and I'm trying to figure out (if it's possible) how to scan the files in a directory for <a> anchor tags to find the linked files. I know how to read the files in with Perl, but only as plain text. I'm wondering if there's a way to probe a file for this information.

Thank you
# 2  
Old 05-11-2013
I believe this will find all the <a> tags for you:
Code:
Match the characters "<a" literally, followed by a word boundary (so "<abbr" is skipped)
Match any character that is not ">" (between zero and unlimited times)
Match the character ">" literally
Match any single character (between zero and unlimited times, as few times as possible (lazy))
Match the characters "</a>" literally

Options: dot matches newlines (s), case-insensitive (i)

if ( $line =~ m!<a\b[^>]*>.*?</a>!si ) {
    # perform code on match
}
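Note that "dot matches newlines" only helps if the whole file is held in one string rather than read line by line. Here is a minimal sketch of that idea, assuming the HTML is already in a scalar; the helper name find_anchors and the sample markup are made up for illustration. A regex is only an approximation of HTML parsing, so a module such as HTML::LinkExtor is likely more robust for real pages.

```perl
use strict;
use warnings;

# Hypothetical helper: return every <a ...>...</a> element found in a string.
sub find_anchors {
    my ($html) = @_;
    # /s lets . match newlines, /i ignores case, /g finds all matches.
    # .*? is lazy, so each match stops at the first closing </a>.
    return $html =~ m!(<a\b[^>]*>.*?</a>)!sig;
}

# Sample markup (made up); the second anchor spans two lines.
my $html = <<'END';
<p><a href="page1.html">One</a> and <A HREF="http://example.com/">
Two</A></p>
END

print "$_\n" for find_anchors($html);
```

Called in list context, the match with /g returns one element per anchor, so the result can be looped over or counted directly.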

This User Gave Thanks to spacebar For This Post:
# 3  
Old 05-11-2013
Thanks, but my problem was bigger than that. I didn't even know how to scan the muck of information inside the .html file. A day later I thought, why not try the Unix "cat" command, and boom, it gave me the muck I was after. Then I pawed through how to do the same thing in Perl, and finally got to the stuff you're talking about.
I went with
Code:
($line =~ /.*<a.*http.*$/)

To get the external links, but I see your code has some value too, which I will be taking a look at. Thanks for your help!
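For anyone landing here later, the pieces of this thread (read each file, match anchors, keep the http links) can be sketched roughly as below. This is only an illustration under assumptions: the helper name external_links is hypothetical, the files are taken to be *.html in the current directory, and the href values are assumed to be quoted. It captures the URL itself instead of just testing whether the line matches.

```perl
use strict;
use warnings;

# Hypothetical helper: return href values that start with http:// or https://.
sub external_links {
    my ($text) = @_;
    # [^>]*? keeps the search inside a single <a ...> tag.
    return $text =~ m!<a\b[^>]*?\bhref\s*=\s*["'](https?://[^"']+)["']!gi;
}

# Scan every .html file in the current directory (the glob pattern is an assumption).
for my $file (glob '*.html') {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/;                      # slurp mode: read the whole file at once
    my $html = <$fh> // '';
    close $fh;
    print "$file: $_\n" for external_links($html);
}
```

Relative links such as href="page2.html" are skipped by design, since the pattern requires the value to begin with http:// or https://.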