Awk scripting and usage of regex to locate a hyperlink


 
# 1  
Old 04-28-2007

Hello guys,

I need to write an awk script that takes an HTML page and outputs a list of each unique http link on that page, followed by the number of times it occurs in the file.
For example:
-----------------------------------------
Webpage: index.html

http://www.google.com/ 3
www.supersite.com/dir/dir2/index.html 5
-----------------------------------------

To do that I'm thinking of using regular expressions.

I'm using the following regex to find a hyperlink in the HTML file.

Code:
/<(a|A).+(href|HREF)=\"(.+?)\">/

It prints the whole line that contains the link. Say we have the following HTML code:
--------------------------------------------
<html>
<p> Here is some text before the link, the <a href = "www.google.com"> link </a> Some text after the link
</html>
--------------------------------------------

The output will be:
--------------------------------------------
Here is some text before the link, the <a href = "www.google.com"> link </a> Some text after the link
--------------------------------------------

What I need is to somehow get rid of all the unnecessary output, leaving only the target URL of the link, so that the output would be:

--------------------------------------------
www.google.com
--------------------------------------------
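For reference, an awk pattern with no action prints the entire matching record, which is why the whole line comes out. A minimal sketch of pulling out only the quoted target, assuming the href value is always in double quotes and sits on a single line: `match()` sets `RSTART` and `RLENGTH`, and `substr()` cuts out just that part. awk's extended regular expressions have no lazy quantifier such as `.+?`, so a negated bracket expression `[^"]*` stands in for it here. The function name `first_link` is only an illustration.

```shell
#!/bin/sh
# Print the href target of the first link found on each input line.
# Assumption: the href value is double-quoted and does not span lines.
first_link() {
    awk '
    # match() records where the attribute starts (RSTART) and its length (RLENGTH)
    match($0, /[hH][rR][eE][fF][ \t]*=[ \t]*"[^"]*"/) {
        attr = substr($0, RSTART, RLENGTH)   # the whole href attribute
        sub(/^[^"]*"/, "", attr)             # drop everything up to the opening quote
        sub(/"$/, "", attr)                  # drop the closing quote
        print attr
    }'
}
# usage: first_link < index.html
```

Like the attempt with index(), this still reports only one link per line.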

I've tried using the following; however, if there are several links on a line, only the first link is found:

Code:
{
    start = index($0, "<a")      # position of the first "<a" on the line
    end = index($0, "\">")       # position of the first "> on the line
    len = end - start
    print substr($0, start, len)
}


Can somebody help me please?
Thanks
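One possible way to catch several links on a line is to call `match()` in a loop and, after each hit, continue the search on the remainder of the line, counting each target in an associative array. This is only a sketch under stated assumptions: href values are double-quoted, an attribute does not span lines, and every href is counted, not only those inside `<a>` tags. The names `count_links`, `line`, `rest`, and `count` are illustrative, and the final `sort` is there only because awk's `for (key in array)` order is unspecified.

```shell
#!/bin/sh
# Print each distinct href target read from stdin, followed by its count.
# Assumption: href values are double-quoted and do not span lines.
count_links() {
    awk '
    {
        line = $0
        # Keep matching until no href attribute is left on this line
        while (match(line, /[hH][rR][eE][fF][ \t]*=[ \t]*"[^"]*"/)) {
            url  = substr(line, RSTART, RLENGTH)    # the whole href attribute
            rest = substr(line, RSTART + RLENGTH)   # text after this match
            sub(/^[^"]*"/, "", url)                 # strip up to the opening quote
            sub(/"$/, "", url)                      # strip the closing quote
            count[url]++
            line = rest                             # search the remainder next
        }
    }
    END {
        for (url in count)
            print url, count[url]
    }' | sort
}
# usage: count_links < index.html
```

Each output line has the link target first and the number of occurrences second, matching the layout in the example at the top of the post.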
# 2  
Old 04-28-2007
One warning and only one.

The rules prohibit homework/classroom posts. Persisting in posting this type of question will result in being banned.