The UNIX and Linux Forums  
Posted by adpe on 03-26-2009
Investigating web pages in awk

Hello. I want to write an awk script that searches an HTML file and outputs all the links (e.g. .html, .htm, .jpg, .doc, .pdf, etc.) inside it. I also want the output split into three groups, separated by an empty line: the first group with links to other web pages (.html, .htm, etc.), the second group with links to images (.jpg, .jpeg), and the third group with links to .pdf, .doc or other downloadable files. Next to each link I want to print how many times it occurs in the HTML file.

(I am only doing the links first; once I have cracked this I will be able to do the other formats easily.)

So I have currently got...

BEGIN { FS = " " }
{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^href/)
            print $i
}
END {}

which prints the whole matching field, e.g. href="index.html">. I would like it to print just index.html, followed by the number of times it appears in the web page.
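One possible way to get just the link target and its count is to match on the attribute itself instead of splitting on spaces. The sketch below wraps the awk program in a shell function; the name count_hrefs is made up for illustration. It assumes every href is lowercase, double-quoted, and fits on one line, so it is not a general HTML parser.

```shell
# count_hrefs FILE: print each href target in FILE with its occurrence count.
# Hypothetical helper; assumes lowercase, double-quoted href values.
count_hrefs() {
    awk '
    {
        line = $0
        # Find every href="..." on the line, one match at a time
        while (match(line, /href="[^"]*"/)) {
            # Skip the 6 characters of href=" and drop the closing quote
            url = substr(line, RSTART + 6, RLENGTH - 7)
            # Remember first-seen order so the output is stable
            if (!(url in count)) order[++n] = url
            count[url]++
            # Continue scanning after the current match
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i], count[order[i]]
    }
    ' "$1"
}
```

Called as count_hrefs page.html, it should print one line per distinct link, in the order the links first appear, each followed by its count.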

Any help or hints on how I could achieve what the first paragraph describes would be greatly appreciated.
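For the three groups described in the first paragraph, a rough sketch could classify each link by its extension and store it in a per-group list. The names group_links and group_of are hypothetical. The grouping rule is an assumption based on the post: .html/.htm go to the first group, .jpg/.jpeg to the second, and everything else to the third. The same restrictions apply as for any regex-based approach: lowercase, double-quoted href values that do not span lines.

```shell
# group_links FILE: print href targets in three groups (pages, images, other
# files), each with its count, with an empty line between groups.
# Hypothetical helper; the extension lists are assumptions taken from the post.
group_links() {
    awk '
    # Return 1 for web pages, 2 for images, 3 for anything else
    function group_of(url,    u) {
        u = tolower(url)
        if (u ~ /\.html?$/) return 1
        if (u ~ /\.jpe?g$/) return 2
        return 3
    }
    {
        line = $0
        while (match(line, /href="[^"]*"/)) {
            # Strip the leading href=" and the trailing quote
            url = substr(line, RSTART + 6, RLENGTH - 7)
            # On first sight, append the link to the list of its group
            if (!(url in count)) {
                g = group_of(url)
                list[g, ++size[g]] = url
            }
            count[url]++
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END {
        printed = 0
        for (g = 1; g <= 3; g++) {
            if (size[g] == 0) continue
            # Empty line between groups, but not before the first one
            if (printed++) print ""
            for (i = 1; i <= size[g]; i++)
                print list[g, i], count[list[g, i]]
        }
    }
    ' "$1"
}
```

Usage would be group_links page.html. Groups with no links are skipped, so no stray empty lines appear. Other formats could be added by extending the patterns in group_of.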

Last edited by adpe; 04-28-2009 at 02:30 PM..
 

Tags
awk, html, parsing html
