Awk: print all URL addresses between iframe tags without repeating an already printed URL
Post 302602945 by striker4o on Tuesday 28th of February 2012 06:00:09 PM
02-28-2012
Yes, thank you. Appears cleaner now, although I prefer to use sort -u instead of the if.

For now I use the following:

Code:
# Parentheses group the -name tests so that -type f applies to all of them
find . -type f \( -name '*.html' -o -name '*.htm' -o -name '*.php' \) -exec awk -F\" -v RS='<' '/^iframe src=/ {print $2}' {} + | sort -u
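
As a rough illustration of what the awk part does, here is a small self-contained sketch. The `sample` text and the `extract_iframes` function name are made up for the demonstration and are not part of the original command; only the awk program itself is taken from it.

```shell
# Hypothetical sample input: two iframes with the same URL plus unrelated markup.
sample='<p>text</p><iframe src="http://address.com/a"></iframe>
<iframe src="http://address.com/a"></iframe><a href="http://other.example/">x</a>'

# RS='<' starts a new record at every "<", so each record begins with a tag name.
# -F\" splits the record on double quotes, so $2 is the first quoted value.
extract_iframes() {
    awk -F\" -v RS='<' '/^iframe src=/ {print $2}'
}

# sort -u collapses the two identical URLs into a single line.
printf '%s\n' "$sample" | extract_iframes | sort -u
```

Note that `sort -u` also reorders the output alphabetically. If the first-seen order matters, piping through `awk '!seen[$0]++'` instead is a common alternative.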

My next goal is to strip everything but the hostname, meaning whatever is between "http://" and the next "/".

For example, here is a sample output I have now:

Code:
http://address.com/?click=5BBB08\
http://www.facebook.com/plugins/like.php?href

I would like the output to be just:

Code:
address.com
www.facebook.com
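
One possible way to get there, as a minimal sketch only: split each URL on "/" and keep the third field. This assumes every line starts with a scheme such as http:// and that the hostname is followed by a "/" or by the end of the line. The function name `strip_to_host` is hypothetical.

```shell
# Hypothetical helper: reduce each URL read from stdin to its hostname.
# Assumption: every line looks like scheme://host/... or scheme://host
strip_to_host() {
    # With "/" as field separator, "http://host/path" splits into
    # $1="http:", $2="" and $3="host". Lines without a third field are skipped.
    awk -F/ '$3 != "" {print $3}' | sort -u
}

# Sample URLs from the post, plus one repeated host to show deduplication.
printf '%s\n' \
    'http://address.com/?click=5BBB08\' \
    'http://www.facebook.com/plugins/like.php?href' \
    'http://address.com/other' |
    strip_to_host
```

A URL with a port (host:8080) or with a query string directly after the hostname (host?x=1) would keep that extra part in the output, so this is only a starting point.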

I will then combine this command with another one in a script, to get an easy way to list all unique iframes in the user's web space and selectively remove those of unknown source (hacked).

I am starting to understand the concept of awk and sed, but there is just so much more to learn...

Thank you for your great help. I really appreciate all the effort!
 
