UNIX for Dummies Questions & Answers
Awk: print all URL addresses between iframe tags without repeating an already printed URL
Post 302602945 by striker4o on Tuesday 28th of February 2012 06:00:09 PM
Yes, thank you. Appears cleaner now, although I prefer to use sort -u instead of the if.

For now I use the following:

Code:
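# Parentheses make -type f apply to all three patterns; -exec ... {} + copes with spaces in file names.
find . -type f \( -name '*.html' -o -name '*.htm' -o -name '*.php' \) -exec awk -F\" -v RS='<' '/^iframe src=/ {print $2}' {} + | sort -u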
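
To see why the awk part works, here is a minimal sketch on a made-up HTML fragment (the sample markup and the function name `iframe_srcs` are only for illustration). Setting RS to `<` starts a new record at every tag, and splitting fields on the double quote makes the first quoted attribute value field 2. `-F'"'` is just another way to write `-F\"`.

```shell
# Hypothetical sample page; real input comes from the find command above.
sample='<html><body><iframe src="http://address.com/?click=1" width="0"></iframe>
<p>text</p><iframe src="http://address.com/?click=1"></iframe></body></html>'

# RS='<' starts a new record at every "<", so each tag begins its own record.
# -F'"' splits on double quotes, so $2 is the first quoted value after src=.
iframe_srcs() {
  awk -F'"' -v RS='<' '/^iframe src=/ { print $2 }' | sort -u
}

# The fragment contains the same iframe twice; sort -u keeps one copy.
printf '%s\n' "$sample" | iframe_srcs
```

Note that the pattern only matches when `src` is the first attribute, written in lower case and with double quotes, so iframes written differently would be missed.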

My next goal is to strip everything except the hostname, meaning whatever sits between "http://" and the next "/".

For example, here is a sample output I have now:

Code:
http://address.com/?click=5BBB08\
http://www.facebook.com/plugins/like.php?href

I would like the output to be just:

Code:
address.com
www.facebook.com
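
One possible way to get there, as a sketch rather than a finished solution: run the URL list through a second awk that uses "/" as the field separator, so the host ends up in field 3. The function name `hosts_only` is made up for this example, and the sketch assumes every URL starts with http:// or https:// (protocol-relative or relative src values are silently dropped).

```shell
# Keep only the host part of each URL read from standard input.
# With "/" as the field separator, "http://host/path" splits into
# $1="http:", $2="" and $3="host".
hosts_only() {
  awk -F/ '/^https?:\/\// { print $3 }' | sort -u
}

# The two sample lines from the current output.
printf '%s\n' \
  'http://address.com/?click=5BBB08\' \
  'http://www.facebook.com/plugins/like.php?href' |
  hosts_only
```

In the full pipeline this would simply be appended in place of the final `sort -u`. A port number, if present, stays attached to the host, and a URL with no "/" after the host keeps its query string, so the result may still need a glance by eye.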

I will then combine this command with another one in a script, to get an easy way to list all unique iframes in the user's web space and selectively remove those from unknown sources (likely injected by a hack).
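
For the "unknown source" part, one idea is to compare the hostname list against a file of hosts I trust. This is only a sketch under my own assumptions: the allowlist file, its contents and the function name `unknown_hosts` are hypothetical, and it only lists the suspicious hosts, it does not remove anything.

```shell
# Hypothetical allowlist: one trusted hostname per line.
allow=$(mktemp)
printf '%s\n' 'www.facebook.com' 'www.youtube.com' > "$allow"

# Print the hosts from standard input that are NOT in the allowlist file $1.
# -F: fixed strings, -x: match whole lines only, -v: invert the match,
# -f: read the patterns from a file.
unknown_hosts() {
  grep -vxFf "$1"
}

# Only address.com is missing from the allowlist, so only it is printed.
printf '%s\n' 'address.com' 'www.facebook.com' | unknown_hosts "$allow"
rm -f "$allow"
```

Matching whole lines with `-x` matters here, otherwise a trusted entry like facebook.com would also hide something like facebook.com.evil.example.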

I am starting to understand the concept of awk and sed, but there is just so much more to learn...

Thank you for your great help. I really appreciate all the effort!