10-26-2008
extracting domain names out of a text file
I need to extract and list the domain names contained in a very large text file. The file contains TLDs such as .com, .net and .org (among others), as well as third-level domains, e.g. host1.domain.com, and the names are embedded in paragraphs of text.
The domains have no http:// prefix, so I think the only thing to match on is the TLD: for example, match ".com" and extract everything before it back to the preceding space character.
How would I go about doing this? With grep, sed or awk?
Thank you gurus!
Last edited by totus; 10-26-2008 at 03:45 PM.
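One possible approach with grep, sketched under the assumption that GNU grep is available (`-o`, `-h` and the `\b` word-boundary escape are not POSIX). Rather than cutting back to the preceding space, which would drag along punctuation such as a leading "(", it matches the whole name directly: one or more dot-terminated labels followed by the TLD. The function name `extract_domains` and the `com|net|org` alternation are illustrative; extend the alternation for other TLDs.

```shell
#!/bin/sh
# Hostname pattern: one or more dot-terminated labels (letters, digits and
# inner hyphens), then one of the listed TLDs. \b (a GNU grep extension)
# keeps "example.community" from yielding "example.com".
DOMAIN_RE='([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+(com|net|org)\b'

# extract_domains FILE...: print each matching name once, lower-cased and sorted.
extract_domains() {
    # -o prints only the matched text, -h drops file-name prefixes,
    # -i ignores case; LC_ALL=C keeps [a-z] ranges and sort order plain ASCII.
    LC_ALL=C grep -ohiE "$DOMAIN_RE" "$@" |
        LC_ALL=C tr '[:upper:]' '[:lower:]' |
        LC_ALL=C sort -u
}
```

Usage: `extract_domains bigfile.txt > domains.txt`. Because grep streams the input line by line, this should cope with a very large file without loading it into memory.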
Net::Domain::TLD(3pm) User Contributed Perl Documentation Net::Domain::TLD(3pm)
NAME
Net::Domain::TLD - Work with TLD names
SYNOPSIS
use Net::Domain::TLD qw(tlds tld_exists);
my @ccTLDs = tlds('cc');
print "TLD ok\n" if tld_exists('ac','cc');
DESCRIPTION
The purpose of this module is to provide the user with a current list of
available top-level domain names, including new ICANN additions and ccTLDs.
TLD definitions are currently taken from the following sources:
http://www.icann.org/tlds/
http://www.dnso.org/constituency/gtld/gtld.html
http://www.iana.org/cctld/cctld-whois.htm
PUBLIC METHODS
Each public function/method is described here.
These are the intended ways to interact with this module.
"tlds"
This routine returns the requested TLDs: an array of names in list context, or a hashref of names and their descriptions in scalar context.
my @all_tlds = tlds; #array of tlds
my $all_tlds = tlds; #hashref of tlds and their descriptions
my @cc_tlds = tlds('cc'); #array of just 'cc' type tlds
my $cc_tlds = tlds('cc'); #hashref of just 'cc' type tlds and their descriptions
Valid types are:
cc - country code domains
gtld_open - generic domains that anyone can register
gtld_restricted - generic restricted registration domains
new_open - recently added generic domains
new_restricted - new restricted registration domains
"tld_exists"
This routine returns true if the given TLD exists (optionally within the given type) and false otherwise.
die "no such domain" unless tld_exists($tld); #call without tld type
die "no such domain" unless tld_exists($tld, 'new_open'); #call with tld type
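A list of valid TLDs like the one this module provides suggests a more general answer to the original question than hard-coding `com|net|org`: pull out every dotted name, then keep only those whose last label appears in a TLD list. A minimal shell sketch, assuming GNU grep (for `-o`, `-h` and `\b`) and a plain-text file with one TLD per line; the function name `extract_by_tld` is hypothetical. If Net::Domain::TLD is installed, such a file could presumably be produced with `perl -MNet::Domain::TLD=tlds -le 'print for tlds' > tlds.txt`, assuming `tlds` returns bare labels such as `com`; the sketch strips a leading dot either way.

```shell
#!/bin/sh
# Candidate pattern: dot-terminated labels followed by an alphabetic final
# label of two or more letters (so "2.0" and "e.g" are not candidates).
CANDIDATE_RE='([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b'

# extract_by_tld TLDFILE FILE...
#   TLDFILE: one TLD per line, with or without a leading dot.
#   Prints unique, lower-cased names from FILE... whose last label is in TLDFILE.
extract_by_tld() {
    tldfile=$1
    shift
    LC_ALL=C grep -ohiE "$CANDIDATE_RE" "$@" |
        LC_ALL=C awk -v list="$tldfile" '
            BEGIN {
                # Load the allowed TLDs into a lookup table.
                while ((getline line < list) > 0) {
                    sub(/^\./, "", line)
                    if (line != "") ok[tolower(line)] = 1
                }
            }
            {
                # Keep the candidate only if its final label is a known TLD.
                name = tolower($0)
                n = split(name, label, ".")
                if (label[n] in ok) print name
            }' |
        LC_ALL=C sort -u
}
```

Usage: `extract_by_tld tlds.txt bigfile.txt`. The filter checks only the final label, so a name such as `bbc.co.uk` is kept because `uk` is listed, while `notes.txt` is dropped; it knows nothing about second-level registries such as `co.uk`.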
COPYRIGHT
Copyright (c) 2003-2005 Alex Pavlovic, all rights reserved. This program
is free software; you can redistribute it and/or modify it under the same terms
as Perl itself.
AUTHORS
Alexander Pavlovic <alex.pavlovic@taskforce-1.com>
Ricardo SIGNES <rjbs@cpan.org>
perl v5.10.1 2011-04-18 Net::Domain::TLD(3pm)