extracting domain names out of a text file


 
# 1  
10-26-2008
extracting domain names out of a text file

I need to extract and list the domain names in a very large text file. The file contains TLDs such as .com, .net and .org, as well as third-level domains (e.g. host1.domain.com), and the names are embedded in paragraphs of text.

The domains have no http:// prefix, so I'm thinking the only thing to match on would be the TLD: for example, match ".com" and extract everything before it back to the preceding space character.

How would I go about doing this?

grep, sed, or awk?

Thank you, gurus!

Last edited by totus; 10-26-2008 at 03:45 PM..
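A minimal sketch of the TLD-matching idea from the question, assuming GNU grep (the `\b` word-boundary escape is a GNU extension), a hypothetical file name `input.txt`, and a TLD list of com, net and org that you would extend for your own data. Instead of cutting back to the previous space, it matches dot-separated labels of letters, digits and hyphens, so punctuation around a name is left out of the result.

```shell
#!/bin/sh
# Sketch only: input.txt and the TLD list are hypothetical placeholders.
printf '%s\n' \
  'Contact host1.domain.com or example.org for details.' \
  'Mirror: files.example.net, also domain.com again.' > input.txt

# -o prints only the matched text; -E enables the (com|net|org) alternation.
# Each label is letters, digits or hyphens followed by a dot, so characters
# such as "," or "<" are never part of a match.
# \b (GNU grep) stops "example.community" from being reported as "example.com".
domains=$(grep -oE '([[:alnum:]-]+\.)+(com|net|org)\b' input.txt | LC_ALL=C sort -u)
printf '%s\n' "$domains"
```

Piping through `sort -u` turns the matches into a de-duplicated list, which is usually what you want from a very large file.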
# 2  
10-28-2008
er, you could use any of them, but perl is better suited:
Code:
perl -ne 'print "$&\n" while /[\w.-]+\.(?:com|net|org|edu)\b/g' yourfile
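Since the question also asked about awk, here is a rough awk equivalent for comparison. It is a sketch under a few assumptions: the input file name is hypothetical, the TLD list (com, net, org, edu) is only an example, and a "word" is whatever awk splits on whitespace, with leading and trailing punctuation trimmed before the test.

```shell
#!/bin/sh
# Sketch only: input.txt is a hypothetical sample file.
printf '%s\n' \
  'See (www.unix.com) and gnu.org, or mit.edu.' \
  'No domains on this line at all.' > input.txt

domains=$(awk '{
  for (i = 1; i <= NF; i++) {
    w = $i
    # Trim leading and trailing characters that are not letters or digits.
    gsub(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/, "", w)
    # Keep the word only if it ends in one of the listed TLDs.
    if (w ~ /\.(com|net|org|edu)$/) print w
  }
}' input.txt)
printf '%s\n' "$domains"
```

Because the loop visits every field, a line containing several domains yields several results.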

# 3  
10-28-2008
grep '\.com' yourfile
grep '\.net' yourfile
and so on.
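Rather than running one grep per TLD, the patterns can be combined into a single pass with repeated `-e` options, and `-o` (supported by GNU and BSD grep, though not required by POSIX) prints just the matching names instead of whole lines. A sketch with a hypothetical `input.txt`:

```shell
#!/bin/sh
# Sketch only: input.txt is a hypothetical sample file.
printf '%s\n' \
  'see www.unix.com and gnu.org today' \
  'news at bbc.co.uk or foo.net' > input.txt

# One -e per TLD; the dot before each TLD is escaped so it matches a
# literal ".", not any character. There is no word boundary here, so
# "example.community" would still yield "example.com".
found=$(grep -o -e '[A-Za-z0-9.-]*\.com' \
                -e '[A-Za-z0-9.-]*\.net' \
                -e '[A-Za-z0-9.-]*\.org' input.txt)
printf '%s\n' "$found"
```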
# 4  
10-28-2008
Does this work for you?

Code:
> cat file06
blah blah www.boston.com more blah
ha ha yech yes nope not yet tomorrow
today www.unix.com future www.unix.org
forever and ever sportsillustrated.cnn.com high

> tr " " "\n" < file06 | grep '\.com'
www.boston.com
www.unix.com
sportsillustrated.cnn.com
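A slightly more robust version of the same pipeline, as a sketch: `tr -s '[:space:]' '\n'` splits on tabs and runs of spaces as well as single spaces, the dot is escaped so grep does not also match words like "become", and `sort -u` removes duplicates. The file name `input.txt` is hypothetical, and anchoring with `$` means a name followed by punctuation (for example "www.unix.com,") is skipped.

```shell
#!/bin/sh
# Sketch only: input.txt is a hypothetical sample file.
printf 'blah  www.boston.com\tmore\ntoday www.unix.com www.boston.com\nwe become\n' > input.txt

# -s squeezes runs of whitespace into a single newline: one word per line.
# '\.com$' keeps only words that end in a literal ".com".
domains=$(tr -s '[:space:]' '\n' < input.txt | grep '\.com$' | LC_ALL=C sort -u)
printf '%s\n' "$domains"
```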

# 5  
02-11-2009
Please help modify solution

I am trying to extract .co.uk domains from HTML
using the command:
cat $DIR/oldfile.txt | tr " " "\n" | grep [A-Za-z0-9_\.-].co.uk > $DIR/newfile.txt

The problem is that this command matches:
/>domain.co.uk<br
/>domain.co.uk<br
/>domain.co.uk<br
etc

How do I modify my regexp to match only alphanumeric characters (apart from the dots and possible hyphens)?

Many Thanks,

Hal
# 6  
02-11-2009
Well, if you change it to match alphanumeric only, then you get:
Code:
domain.co.ukbr

So I don't think that's what you want. If your grep accepts -o, you can do:
Code:
grep -o '[A-Za-z0-9_.-]*\.co\.uk'
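If you also want to rule out a bare ".co.uk" and keep underscores out of the result (they are not valid in hostnames), a stricter sketch is to require at least one label before ".co.uk". It assumes a grep that supports `-o` and `-E`, and the sample file `input.txt` is hypothetical.

```shell
#!/bin/sh
# Sketch only: input.txt is a hypothetical sample file.
printf '%s\n' \
  '/>domain.co.uk<br' \
  '/>shop.example.co.uk<br' \
  'just .co.uk on its own' > input.txt

# ([A-Za-z0-9-]+\.)+ needs one or more dot-terminated labels before
# "co.uk", so the bare ".co.uk" on the last line is not reported.
found=$(grep -oE '([A-Za-z0-9-]+\.)+co\.uk' input.txt)
printf '%s\n' "$found"
```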

If not, use sed instead of grep:
Code:
sed -n 's/^\(.*[^A-Za-z0-9_.-]\)\{0,1\}\([A-Za-z0-9_.-]*\.co\.uk\).*$/\2/p'
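A self-contained run of a sed-only extraction on lines like those in the question, as a sketch with a hypothetical `input.txt`. Two details matter: `-n` together with the `p` flag prints only lines where a substitution happened, and the optional first group `\(.*[^A-Za-z0-9_.-]\)\{0,1\}` must end at a character that cannot be part of a name, so a greedy leading `.*` cannot swallow the start of the domain. Only the last .co.uk name on each line is printed.

```shell
#!/bin/sh
# Sketch only: input.txt is a hypothetical sample file.
printf '%s\n' \
  '/>domain.co.uk<br' \
  'a line with no domain' \
  'example.co.uk' > input.txt

# \1 is the optional prefix ending in a non-name character; \2 is the domain.
found=$(sed -n 's/^\(.*[^A-Za-z0-9_.-]\)\{0,1\}\([A-Za-z0-9_.-]*\.co\.uk\).*$/\2/p' input.txt)
printf '%s\n' "$found"
```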

# 7  
02-11-2009
Thank you, Otheus. It works fine with grep -o.
