Trying to extract domain and tld from list of urls.


 
# 1  
Old 05-16-2012

I have searched the threads a fair amount, but I have not been able to cobble together a solution to my challenge. I am trying to edit a file line by line so that only the domain and TLD of each entry in a long list of URLs remain. The list looks something like this:
Code:
www.google.com
ja.wikipedia.org
bbc.co.uk
fr-fr.facebook.com

and I would like to end up with:
Code:
google.com
wikipedia.org
bbc.co.uk
facebook.com

I prefer bash, but I am learning Ruby and Perl, though I am not very good at them yet. I used Ruby's URI library to extract the input hostnames above. Is there another Ruby function I am overlooking that returns just domain.tld?

Thanks!
# 2  
Old 05-16-2012
cut command?

Code:
cut -d"." -f2,3

# 3  
Old 05-16-2012
First you are going to need to come up with some rules for exactly what you want to strip off. Do you want to strip off just the first period and whatever comes before it?

Check out these sites:

Code:
www.iptools.com
en.wikipedia.org/wiki/URI_scheme

Also, the "cut" command example that was posted strips the first period and everything before it (strictly speaking, it keeps only the second and third dot-separated fields). From the bash command line:
Code:
$ link="fr-fr.facebook.com"; echo "$link" | cut -d"." -f2,3
facebook.com
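
To see how that rule behaves on the whole sample list from the first post, here is a minimal sketch. The function name strip_first_label is only an illustrative wrapper around the same cut call, not something from the posts above.

```shell
# Hypothetical wrapper around the cut call shown above:
# keeps only the 2nd and 3rd dot-separated fields of each input line.
strip_first_label() {
  cut -d"." -f2,3
}

# Feed it the four sample hostnames from the first post.
printf '%s\n' www.google.com ja.wikipedia.org bbc.co.uk fr-fr.facebook.com |
  strip_first_label
# Output:
#   google.com
#   wikipedia.org
#   co.uk
#   facebook.com
```

Three of the four entries come out as wanted, but bbc.co.uk is reduced to co.uk, so a fixed field position is probably not enough when some entries have a two-part suffix.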

# 4  
Old 05-16-2012
Thanks for the quick replies. Straight out of the box, the "cut" command does not work for every entry (bbc.co.uk, for example, becomes co.uk), but a workaround might be to first separate my file into the .com, .net, etc. entries and then apply a customized cut command to each group. Let me give that a try....
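
One way to sketch that workaround without actually splitting the file is to let awk decide per line how many labels to keep. This is only a rough sketch under stated assumptions: the function name registered_domain is made up for illustration, and the list of two-part suffixes is a tiny hand-picked sample, not a complete list. A complete solution would probably need the full Public Suffix List.

```shell
# Hypothetical helper: keep the last 2 labels of each hostname, or the
# last 3 when the hostname ends in a known two-part suffix such as co.uk.
# ASSUMPTION: the suffix list is a small illustrative sample only.
registered_domain() {
  awk -F. -v suffixes="co.uk org.uk ac.uk com.au co.jp" '
    BEGIN {
      # Turn the space-separated suffix string into a lookup table.
      n = split(suffixes, list, " ")
      for (i = 1; i <= n; i++) two_part[list[i]] = 1
    }
    # Lines with fewer than two labels (including blanks) pass through.
    NF < 2 { print; next }
    {
      keep = 2
      tail = $(NF-1) "." $NF
      if (NF >= 3 && (tail in two_part)) keep = 3
      # Rebuild the name from the right, taking "keep" labels.
      out = $NF
      for (i = NF - 1; i > NF - keep; i--) out = $i "." out
      print out
    }
  '
}

# Usage with the sample list from the first post:
printf '%s\n' www.google.com ja.wikipedia.org bbc.co.uk fr-fr.facebook.com |
  registered_domain
```

For a real file the same function could be run as registered_domain < urls.txt, where urls.txt stands in for whatever the list is called. Any suffix missing from the sample list falls back to the plain "last two labels" rule, so the list would need extending for other country-code domains.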