Hello everybody
I have been trying to extract the domain name from the BIND query log with different options; however, I always get stuck on domains that end in .co.uk or .co.nz.
I tried the following, however it only returns the first level:
Is it possible to get the domain names through a command, or must the list be compared against another file that contains a list of all domains on the internet?
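For context, pulling the queried name out of the log is the easy half of the problem. A minimal sketch, assuming the common BIND query-log line format ("client IP#port: query: name IN A +"); the sample line and file name here are illustrative, and the field position may differ with your logging configuration:

```shell
# Write one sample query-log line (assumed format) to a temp file.
cat > query.log.tmp <<'EOF'
26-Oct-2015 10:11:12.345 client 192.0.2.1#53123: query: walter-producer-cdn.api.bbci.co.uk IN A + (198.51.100.2)
EOF

# Scan each line for the literal "query:" token and print the field after it,
# which is the queried name.
out=$(awk '{for (i = 1; i < NF; i++) if ($i == "query:") print $(i+1)}' query.log.tmp)
printf '%s\n' "$out"     # walter-producer-cdn.api.bbci.co.uk

rm -f query.log.tmp
```

Reducing that name to its registrable domain is the part that needs the suffix list discussed below.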
Moderator's Comments:
Please do not use FONT and SIZE tags when posting to The UNIX & Linux Forums.
Please use CODE tags, not ICODE tags, for multi-line sample input, output, and code.
Last edited by Don Cragun; 11-01-2015 at 06:13 PM..
Reason: Change ICODE tags to CODE tags, get rid of FONT and SIZE tags.
Quote:
Is it possible to get the domain names through a command, or must the list be compared against another file that contains a list of all domains on the internet?
You have to compare against another list that defines the public suffixes.
Follow this link, if still active, for more information.
A compilation list can be found here.
Thank you for the response, Aia; however, that post is quite old and does not seem to be active anymore, nor does it offer a solution as such. Also, thank you for the publicsuffix list; this is very helpful and has provided me with a new possible approach to the challenge.
Unfortunately I am very new to the shell scripting world and would appreciate assistance in this regard. Here is the idea:
The URL has more labels than the publicsuffix entries, and it is separated by "." characters; so is there a possibility to grep or search the URL starting from the right-hand side and find the most accurate (longest) match? Let me provide an example:
Code:
walter-producer-cdn.api.bbci.co.uk
Starting from the right-hand side, matching against the publicsuffix list:
publicsuffix list for uk:
Thank you RudiC, but could I kindly ask you to elaborate on the code? As mentioned before, I am very new to this. I have two files: one that contains the URLs and the other the publicsuffix list. Thank you.
awk '
NR==FNR {C[$0] # read first file (= NR==FNR) into the indices of the associative array C
next # stop processing the actual line; proceed with next line
}
($(NF-1) OFS $NF) in C {print $(NF-2) OFS $(NF-1) OFS $NF
          # if the second-last ($(NF-1)) and last ($NF) fields, joined by a dot,
          # are found in C, print the third-last, second-last, and last fields
next # stop ... see above
}
{print $(NF-1) OFS $NF # if above doesn't apply, print second last and last fields
}
' FS="." OFS="." publicsuffix.lst raw # supply the field separators and two files to awk
This code certainly is not perfect; e.g., co.nz is missing from the publicsuffix.lst used here, but it may serve as a starting point...
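To see the command in action, here is a hypothetical end-to-end run with tiny stand-in files (the names publicsuffix.lst and raw follow the post; the contents are samples, not real data):

```shell
# Build two small sample input files.
printf '%s\n' co.uk co.nz > publicsuffix.lst
printf '%s\n' walter-producer-cdn.api.bbci.co.uk www.example.com > raw

# Same logic as the post: load suffixes into C, then print suffix + one
# label when the last two fields match, else the last two fields.
out=$(awk '
NR==FNR {C[$0]; next}
($(NF-1) OFS $NF) in C {print $(NF-2) OFS $(NF-1) OFS $NF; next}
{print $(NF-1) OFS $NF}
' FS="." OFS="." publicsuffix.lst raw)
printf '%s\n' "$out"     # bbci.co.uk
                         # example.com

rm -f publicsuffix.lst raw
```

The parentheses around the concatenated fields before `in` are a small portability safety net; the behaviour is the same as the post's version.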
RudiC, thank you very much for providing this solution; it is truly appreciated. I checked through the publicsuffix list and found that the longest suffix is four labels, so I extended the script you provided to check the longer suffixes as well. Now it works and provides all the different domains. Here is the code I am now using:
Code:
awk '
NR==FNR {C[$0]
         next
}
NF>4 && ($(NF-3) OFS $(NF-2) OFS $(NF-1) OFS $NF) in C {print $(NF-4) OFS $(NF-3) OFS $(NF-2) OFS $(NF-1) OFS $NF
         next
}
NF>3 && ($(NF-2) OFS $(NF-1) OFS $NF) in C {print $(NF-3) OFS $(NF-2) OFS $(NF-1) OFS $NF
         next
}
NF>2 && ($(NF-1) OFS $NF) in C {print $(NF-2) OFS $(NF-1) OFS $NF
         next
}
{print $(NF-1) OFS $NF
}
' FS="." OFS="." public_suffix_list.dat url.txt
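A condensed demo of the same longest-suffix-first idea with suffixes of mixed length. The sample data is illustrative: the three-label suffix "lea.sch.uk" is a stand-in, not a claim about the real public suffix list:

```shell
# Sample suffix list with two- and three-label entries, and two hosts.
printf '%s\n' uk co.uk lea.sch.uk > suffixes.tmp
printf '%s\n' myschool.lea.sch.uk www.bbc.co.uk > hosts.tmp

# Test the longest suffix first; NF guards keep short names from
# referencing $0 via $(NF-3) when there are too few labels.
out=$(awk '
NR==FNR {C[$0]; next}
NF>3 && ($(NF-2) OFS $(NF-1) OFS $NF) in C {print $(NF-3) OFS $(NF-2) OFS $(NF-1) OFS $NF; next}
NF>2 && ($(NF-1) OFS $NF) in C {print $(NF-2) OFS $(NF-1) OFS $NF; next}
{print $(NF-1) OFS $NF}
' FS="." OFS="." suffixes.tmp hosts.tmp)
printf '%s\n' "$out"     # myschool.lea.sch.uk
                         # bbc.co.uk

rm -f suffixes.tmp hosts.tmp
```

Ordering the patterns longest-first matters: if the two-label test ran first, a host under lea.sch.uk would stop at co.uk-style matching and lose a label.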