Top Forums Shell Programming and Scripting Get only domain from url file bind Post 302959817 by omuhans123 on Saturday 7th of November 2015 06:36:55 AM
Thank you RudiC, for the script and assistance, it is truly appreciated. The script works very well now and extracts the Domain from the URL.

Also thank you Don Cragun, for the assistance.

Here is the final script I am currently using that was written by RudiC:
Code:
awk '
/^\/\/|^ *$/    {next}

FNR!=NR         {for (f in FIVE)  if ($0 ~ "[.]" f "$")  {print $(NF-5), $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF; next}
                 for (f in FOUR)  if ($0 ~ "[.]" f "$")  {print $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF; next}
                 for (t in THREE) if ($0 ~ "[.]" t "$")  {print $(NF-3), $(NF-2), $(NF-1), $NF; next}
                 for (t in TWO)   if ($0 ~ "[.]" t "$")  {print $(NF-2), $(NF-1), $NF; next}
                 for (o in ONE)   if ($0 ~ "[.]" o "$")  {print $(NF-1), $NF; next}
                 next
                }

/^\*/           {next}

NF==5           {FIVE[$0]}
NF==4           {FOUR[$0]}
NF==3           {THREE[$0]}
NF==2           {TWO[$0]}
NF==1           {ONE[$0]}
' FS="." OFS="." public_suffix_list.dat rawfile
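For reference, here is a minimal sketch of the matching idea above, trimmed to one- and two-label suffixes and run on tiny stand-in files (psl.sample and raw.sample are hypothetical samples, not the real public_suffix_list.dat and rawfile):

```shell
# Hypothetical stand-ins for public_suffix_list.dat and rawfile
printf '%s\n' 'com' 'co.uk' > psl.sample
printf '%s\n' 'www.example.co.uk' 'istatic.eshopcomp.com' > raw.sample

# Same structure as the full script, kept to the TWO and ONE passes:
# suffixes are hashed by label count, then each hostname is tested
# longest-suffix-first and the registered domain is printed.
awk '
FNR!=NR {for (t in TWO) if ($0 ~ "[.]" t "$") {print $(NF-2), $(NF-1), $NF; next}
         for (o in ONE) if ($0 ~ "[.]" o "$") {print $(NF-1), $NF; next}
         next
        }
NF==2   {TWO[$0]}
NF==1   {ONE[$0]}
' FS="." OFS="." psl.sample raw.sample > dom.sample
cat dom.sample
```

With this sample data the output is example.co.uk and eshopcomp.com, one per input hostname.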

---------- Post updated 11-07-15 at 01:36 PM ---------- Previous update was 11-06-15 at 02:53 PM ----------

Hi RudiC and Don Cragun, could I kindly ask you one final favor: optimizing the script I currently have? The objective is to take the raw log from BIND, enrich it by extracting the domain from the URL and adding a content category, and then write the results to different files for summarizing. The challenge is that the script below processes 3.83 lines per second, and I have 9 million lines a day.

The input log from the DNS1 file looks like the following:
Code:
04-Nov-2015 08:28:39.261 queries: info: client 192.168.169.122#59319: query: istatic.eshopcomp.com IN A + (10.10.80.50)
04-Nov-2015 08:28:39.269 queries: info: client 192.168.212.136#48872: query: idsync.rlcdn.com IN A + (10.10.80.50)
04-Nov-2015 08:28:39.269 queries: info: client 192.168.19.61#53970: query: 3-courier.sandbox.push.apple.com IN A + (10.10.80.50)
04-Nov-2015 08:28:39.270 queries: info: client 192.168.169.122#59319: query: ajax.googleapis.com IN A + (10.10.80.50)
04-Nov-2015 08:28:39.272 queries: info: client 192.168.251.24#37028: query: um.simpli.fi IN A + (10.10.80.50)
04-Nov-2015 08:28:39.272 queries: info: client 192.168.251.24#37028: query: www.wtp101.com IN A + (10.10.80.50)
04-Nov-2015 08:28:39.273 queries: info: client 192.168.251.24#37028: query: magnetic.t.domdex.com IN A + (10.10.80.50)
04-Nov-2015 08:28:39.273 queries: info: client 172.25.111.175#59612: query: api.smoot.apple.com IN A + (10.10.80.50)
04-Nov-2015 08:28:39.275 queries: info: client 192.168.7.181#45913: query: www.miniclip.com IN A + (10.10.80.50)
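On whitespace splitting, the fields the script below relies on are $1 (date), $2 (time), $6 (client IP#port), $8 (query name) and $10 (record type). A quick sketch on one of the sample lines above:

```shell
# One sample line copied from the log above
line='04-Nov-2015 08:28:39.261 queries: info: client 192.168.169.122#59319: query: istatic.eshopcomp.com IN A + (10.10.80.50)'

# $1=date  $2=time  $6=client#port  $8=query name  $10=record type;
# split() peels the port off the client field
echo "$line" | awk '{split($6, a, "#"); print $1, $2, a[1], $8, $10}' > fields.sample
cat fields.sample
```

This prints the date, time, bare client IP, queried name and record type in one awk call.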

Code:
while read -r line
do
dt=$(awk '{print $1}' <<< "$line")    #Reading the date from the log line into a variable
tm=$(awk '{print $2}' <<< "$line")    #Reading the time from the log line into a variable
ipt=$(awk '{print $6}' <<< "$line")   #Reading the client IP#port field into a variable
ip=$(cut -d'#' -f1 <<< "$ipt")        #Removing the port from the IP address and writing it into a variable
url=$(awk '{print $8}' <<< "$line")   #Reading the URL from the log line into a variable
type=$(awk '{print $10}' <<< "$line") #Reading the record type from the log line into a variable

echo "$url" > temp-url #Writing the URL into a temp file, as I could not get the variable read directly into the awk statement below

dom=$(awk '
/^\/\/|^ *$/    {next}

FNR!=NR         {for (f in FIVE)  if ($0 ~ "[.]" f "$")  {print $(NF-5), $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF; next}
                 for (f in FOUR)  if ($0 ~ "[.]" f "$")  {print $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF ; next}
                 for (t in THREE) if ($0 ~ "[.]" t "$")  {print $(NF-3), $(NF-2), $(NF-1), $NF; next}
                 for (t in TWO)   if ($0 ~ "[.]" t "$")  {print $(NF-2), $(NF-1), $NF; next}
                 for (o in ONE)   if ($0 ~ "[.]" o "$")  {print $(NF-1), $NF; next}
                 next
                }

/^\*/           {next}

NF==5           {FIVE[$0]}
NF==4           {FOUR[$0]}
NF==3           {THREE[$0]}
NF==2           {TWO[$0]}
NF==1           {ONE[$0]}
' FS="." OFS="." public_suffix_list.dat temp-url) #extracting the Domain from the URL

ct=$(grep -i -r "$dom" /opt/URL/BL/ | cut -d'/' -f5 | uniq -d | head ) #Looking up each domain in the http://www.shalla.de/ categorization database; the folder name it is found under gives the category

echo "$dt,$tm,$ip,$url,$dom,$type,$ct" >> DNS1_Logs 	#Rewriting the log line with the domain and category added, dropping unnecessary information
echo "$dom" >> DNS1_DOM								#Writing only the domain names into a separate file
echo "$dom,$ct" >> DNS1_CT							#Writing the domain and category names into a separate file
done < DNS1

sort DNS1_DOM | uniq -cd | sort -nr > DNS1_Sort 	#Sorting the domains to get the most utilized ones
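On the speed problem: each loop iteration above forks several awk/cut/grep processes, which dominates the runtime at 9 million lines a day. One possible direction (a sketch, not a drop-in replacement) is a single awk pass that loads the suffix list into a hash once and does the field and domain extraction inline; the Shalla category lookup is omitted here and would need similar preloading. The sample files below are hypothetical stand-ins for public_suffix_list.dat and DNS1:

```shell
# Hypothetical stand-ins for public_suffix_list.dat and DNS1
printf '%s\n' 'com' 'co.uk' > suffixes.sample
printf '%s\n' '04-Nov-2015 08:28:39.261 queries: info: client 192.168.169.122#59319: query: istatic.eshopcomp.com IN A + (10.10.80.50)' > dns.sample

awk '
FNR==NR { if ($0 ~ /^\/\/|^ *$|^\*/) next; SUF[$0]; next }  # pass 1: hash every suffix
{
    dt = $1; tm = $2
    split($6, a, "#"); ip = a[1]       # strip the port from the client field
    url = $8; type = $10
    m = split(url, u, ".")
    dom = url                          # fallback: whole name if no suffix matches
    for (i = 2; i <= m; i++) {         # leftmost hit = longest matching suffix
        s = u[i]
        for (j = i + 1; j <= m; j++) s = s "." u[j]
        if (s in SUF) { dom = u[i-1] "." s; break }
    }
    print dt "," tm "," ip "," url "," dom "," type
}' suffixes.sample dns.sample > dns1_logs.sample
cat dns1_logs.sample
```

The per-hostname work becomes a handful of hash lookups (one per label) instead of a regex test against every suffix in the list, and no subprocess is forked per line.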

Thank you very much in advance.

Last edited by omuhans123; 11-07-2015 at 07:48 AM..
 
