Get only domain from url file bind


 
# 29  
Old 11-22-2015
Hi RudiC, Thank you for that hint, I was not aware of this command. I have now added the following script to split the files:
Code:
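#!/usr/bin/bash
fspec=2015-11-09.querylogs
num_files=10

# Work out lines per file, rounding up so no lines are left over.
total_lines=$(wc -l < "$fspec")
((lines_per_file = (total_lines + num_files - 1) / num_files))

# Split the actual file on line boundaries into new.aa, new.ab, ...
split --lines="$lines_per_file" "$fspec" new.

# Report the result.
echo "Total lines     = ${total_lines}"
echo "Lines  per file = ${lines_per_file}"
wc -l new.*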

Now there are 10 files. How do I start 10 scripts to run through these files?

Output files:
Code:
Total lines     = 27622480
Lines  per file = 2762248
   2762248 new.aa
   2762248 new.ab
   2762248 new.ac
   2762248 new.ad
   2762248 new.ae
   2762248 new.af
   2762248 new.ag
   2762248 new.ah
   2762248 new.ai
   2762248 new.aj
  27622480 total

# 30  
Old 11-22-2015
Back in post #14 I showed you a way to get rid of many of the extra processes you invoke for each line you read. In post #16 you said my suggestion improved the performance of your script by a factor of 5. But in your latest script, in post #24 of this thread, you have thrown away that improvement and added several more duplicated invocations of geoiplookup piped through various combinations of awk, cut, awk again, and tr. Why? If you had taken that earlier advice and applied it to the geoiplookup invocations as well, you would easily get an improvement of at least a factor of 10.
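As a rough illustration of that advice (the script from post #24 is not reproduced here, so this is only a sketch under assumptions): call geoiplookup once per address and pick the fields out of its output with shell parameter expansion instead of awk, cut, and tr. The function name parse_geoip and the variables geo_code and geo_name are made up for this example, and the sample line assumes the usual country-edition output format.

```shell
#!/usr/bin/bash
# Hypothetical helper: split one line of geoiplookup output into a country
# code and a country name using only shell parameter expansion, so no awk,
# cut, or tr process is started per line.
# Assumed input format (country edition):
#   GeoIP Country Edition: US, United States
parse_geoip() {
    local line=$1
    local rest=${line#*: }     # drop everything up to and including ": "
    geo_code=${rest%%,*}       # text before the first comma
    geo_name=${rest#*, }       # text after the first ", "
}

# In the real script this would be called once per address, for example:
#   parse_geoip "$(geoiplookup "$ip")"
# The literal line below stands in for that call so the sketch runs anywhere.
parse_geoip 'GeoIP Country Edition: US, United States'
printf '%s|%s\n' "$geo_code" "$geo_name"
```

Lines that do not follow the assumed format (such as an "IP Address not found" reply) are not handled here and would need their own check.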

Your script in post #24 has two distinct parts. The first 49 lines process the data we have been talking about in this thread. The last line processes a file that is not mentioned in the rest of the script. Why not move that command to the start of your script and run it asynchronously (and add a wait at the end of your script if it should not be allowed to terminate before that command completes)?
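A minimal sketch of that layout, assuming the independent command can be wrapped in a function: independent_job is a placeholder for the real last line of the script, and the temporary file exists only so the example is self-contained.

```shell
#!/usr/bin/bash
# Placeholder for the independent command at the end of the script.
independent_job() {
    sleep 1
    echo "independent job finished"
}

out=$(mktemp)                 # scratch file for the background job's output
independent_job > "$out" &    # start it in the background right away
bg_pid=$!                     # remember its process ID

# ... the main per-line processing would run here in the foreground ...
main_result="main work finished"

wait "$bg_pid"                # do not exit before the background job completes
bg_status=$?                  # exit status of the background job
bg_result=$(<"$out")
rm -f "$out"
echo "$main_result; $bg_result (exit status $bg_status)"
```

Because wait is given the saved process ID, it also returns that job's exit status, which the script can check before it terminates.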

How many cores can you use on your system? If you want to run thirty copies of your script simultaneously, you will probably be fighting against yourself if you don't have at least thirty cores at your disposal.
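One way to avoid oversubscribing the machine is to cap the number of simultaneous jobs at the core count. The sketch below assumes GNU coreutils (nproc) and bash 4.3 or later (wait -n). The process_chunk function is a stand-in for whatever the real per-file work is, and the small demo files stand in for the new.* chunks.

```shell
#!/usr/bin/bash
# Number of cores available to this process; fall back to 1 if nproc is missing.
cores=$(nproc 2>/dev/null || echo 1)

# Hypothetical per-chunk worker; the real one would run the lookup script.
process_chunk() {
    wc -l < "$1" > "$1.out"
}

# Demo chunks standing in for new.aa, new.ab, ...
workdir=$(mktemp -d)
for suffix in aa ab ac ad; do
    printf 'line1\nline2\n' > "$workdir/new.$suffix"
done

running=0
for chunk in "$workdir"/new.??; do
    process_chunk "$chunk" &      # one background job per chunk
    ((running += 1))
    if ((running >= cores)); then
        wait -n                   # block until any one job finishes
        ((running -= 1))
    fi
done
wait                              # wait for the jobs still running

outs=("$workdir"/new.??.out)      # one result file per processed chunk
done_count=${#outs[@]}
rm -rf "$workdir"
echo "processed $done_count chunks with up to $cores jobs at a time"
```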
# 31  
Old 11-22-2015
Quote:
Originally Posted by omuhans123
Hi RudiC, Thank you for that hint, I was not aware of this command. I have now added the following script to split the files:
Code:
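#!/usr/bin/bash
fspec=2015-11-09.querylogs
num_files=10

# Work out lines per file, rounding up so no lines are left over.
total_lines=$(wc -l < "$fspec")
((lines_per_file = (total_lines + num_files - 1) / num_files))

# Split the actual file on line boundaries into new.aa, new.ab, ...
split --lines="$lines_per_file" "$fspec" new.

# Report the result.
echo "Total lines     = ${total_lines}"
echo "Lines  per file = ${lines_per_file}"
wc -l new.*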

Now there are 10 files. How do I start 10 scripts to run through these files?

Output files:
Code:
Total lines     = 27622480
Lines  per file = 2762248
   2762248 new.aa
   2762248 new.ab
   2762248 new.ac
   2762248 new.ad
   2762248 new.ae
   2762248 new.af
   2762248 new.ag
   2762248 new.ah
   2762248 new.ai
   2762248 new.aj
  27622480 total

In your latest script (in post #24) there is no reference to $fspec or to any files ending with .querylogs.

If you want to run that script 10 times in parallel, are the new.* files supposed to be used as replacements for the file named tmp or for the file named DNS1_DOM or for some other input file?

What is the name of your current script?

How do you invoke your current script?

In what directory are the new.* files located?

In what directory do you invoke your current script?

Given that there are two independent actions occurring in that script, why would you want to run one of those actions 10 times instead of just running it once?