Help for capturing a URL from a line from large log file

 
# 1  
Old 02-09-2011

Can someone please help me extract a URL from each line of a log file and write all the output to a new file?

For example, the log file has entries like these:

PHP Code:
39.155.67.5 - - [21/march/2010:00:00:53 +0100] "GET /abc/login?service=http://161.120.36.39/CORPHR/TMA2007/default.asp HTTP/1.1" 401 3218
54.155.63.9 - - [21/march/2011:00:00:54 +0100] "GET /abc/login?service=http://161.120.36.39/CORPHR/TMA2007/default.asp HTTP/1.1" 401 3218
106155.30.5 - - [21/march/2011:00:00:54 +0100] "GET /abc/themes/testtdomain/cas.css HTTP/1.1" 200 1219
62.155.67.5 - - [21/march/2010:00:00:54 +0100] "GET /abc/themes/testtdomain/fondCas.jpg HTTP/1.1" 200 29659
79.180.10.116 - - [21/march/2011:00:00:58 +0100] "GET /abc/status.jsp HTTP/1.0" 200 104
90.155.78.5 - - [21/Jan/2011:00:01:27 +0100] "GET /abc/login?service=http://hpm.testt-domain.com/portal/ABC/ABCLogin.aspx HTTP/1.1" 401 3169
28.155.93.5 - - [21/Jan/2011:00:03:07 +0100] "POST /abc/login?service=http%3A%2F%2Fuat-live.testt-domain.com%2Fj_string_cas_security_check HTTP/1.1" 302 -
46.155.84.5 - - [21/Jan/2011:00:03:07 +0100] "GET /abc/login?service=http%3A%2F%2Fuat-test.testt-domain.com%2Fj_string_cas_security_check HTTP/1.1" 401 3281
What I need to capture from these lines is the following, taken from AFTER service=

Output -
http://hpm.testt-domain.com/portal/
http://161.120.36.39/CORPHR/
http://161.120.36.39/CORPHR/

With help from other users on this forum I have managed to capture as far as http://hpm.testt-domain.com, but I am not able to capture as far as http://hpm.testt-domain.com/portal

The following command was provided to me, which was really helpful:

sed -n 's!.*service=\(http://[^:]*\):.*!\1!p' logfile > newfile

Any help would be highly appreciated.

Thanks,
Andy
# 2  
Old 02-09-2011
Hi,

Test if this works:

Code:
$ perl -ne 'print "$1\n" if m|(http://(?:([^/]+/){2}))|' infile
http://161.120.36.39/CORPHR/
http://161.120.36.39/CORPHR/
http://hpm.testt-domain.com/portal/

Regards,
Birei
# 3  
Old 02-10-2011
Thanks Birei, I will report the output in a few hours. Too bad that we have some urgent meetings coming up and I don't have time to test this right now, but I can't wait to try it!

Thanks for the quick reply, I shall get back on this shortly!


Cheers,
Andy

---------- Post updated 02-10-11 at 11:36 AM ---------- Previous update was 02-09-11 at 06:24 PM ----------

Ok, the following is what worked for me after MUCH needed help from Franklin!
I used the same sed command provided by Franklin to get my URLs filtered -
PHP Code:
sed -n 's!.*service=\(http://[^/]*/[^/]*/\).*!\1!p' file
However, I haven't tested the script part provided by Franklin yet; I will post the output later.
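As a rough illustration of what that sed substitution does, here is a minimal sketch run against a few lines modeled on the log excerpt in the first post. The temporary file and the variable names `log` and `urls` are assumptions made for this example only.

```shell
# Hypothetical sample file holding lines modeled on the log excerpt above.
log=$(mktemp)
cat > "$log" <<'EOF'
39.155.67.5 - - [21/march/2010:00:00:53 +0100] "GET /abc/login?service=http://161.120.36.39/CORPHR/TMA2007/default.asp HTTP/1.1" 401 3218
79.180.10.116 - - [21/march/2011:00:00:58 +0100] "GET /abc/status.jsp HTTP/1.0" 200 104
90.155.78.5 - - [21/Jan/2011:00:01:27 +0100] "GET /abc/login?service=http://hpm.testt-domain.com/portal/ABC/ABCLogin.aspx HTTP/1.1" 401 3169
28.155.93.5 - - [21/Jan/2011:00:03:07 +0100] "POST /abc/login?service=http%3A%2F%2Fuat-live.testt-domain.com%2Fj_string_cas_security_check HTTP/1.1" 302 -
EOF

# -n turns off sed's default printing; the trailing p prints only the lines
# where the substitution matched.
# \( ... \) captures "http://", the host, and the first path segment with its
# closing slash; everything else on the line is discarded.
urls=$(sed -n 's!.*service=\(http://[^/]*/[^/]*/\).*!\1!p' "$log")
printf '%s\n' "$urls"

rm -f "$log"
```

Lines without service=, and lines where the service value is URL-encoded (http%3A%2F%2F...), do not match the pattern and are skipped, which fits the expected output listed in the first post.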
Another problem I faced while filtering the output of Franklin's script: a few URLs carried long strings with special characters, so I used the following to cut each line at the first &, ? or space and then sort the result.

PHP Code:
cat old.file | awk -F\& '{print $1}' | awk -F\? '{print $1}' | awk -F' ' '{print $1}' | sort -o output.txt
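The same trimming can probably be done in one awk pass, since awk treats a field separator longer than one character as a regular expression. This is only a sketch: the input lines are made up for illustration, `tmp` and `cleaned` are hypothetical names, and `sort -u` is an addition here that also drops exact duplicate lines.

```shell
# Hypothetical input: URLs followed by query strings or other trailing text.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
http://hpm.testt-domain.com/portal/?id=7&lang=en
http://161.120.36.39/CORPHR/ trailing text
http://161.120.36.39/CORPHR/&x=1
EOF

# -F'[&? ]' splits each line at the first &, ? or space;
# $1 is therefore everything before the first of those characters.
cleaned=$(awk -F'[&? ]' '{print $1}' "$tmp" | sort -u)
printf '%s\n' "$cleaned"

rm -f "$tmp"
```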
One more problem: when removing duplicate lines, I need upper and lower case in the URL paths to be treated as the same.
The lines below are currently all treated as UNIQUE URLs, but I need only one entry kept for each path.
(Note: the different IPs don't matter, since I am only concerned with the lower and upper case letters.)

PHP Code:
56.555.72.69/crm_ababcdves/
81.745.42.59/CRM_Ababcdves/
38.475.62.19/squitv3/
92.625.42.89/Squitv3/
37.288.30.12/cview/
63.598.30.89/Cview/
85.048.30.52/CView
So final output should be -
PHP Code:
56.555.72.69/crm_ababcdves/
38.475.62.19/squitv3/
37.288.30.12/cview
If someone can help me with this, that would be really great. I am still a newbie at these things, though I have been getting better over the past few days.

Cheers,
Andy
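
For the case-insensitive duplicate problem in the post above, one possible approach is a single awk pass keyed on the lowercased path. This is a minimal sketch, not something confirmed in the thread: it assumes every line has the form IP/path as in the sample, and `in_file` and `deduped` are hypothetical names.

```shell
# The sample lines from the post above, written to a temporary file.
in_file=$(mktemp)
cat > "$in_file" <<'EOF'
56.555.72.69/crm_ababcdves/
81.745.42.59/CRM_Ababcdves/
38.475.62.19/squitv3/
92.625.42.89/Squitv3/
37.288.30.12/cview/
63.598.30.89/Cview/
85.048.30.52/CView
EOF

# -F/ makes $1 the IP and $2 the first path segment.
# seen[] counts each lowercased path; a line is printed only the first time
# its path appears, so the IP and letter case of later lines are ignored.
deduped=$(awk -F/ '!seen[tolower($2)]++' "$in_file")
printf '%s\n' "$deduped"

rm -f "$in_file"
```

The first line of each group is kept exactly as it appears, so the last entry comes out as 37.288.30.12/cview/ with its trailing slash. If the lines started with http://, the path would sit in a later field ($4 rather than $2), and the key would need adjusting.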