How to use GREP to extract URL from file


 
# 1  
Old 09-08-2012

Hi all,

Here is what I want to do:

Given a line:

98.70.217.222 - - [08/Jul/2012:09:14:29 +0000] "GET /liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25?h=abcdefgh HTTP/1.1" 200 159229484 "-" "hBU1OhDsPXknMepDBJNScBj4BQcmUz5TwAAAAA" "-"

1. Get the URL component: "/liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25?h=abcdefgh"

2. Check whether the URL component without the query string, i.e. "/liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25", is longer than 800 characters.

I am using the command below, but it checks whether the entire URL component, query string included, is longer than 800 characters:

gunzip -c * |cut -d ' ' -f7|sort -n|uniq -c|grep '^.*\/[^?]*'|grep '.\{800,\}'

3. Also check whether each component of the URL is longer than 200 characters.

For example: is "/liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25" longer than 800 characters, and

is "liveupdate-aka.symantec.com" longer than 200 characters,

is "1340071490jtun_nav2k8enn09m25.m25" longer than 200 characters?


It would be good if all of the above could be done in one command.
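
The three steps above can be sketched as a single awk call. This is only a rough sketch, not a tested solution for real logs: the function name check_url_lengths is made up for illustration, and it assumes the request URL is always the 7th whitespace-separated field, as in the sample line. The limits default to the 800 and 200 from the question.

```shell
# check_url_lengths is a hypothetical helper name (not from the thread).
# It reads access-log lines on stdin and assumes the request URL is the
# 7th whitespace-separated field, as in the sample line.
check_url_lengths() {
    awk -v max_path="${1:-800}" -v max_part="${2:-200}" '
    {
        url = $7
        sub(/[?].*/, "", url)          # drop the query string, if any
        if (length(url) > max_path)
            print "path > " max_path ": " url
        n = split(url, part, "/")      # part[1] is empty because of the leading "/"
        for (i = 1; i <= n; i++)
            if (length(part[i]) > max_part)
                print "component > " max_part ": " part[i]
    }'
}
```

It could be fed the same way as the pipeline in the question, for example `gunzip -c * | check_url_lengths`. Only paths or components over the limits are printed, so no output means nothing was too long.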
# 2  
Old 09-08-2012
Quote:
Originally Posted by Naks_Sh10
...3. Also check whether each component of the URL is longer than 200 characters.

For example: is "/liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25" longer than 800 characters, and

is "liveupdate-aka.symantec.com" longer than 200 characters,

is "1340071490jtun_nav2k8enn09m25.m25" longer than 200 characters?


It would be good if all of the above could be done in one command.
Code:

$
$ cat f10
98.70.217.222 - - [08/Jul/2012:09:14:29 +0000] "GET /liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25?h=abcdefgh HTTP/1.1" 200 159229484 "-" "hBU1OhDsPXknMepDBJNScBj4BQcmUz5TwAAAAA" "-"
$
$
$ perl -lne 's/^.*GET (.*?)\?.*/$1/; print $_, " is ",length($_)<=800?"not":"", " greater than 800 char";
             s/^\///; map {print $_, " is ", length($_)<=200?"not":"", " greater than 200 char"} split/\//
            ' f10
/liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25 is not greater than 800 char
liveupdate-aka.symantec.com is not greater than 200 char
1340071490jtun_nav2k8enn09m25.m25 is not greater than 200 char
$
$

tyler_durden
# 3  
Old 09-10-2012
Hi Tyler,
Thanks for the reply, but I am still facing one problem, as I am not even a beginner in Perl.

$ cat sample1.log
98.70.217.222 - - [08/Jul/2012:09:14:29 +0000] "GET /liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25liveupdate-aka.symantec.com1340071490jtun_nav2k8enn09m25.m25liveupdate-aka.symantec.com1340071490jtun_nav2k8enn09m25?h=jshdjahsdieal HTTP/1.1" 200 159229484 "-" "hBU1OhDsPXknMepDBJNScBj4BQcmUz5TwAAAAA" "-"

$ perl -lne 's/^.*GET (.*?)\?.*/$1/; print $_, " is ",length($_)>=200?"":"", " greater than 200 char"; s/^\///; map {print $_, " is ", length($_)>=200?"":"", " greater than 200 char"} split/\//' sample1.log

/liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25liveupdate-aka.symantec.com1340071490jtun_nav2k8enn09m25.m25liveupdate-aka.symantec.com1340071490jtun_nav2k8enn09m25 is greater than 200 char

liveupdate-aka.symantec.com is greater than 200 char


1340071490jtun_nav2k8enn09m25.m25liveupdate-aka.symantec.com1340071490jtun_nav2k8enn09m25.m25liveupdate-aka.symantec.com1340071490jtun_nav2k8enn09m25 is greater than 200 char

Now, if you look at the line that was highlighted in blue, its length is not greater than 200 characters, but it still shows up in the output as "greater than 200 char".