Checking a pattern in file and the count of characters


 
# 1  
Old 09-21-2012

I have a gzipped log file whose lines look like this:

98.70.217.222 - - [08/Jul/2012:09:14:29 +0000] "GET /liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25?h=abcdefgh HTTP/1.1" 200 159229484 "-" "hBU1OhDsPXknMepDBJNScBj4BQcmUz5TwAAAAA" "-"

In this line we only need to consider three components:
/liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25?h=abcdefgh : this is called the URL
200 : this is called the response code
h=abcdefgh : this is called the query string
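
As a rough illustration of the three components, here is a minimal sketch that pulls them out of the sample line with awk. It assumes the fields are separated by single spaces exactly as in the sample, so that the URL is field 7 and the response code is field 9; the variable names are only for illustration.

```shell
# Sample line copied from the post above.
line='98.70.217.222 - - [08/Jul/2012:09:14:29 +0000] "GET /liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25?h=abcdefgh HTTP/1.1" 200 159229484 "-" "hBU1OhDsPXknMepDBJNScBj4BQcmUz5TwAAAAA" "-"'

# With whitespace-separated fields, $7 is the URL and $9 the response code.
# The query string is whatever follows the first "?" in the URL.
fields=$(printf '%s\n' "$line" | awk '{
    url = $7
    code = $9
    query = ""
    q = index(url, "?")
    if (q > 0) query = substr(url, q + 1)
    print "url=" url
    print "code=" code
    print "query=" query
}')
printf '%s\n' "$fields"
```

If the log format differs (extra fields before the request, for example), the field numbers would need adjusting.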


I am trying to write a script which reports the following:
1.) A count for each URL that occurs 10000 or more times with a non-successful response code (that is, any code other than 200, 206 or 304) and that does not contain any of the following patterns: '/F200%5E*', '/F0%5E*' and '/F100%5E*'

2.) A count for each URL that, excluding the query string, is 800 characters in length and does not contain any of the following patterns: '/F200%5E*', '/F0%5E*' and '/F100%5E*'

I tried to do this with the following command:
Code:
gunzip -c * |cut -d ' ' -f7|sort -n|uniq -c|grep '^.*\/[^?]*'|grep '.\{800,\}'

But it needs some changes to produce the desired output.

Your help is appreciated.
Thx

Last edited by vbe; 09-21-2012 at 09:32 AM..
# 2  
Old 09-21-2012
This is for the first one:

Code:
awk '$9~/(200|206|304)/&&$7!~/(F200%5E|F0%5E|F100%5E)/{a[$7]++}END{for(i in a)if(a[i]>=10000){print a[i],i}}' file

You can do something similar for the second one
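
One thing to watch: the first requirement asks for non-successful responses, while the condition as posted keeps lines whose code matches 200, 206 or 304. A minimal sketch with the test negated and anchored could look like the following. The sample log, the temporary file and the small threshold are made up for the demonstration; the thread uses 10000.

```shell
# Hypothetical sample data: three failed requests for one URL, one successful
# request for it, one failed request for an excluded /F200%5E URL, and one
# failed request for a URL that stays below the threshold.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
1.2.3.4 - - [08/Jul/2012:09:14:29 +0000] "GET /a/b.m25 HTTP/1.1" 404 10 "-" "x" "-"
1.2.3.4 - - [08/Jul/2012:09:14:30 +0000] "GET /a/b.m25 HTTP/1.1" 404 10 "-" "x" "-"
1.2.3.4 - - [08/Jul/2012:09:14:31 +0000] "GET /a/b.m25 HTTP/1.1" 500 10 "-" "x" "-"
1.2.3.4 - - [08/Jul/2012:09:14:32 +0000] "GET /a/b.m25 HTTP/1.1" 200 10 "-" "x" "-"
1.2.3.4 - - [08/Jul/2012:09:14:33 +0000] "GET /F200%5Eabc HTTP/1.1" 404 10 "-" "x" "-"
1.2.3.4 - - [08/Jul/2012:09:14:34 +0000] "GET /c HTTP/1.1" 404 10 "-" "x" "-"
EOF

threshold=3   # demo value only; the thread asks for 10000

# Count a URL only when the code is NOT 200/206/304 and the URL does not
# contain one of the excluded patterns.
result=$(awk -v min="$threshold" '
    $9 !~ /^(200|206|304)$/ && $7 !~ /\/(F200|F0|F100)%5E/ { count[$7]++ }
    END { for (u in count) if (count[u] >= min) print count[u], u }
' "$tmpfile")
printf '%s\n' "$result"
rm -f "$tmpfile"
```

Because the key is the whole of field 7, requests that differ only in their query string are counted separately here.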
# 3  
Old 09-21-2012
Last time I read the HTTP spec, that part was called the URI: the portion of the URL that follows the host.

Is this one task, not two?

sed can do the work of both grep and cut, so that only the desired URLs come out of sed, cleaned up and ready to be sorted into a most-popular list of those seen more than 9999 times:
Code:
sed '
    s/.*+0000\] "GET \(\/[^ ]*\) HTTP\/[0-9.]*" \([1-9][0-9]*\) .*/\1 \2/
    t n
    d
    :n
    / 20[06]$/d
    / 304$/d
    /^\/F[21]00%5E/d
    /^\/F0%5E/d
    /[^ ]\{99\}[^ ]\{99\}[^ ]\{99\}[^ ]\{99\}[^ ]\{99\}[^ ]\{99\}[^ ]\{99\}[^ ]\{99\}[^ ]\{8\}/d
    s/ .*//
  ' | sort | uniq -c | grep '^ *[1-9][0-9][0-9][0-9][0-9]' | sort -nr
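
To see what the capture step produces, here is a small sketch that runs just the substitution on the sample line from the first post. It keeps a space between the two captured groups, since the later tests such as / 20[06]$/ and the final s/ .*// rely on one, and it uses | as the delimiter only to avoid escaping the slashes. The long bracket expression in the script adds up to 8 x 99 + 8 = 800 non-space characters.

```shell
# Sample line copied from the first post.
line='98.70.217.222 - - [08/Jul/2012:09:14:29 +0000] "GET /liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25?h=abcdefgh HTTP/1.1" 200 159229484 "-" "hBU1OhDsPXknMepDBJNScBj4BQcmUz5TwAAAAA" "-"'

# Reduce the log line to "URI code"; -n plus the p flag prints only lines
# where the substitution succeeded (the same effect as "t n; d" above).
captured=$(printf '%s\n' "$line" |
    sed -n 's|.*+0000] "GET \(/[^ ]*\) HTTP/[0-9.]*" \([1-9][0-9]*\) .*|\1 \2|p')
printf '%s\n' "$captured"
```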

Sometimes, for speed, I break up the sed and use a long pipe of mixed sed and grep to speed things up and multiprocess, as the last 5 lines of this sed are essentially "grep -v". Putting the best eliminator first speeds things up. With many gzipped files, in bash on a UNIX that supports /dev/fd/#, you can go parallel by dividing the files into (#cores x 2) lists (assuming the processing is 50% I/O bound) and replacing the first 'sort' with:
Code:
sort -m <(
    gzcat $list1 | ... |sort
 ) <(
    gzcat $list2 | ... |sort
 ) <(
    gzcat $list3 | ... |sort
 ) <(
    gzcat $list4 | ... |sort
 )
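
For shells without process substitution, roughly the same idea can be sketched with background jobs and temporary files, which is a swapped-in technique and not what the post above uses. Everything here is made up for the demonstration: the two tiny gzipped inputs, the file names, and the use of gzip -dc in place of gzcat.

```shell
# Hypothetical demo data: two small gzipped inputs in a scratch directory.
tmpdir=$(mktemp -d)
printf '/b\n/a\n/b\n' | gzip -c > "$tmpdir/part1.gz"
printf '/c\n/a\n' | gzip -c > "$tmpdir/part2.gz"

# Decompress and sort each part in its own background job.
gzip -dc "$tmpdir/part1.gz" | sort > "$tmpdir/part1.sorted" &
gzip -dc "$tmpdir/part2.gz" | sort > "$tmpdir/part2.sorted" &
wait

# sort -m only merges inputs that are already sorted, which is much cheaper
# than sorting everything again.
merged=$(sort -m "$tmpdir/part1.sorted" "$tmpdir/part2.sorted" | uniq -c | sort -nr)
printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

In a real run each background pipeline would also contain the sed filtering step before its sort.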

# 4  
Old 09-21-2012
Quote:
Originally Posted by Subbeh
This is for the first one:

Code:
awk '$9~/(200|206|304)/&&$7!~/(F200%5E|F0%5E|F100%5E)/{a[$7]++}END{for(i in a)if(a[i]>=10000){print a[i],i}}' file

You can do something similar for the second one
Thanks for your reply, Subbeh, but when I tried this command with a[i]>=10 it's not giving me any result.
Code:
awk '$9~/(200|206|304)/&&$7!~/(F200%5E|F0%5E|F100%5E)/{a[$7]++}END{for(i in a)if(a[i]>=10){print a[i],i}}' url.log

Code:
$ cat url.log
98.70.217.222 - - [08/Jul/2012:09:14:29 +0000] "GET /liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25?h=abcdefgh HTTP/1.1" 200 159229484 "-" "hBU1OhDsPXknMepDBJNScBj4BQcmUz5TwAAAAA" "-"

# 5  
Old 09-21-2012
Your sample file contains only one line, so no count can reach 10. Try changing the number (10 in your case) to 1 and see what happens. The first column of the output should show the total per URL.

If you only need the first part of the URL, without "?h=abcdefgh", use this:
Code:
awk '$9~/(200|206|304)/&&$7!~/(F200%5E|F0%5E|F100%5E)/{gsub(/\?.*/,"",$7);a[$7]++}END{for(i in a)if(a[i]>=1){print a[i],i}}' file
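
For the second requirement, a minimal sketch along the same lines might strip the query string and then test the length of what remains. It assumes "800 characters in length" means 800 or more, as the grep in the first post suggests; the generated long URL, the temporary file and the variable names are made up for the demonstration.

```shell
# Build a hypothetical URL of exactly 800 characters for the demonstration.
long_url=$(awk 'BEGIN { s = "/"; while (length(s) < 800) s = s "x"; print s }')

tmpfile=$(mktemp)
{
    printf '1.2.3.4 - - [08/Jul/2012:09:14:29 +0000] "GET %s?h=abc HTTP/1.1" 200 10 "-" "x" "-"\n' "$long_url"
    printf '1.2.3.4 - - [08/Jul/2012:09:14:30 +0000] "GET %s?h=def HTTP/1.1" 200 10 "-" "x" "-"\n' "$long_url"
    printf '1.2.3.4 - - [08/Jul/2012:09:14:31 +0000] "GET /short?h=abc HTTP/1.1" 200 10 "-" "x" "-"\n'
} > "$tmpfile"

# Work on a copy of $7, drop everything from the first "?", and count the
# URL only if what is left is at least minlen characters long.
result=$(awk -v minlen=800 '
    $7 !~ /\/(F200|F0|F100)%5E/ {
        url = $7
        sub(/\?.*/, "", url)
        if (length(url) >= minlen) count[url]++
    }
    END { for (u in count) print count[u], length(u) }
' "$tmpfile")
printf '%s\n' "$result"
rm -f "$tmpfile"
```

The sketch prints the count and the URL length to keep the output short; printing u in place of length(u) would give the URL itself.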


Last edited by Subbeh; 09-21-2012 at 10:46 AM..