Pythonic Parsing


 
# 1  
Old 09-27-2017

Experts and All,

Hello !

I am trying to put together a simple script in Python that has taken me almost 5 hours to complete. I am using Python 3.6.

I am trying to read a log file, parse it, and answer this basic question: how many GETs and how many POSTs are there, sorted in ascending order?

I pieced everything together here and it works fine, but I know for sure that I have made it more complicated than it needs to be.

1. Why should I push the data into a list (wordstring)?
2. Why am I not able to parse out whether it is a GET or POST method from the httpd log file?

Please show me the way and, if you can, explain it to me in detail, or at least point me to the correct documentation site.

Code:
manoharmahostav@ma-host:~/files$ python  log_file_analyse.py 
Stuff

GET: 1595922
PUT:      30
POST:      26

manoharmahostav@ma-host:
manoharmahostav@ma-host:~/files$ cat log_file_analyse.py 
#!/usr/bin/env python

import collections
from collections import Counter
from collections import defaultdict

#fname = 'testfile.txt'
fname = 'apache.log'

wordstring = []
c = collections.Counter()

with open(fname, 'r') as fh:
    for line in fh:
       if len(line.strip()):
           splitlines = line.split('"')[1]
           another = splitlines.split()[0]
           wordstring.append(another) 
           

c = Counter(wordstring)
print("Stuff")

for letter, count in c.most_common(30):
    print( '%s: %7d' % (letter, count))

manoharmahostav@ma-host:~/files$ 
manoharmahostav@ma-host:~/files$ head testfile.txt
64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846
64.242.88.10 - - [07/Mar/2004:16:06:51 -0800] "GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1" 200 4523
64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2004:16:11:58 -0800] "GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
64.242.88.10 - - [07/Mar/2004:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253

Moderator's Comments:
Mod Comment Please use CODE tags (not ICODE tags) for full line and multi-line displays.


---------- Post updated at 02:41 AM ---------- Previous update was at 12:41 AM ----------

After some more effort, here is what I have. It looks a bit cleaner, but it is not any faster than the previous version that I posted here.

Any help in getting a performance improvement would be much appreciated.
Sincerely,
Manohar.



Code:

manoharmahostav@ma-host:~/files$ cat abc.py 
#!/usr/bin/env python

import collections
from collections import Counter


somelist = []

with open('apache.log', 'r') as f:
     for line in f:
         splitlines = line.split('"')
         pat = splitlines[1]
         pat2 = pat.split(' ')[0]
         somelist.append(pat2)         


a = Counter(somelist)

print('Most Common:')
for d, b in a.most_common(10):
    print('%s: %10d' %(d, b))
manoharmahostav@ma-host:~/files$ 
manoharmahostav@ma-host:~/files$ python abc.py 
Most Common:
GET:    1595922
PUT:         30
POST:         26


Last edited by Don Cragun; 09-27-2017 at 03:56 AM.. Reason: Change CODE tags to ICODE tags.
# 2  
Old 09-27-2017
Try using a dictionary instead of a list. Dictionary keys are unique, so each method gets a single running count and you never have to store every occurrence, which makes it an efficient replacement in this case.

Code:
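from collections import Counter

# Map each HTTP method to the number of times it appears.
counts = {}

with open('apache.log', 'r') as f:
    for line in f:
        # The request is the first double-quoted field; the method is its first word.
        method = line.split('"')[1].split(' ')[0]
        counts[method] = counts.get(method, 0) + 1

# Wrap the dict in a Counter only to get most_common().
a = Counter(counts)

print('Most Common:')
for d, b in a.most_common(10):
    print('%s: %10d' % (d, b))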
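
On the first question: the list is not strictly needed. A `Counter` can be fed one method at a time from a generator, so only the running totals stay in memory. Below is a minimal sketch of that idea, not a measured speed-up. The helper names `request_method`, `count_methods` and `ascending` and the `sample` lines are made up for illustration, and it assumes the request is the first double-quoted field on each line, as in the `head testfile.txt` output earlier in the thread. Lines without such a field are skipped rather than raising an `IndexError`. It also sorts with `sorted()`, because `most_common()` returns the largest count first, not ascending order.

```python
from collections import Counter


def request_method(line):
    """Return the HTTP method from one access-log line, or None if there is none."""
    parts = line.split('"', 2)
    if len(parts) < 3:          # no complete "..." request field on this line
        return None
    words = parts[1].split()
    return words[0] if words else None


def count_methods(lines):
    """Tally methods straight from an iterable of lines, without building a list."""
    methods = (request_method(line) for line in lines)
    return Counter(m for m in methods if m is not None)


def ascending(counts):
    """Return (method, count) pairs with the smallest count first."""
    return sorted(counts.items(), key=lambda pair: pair[1])


# Hypothetical sample lines in the same layout as the log shown in the thread.
sample = [
    '10.0.0.1 - - [07/Mar/2004:16:05:49 -0800] "GET /a HTTP/1.1" 200 100\n',
    '10.0.0.1 - - [07/Mar/2004:16:06:51 -0800] "POST /form HTTP/1.1" 200 20\n',
    '\n',
    '10.0.0.1 - - [07/Mar/2004:16:10:02 -0800] "GET /b HTTP/1.1" 404 30\n',
]

for method, count in ascending(count_methods(sample)):
    print('%s: %10d' % (method, count))
```

To run it against the real log, pass the open file object in place of `sample`, for example `count_methods(f)` inside the `with open('apache.log') as f:` block. A file object yields one line at a time, so the whole file is never held in memory.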
