Python Reading Individual files and Regex through them


 
# 1  
Old 11-05-2013

As a newbie to Python, I am trying to write a script that adds all the log files (*.log) within a directory to a list, opens each file, searches it for IP addresses using a regex, and appends the IPs it finds to a list. So far, I have:
Code:
import re, os
def list_files():
    content = []
    for files in os.walk('var/www/html/data/customer/log'):
        content.append(files)
    return content
lfiles = list_files()
lfiles
file1.log
file2.log
file3.log
file4.log

lfiles[0]
file1.log
file2.log
file3.log
file4.log

At this point I imagine I need to open these files and use a regex to pull out the IPs (this is the part that gets me). So maybe something like:

http://smallbusiness.chron.com/read-...hon-29648.html

then add somewhere into the function:
Code:
regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",files)

and add whatever else I need to get this done.
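
One thing to watch in the pattern above: an unescaped `.` matches any character, so it accepts more than dotted addresses. Below is a minimal sketch with the dots escaped, written for Python 3. The sample log line and the name `IP_AFTER_COLON` are made up for illustration, and the sketch assumes the address follows ": " as the lookbehind in the post implies.

```python
import re

# Escaped dots match a literal "."; the lookbehind keeps the requirement
# from the post that the address is preceded by ": ".
IP_AFTER_COLON = re.compile(r"(?<=: )\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# Hypothetical log line, invented for this example.
sample = "Nov  5 17:50:01 banned: 10.7.0.145 port 22"
found = IP_AFTER_COLON.findall(sample)
print(found)

# With unescaped dots, a string with no dots at all still matches.
loose = re.findall(r"10.7.0.145", "10x7y0z145")
print(loose)
```

Note that `\d{1,3}` still accepts values above 255, so this only finds IP-shaped strings. It does not validate them.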

Last edited by metallica1973; 11-05-2013 at 05:50 PM..
# 2  
Old 11-05-2013
Why add the filenames to a list? Why not just use the filenames, when you get them?
# 3  
Old 11-05-2013
Many thanks for the reply. Can you show me an example?

Quote:
Why add the filenames to a list? Why not just use the filenames, when you get them?
I was trying to get fancy and, as you can see, I dug myself into a hole.

Maybe something like this:
Code:
import os, re

def list_files():
 ips = []
 for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
    for file in files:
        f=open(file, 'r')
        lines=f.readlines()
        regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",file)
        f.close()
        ips.append(file)
 return ips
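
In the function above, `re.findall` is given `file`, which is the file's name, while the contents read into `lines` are never searched. A small Python 3 sketch of the difference, using a made-up file name and made-up contents rather than real log data:

```python
import re

IP = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# Both values are hypothetical stand-ins for one entry from os.walk
# and for what f.read() would return for that file.
file_name = "file1.log"
file_text = "login failed: 10.7.0.145\n"

from_name = IP.findall(file_name)   # searches the name only
from_text = IP.findall(file_text)   # searches the contents

print(from_name)
print(from_text)
```

Passing the text of the file (for example the result of `f.read()`) to `re.findall` is what actually scans the log.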


Last edited by metallica1973; 11-06-2013 at 06:58 PM..
# 4  
Old 11-07-2013
I made an adjustment that was recommended by someone else and this is what I get:

Code:
import os,re

def list_files():
 ips = []
 for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
    for file in files:
        f=open(file, 'r')
        lines=f.readlines()
        regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",file)
        f.close()
        ips.append(regexp)
 return ips

and when I use the function I get this error:
Code:
 list_files()
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
/home/Python/banned-scraper/<ipython-input-9-52cf17baf819> in <module>()
----> 1 list_files()

/home/Python/banned-scraper/<ipython-input-8-9070553f1ea7> in list_files()
      5  for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
      6     for file in files:
----> 7         f=open(file, 'r')
      8         lines=f.readlines()
      9         regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",file)

IOError: [Errno 2] No such file or directory: 'eval7577:1.18595.dbg'

???
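
The error most likely comes from how `os.walk` reports files: the names in `files` are bare file names relative to `subdir`, so `open(file)` looks for them in the current working directory instead. Joining the two with `os.path.join` gives a path that can be opened. A minimal Python 3 sketch follows. The temporary directory, the `sub` folder and `demo.log` are invented so the example runs anywhere.

```python
import os
import tempfile

# Build a throwaway tree (hypothetical names) so the example is runnable.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "sub"))
with open(os.path.join(top, "sub", "demo.log"), "w") as handle:
    handle.write("client: 10.7.0.145\n")

opened = []
for subdir, dirs, files in os.walk(top):
    for name in files:
        # name is just "demo.log"; prepend the directory it was found in.
        full_path = os.path.join(subdir, name)
        with open(full_path) as handle:
            opened.append((name, handle.read()))

print(opened)
```

Using `with open(...)` also closes each file automatically, so the explicit `f.close()` is no longer needed.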
# 5  
Old 11-19-2013
Code:
for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
    for file in files:
        regexp = re.findall(r"10.7.0.145", open(file, "r").read())
        print " Here is whats inside of %s = %s" % (regexp,file)
 Here is whats inside of [] = file3
 Here is whats inside of [] = file6
 Here is whats inside of [] = file7
 Here is whats inside of [] = file1
 Here is whats inside of ['10.7.0.145'] = file9
 Here is whats inside of [] = file5
 Here is whats inside of [] = file8
 Here is whats inside of [] = file10
 Here is whats inside of [] = file2
 Here is whats inside of [] = file4

I made some progress, but I can't figure out how to print only the file that contains a match for the regular expression. In other words, I want the output to be just this:
Code:
Here is whats inside of ['10.7.0.145'] = file9

---------- Post updated at 05:38 PM ---------- Previous update was at 03:35 PM ----------

I figured it out with some help. re.findall returns a list, so I only need to print matches[0], which gets the first (and only) item in matches. With the if-statement in place, the print line only runs if matches is non-empty.

Code:
for subdir, dirs, files in os.walk('.'):
    for file in files:
       matches = re.findall(r"10.7.0.145", open(file).read())
       if matches:
           print " I found what I was looking for %s = %s" % (file,matches[0])

returns:
Code:
 I found what I was looking for file9 = 10.7.0.145

objective complete.
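
For anyone landing on this thread later, here is one way the pieces could fit together in Python 3. This is only a sketch under a few assumptions: the function name `find_ips`, the `suffix` parameter and the returned dictionary layout are my own choices and not from the posts above, and the pattern matches any IP-shaped string instead of one fixed address.

```python
import os
import re

# Matches IP-shaped strings; it does not check that each part is <= 255.
IP_PATTERN = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")


def find_ips(top, suffix=".log"):
    """Map each file under top ending in suffix to the IPs found in it."""
    results = {}
    for subdir, dirs, files in os.walk(top):
        for name in files:
            if not name.endswith(suffix):
                continue
            # Join the directory and the bare name before opening.
            path = os.path.join(subdir, name)
            # errors="replace" keeps odd bytes in a log from raising.
            with open(path, errors="replace") as handle:
                matches = IP_PATTERN.findall(handle.read())
            if matches:  # an empty list is falsy, so empty files are skipped
                results[path] = matches
    return results
```

Calling `find_ips('var/www/html/data/customer/log')` would return a dictionary keyed by file path, and only files with at least one match appear in it.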