Scan Multiple Dir/Files
Hi gang,
I have a project I would like to work on as I learn Perl and Ruby scripting. It may be a big bite to chew off at first, but that's how I like to learn: attack a real-world problem. I would like to improve our response to spam attacks here at our office, where we run mail, DHCP, and DNS servers.
I would like to read the contents of each (ASCII) file in each subdirectory of a given directory. My goal is to look for common IP addresses, email addresses, and subjects in the headers. If a common value is found, list the file, its location, and the matching lines. This way I can see if I have a real problem with a particular email/IP address.
So, starting from the root of /var/mail/mess, search all files in:
/var/mail/mess/0
/var/mail/mess/1
/var/mail/mess/2
etc...
Any ideas on the best way to approach this? I am a noob, and getting familiar with Perl and Ruby. Thanks!
tonyd
|
You don't give enough detail and you haven't done any work at all that we can help you with. That's probably why you haven't gotten any replies yet.
Having said that, start by picking a language. Then read its docs to figure out how to walk a directory tree. Then write code that walks the tree and lists each file. Once you get that far, you shouldn't have too much trouble opening each file for reading so you can get to the next step.
Once you've done all that, you'll have some half-working steaming pile of code. At that point, you'll have more specific questions and we can provide more specific answers. None of the above should be difficult if you just look at a basic tutorial or two on your chosen language. Have fun!
ShawnMilo
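For readers following along, the first step of that plan (walk the tree and list each file) might look roughly like this in Ruby. This is only a minimal sketch: `list_files` is a hypothetical helper name, and the default root is the path from the original question.

```ruby
require 'find'

# Return the paths of all regular files under root, sorted.
# list_files is a hypothetical helper name, not something from the thread.
def list_files(root)
  files = []
  Find.find(root) do |path|
    files << path if File.file?(path)   # Find.find also yields directories
  end
  files.sort
end

if __FILE__ == $0
  root = ARGV[0] || "/var/mail/mess"    # path assumed from the question
  list_files(root).each { |f| puts f } if File.directory?(root)
end
```

Once this prints the expected files, opening each one for reading is a small step on top of the same loop.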
|
|||
|
It is not clear what you want. The plain ASCII text files you are talking about are all email messages. If you look at the headers of these files, you will see that they usually contain FQDNs rather than IP addresses (someone please correct me if I am wrong), and there can be multiple such entries depending on the route the email has taken. Which one do you want? Let me tell you, there is no easy way to figure this out.
Again, there can be multiple email addresses in each file if the email was addressed to more than one recipient. If you already know a particular IP address, email address, or subject line and simply want to find out which file(s) contain it, you can use GNU grep to search recursively: grep -r <ip|email|subject> /var/mail/mess/*
|
|||
|
Actually the Received: headers have plenty of IP addresses. I would assume the task would be to find them all and figure out which ones exist in large enough quantities to signal that there is more than an occasional problem. Of course, spammers know you are going to do this, so they often try specifically to spread out their activities in order to be able to fly below the radar. But really, Shawn already posted a reasonable plan. Let's see your first cut at the code.
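As a rough illustration of that idea, the sketch below tallies IPv4-looking strings on Received: lines (including folded continuation lines) and keeps the ones seen often. The names `count_received_ips` and `frequent_ips` and the threshold parameter are assumptions made for this example; nothing here comes from the thread's own code.

```ruby
# Loose IPv4 pattern, good enough for counting (it does not validate octets)
IPV4 = /\b\d{1,3}(?:\.\d{1,3}){3}\b/

# Tally every IPv4-looking string found on Received: lines.
# messages is an array of strings, each holding one raw message.
def count_received_ips(messages)
  counts = Hash.new(0)
  messages.each do |text|
    in_received = false
    text.each_line do |line|
      break if line.strip.empty?        # blank line: end of headers
      if line =~ /\AReceived:/i
        in_received = true
      elsif line !~ /\A[ \t]/
        in_received = false             # a different header has started
      end
      line.scan(IPV4) { |ip| counts[ip] += 1 } if in_received
    end
  end
  counts
end

# Keep only addresses seen at least `threshold` times, most frequent first
def frequent_ips(counts, threshold)
  counts.select { |_, n| n >= threshold }.sort_by { |ip, n| [-n, ip] }
end
```

A sender who spreads traffic across many addresses stays under any fixed threshold, so this only catches the unsubtle cases.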
|
|
|||
|
You're right, I didn't give you much to go on. Here's what I came up with. I'm open to any suggestions based on your experience. Thanks!
tonyd Code:
#!/usr/local/bin/ruby -w
require 'find'

@results = []

# Walk the queue tree and parse every regular file found
def scan_dirs
  root = "/var/qmail/queue/mess"
  Find.find(root) do |path|
    next unless File.file?(path)   # Find.find also yields directories
    parse_file(path)
  end
  # Sort on the second element (source) of each result row
  @results.sort! { |x, y| y[1] <=> x[1] }
  print_results
end

# Parse the first 20 lines of a message for source IP, sender and subject
def parse_file(path)
  file = File.basename(path)       # message number (file name without directory)
  sourceip = ""
  email = ""
  subject = ""
  line_no = 0
  File.foreach(path) do |line|     # foreach closes the file when done
    line = line.strip              # Remove trailing \r\n and surrounding whitespace
    line_no += 1
    if line_no == 1 and line.match("invoked for bounce")
      # Internal bounce message
      sourceip = "SMTP"
    end
    if line_no == 2 and sourceip.empty?
      if line.match("webmail.internet.net")
        sourceip = "Webmail"
      else
        # String#[] with a regex returns the first match, or nil if there is none
        sourceip = line[/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/] || "No Source IP**"
      end
    end
    if (line.match("SquirrelMail") and sourceip == "Webmail") or
       (line.match("From:") and sourceip != "Webmail")
      email = get_email(line) if email.empty?
    end
    if line.match("Subject:") and subject.empty?
      subject = truncate(line, 50)
    end
    break if line_no == 20         # Nothing more we want to read in the file
  end
  # Record the row here so files shorter than 20 lines are not dropped
  @results << [file, sourceip, email, subject]
end

# Truncate subject line
def truncate(string, width)
  if string.length <= width
    string
  else
    string[0, width - 3] + "..."
  end
end

# Print out results
def print_results
  print "\e[2J\e[f"                # Clear the screen and move the cursor home
  print "Mess#".ljust(10)
  print "Source".ljust(18)
  print "Email Address".ljust(30)
  puts "Subject".ljust(50)
  puts "-" * 111
  @results.each do |row|
    print row[0].ljust(10)
    print row[1].ljust(18)
    print row[2].ljust(30)
    puts row[3].ljust(50)
  end
end

# Get email address from line/string
def get_email(line_to_parse)
  # First address on the line, or "" when there is none
  line_to_parse[/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i].to_s
end

# Ok, begin our scan
scan_dirs
exit
|
|
|||
|
If you plan to do this on a massive scale, it might make sense to parse the messages as they come in, and index the results. The actual search is then kind of trivial, and much faster.
Me, I would make the regexes muuuch tighter, and I guess I would stop the parse loop at the neck (the first empty line separates headers from body) rather than arbitrarily scan 20 lines.
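Stopping "at the neck" could be sketched as below: read lines until the first empty one and return only the headers. `read_headers` is a hypothetical helper name introduced for this example, not part of the script above.

```ruby
# Return the header lines of a message file, stopping at the first blank
# line, which separates headers from body.
# read_headers is a hypothetical helper name used for illustration.
def read_headers(path)
  headers = []
  File.foreach(path) do |line|
    line = line.chomp            # drops a trailing \n or \r\n
    break if line.empty?         # the "neck": headers end here
    headers << line
  end
  headers
end
```

The parse loop can then run over the returned array, so short messages and messages with long header blocks are both handled without a fixed line count.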
|
|||
|
@era, thanks for your reply. My goal with this script/utility is to be able to do a quick scan of the mail queue when we get an alert from Nagios that the SMTP queue has reached a warning threshold, not so much to do anything in real time. The queue can change every second, so anything indexed would quickly become invalid. Messages hanging out in the queue for more than a few seconds are usually the result of delivery failing due to an invalid address (not always, but as a general rule). Spammers blast emails, so often when I look at the queue I can see 50/100/200 emails from the same IP/email address. With qmHandle -l I get a list, but it's the entire header of each email. That's mostly useless if you want a quick visual to see a pattern. A sorted list with just source IP, email, and subject can give you a quick heads-up.
Can you give me an example of how you would tighten up the regular expressions? I'm not too knowledgeable about regular expressions; I'm still on the learning curve. I appreciate any feedback, as I've not done this before. Thanks!
tonyd
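One possible reading of "tighter", offered as a sketch and not as what era had in mind: anchor header names to the start of the line so that "From:" inside a Subject no longer matches, and only accept IPv4 octets in the range 0 to 255. All constant and method names below are made up for this example, and lines are assumed to be stripped already, as in the script above.

```ruby
# Header name anchored at the start of the line, case-insensitive
FROM_RE    = /\AFrom:\s*(.*)/i
SUBJECT_RE = /\ASubject:\s*(.*)/i

# One octet, 0-255
OCTET   = /(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/
# Four octets, not glued to further digits or dots on either side
IPV4_RE = /(?<![\d.])#{OCTET}(?:\.#{OCTET}){3}(?![\d.])/

# Value of a From: header line, or nil if the line is not a From: header
def from_value(line)
  line[FROM_RE, 1]
end

# Value of a Subject: header line, or nil if the line is not a Subject: header
def subject_value(line)
  line[SUBJECT_RE, 1]
end

# First valid-looking IPv4 address on the line, or nil
def first_ipv4(line)
  line[IPV4_RE]
end
```

With these, `from_value("Subject: Mail From: bob")` returns nil, whereas an unanchored match on "From:" would have treated that subject line as a sender line.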