The UNIX and Linux Forums  

  #1 (permalink)  
Old 03-27-2008
Registered User
 

Join Date: Mar 2008
Posts: 3
Scan Multiple Dir/Files

Hi gang,

I have a project I would like to work on as I learn Perl and Ruby scripting. It may be a big bite to chew off at first, but that's how I like to learn: attack a real-world problem.

I would like to improve our response to spam attacks here at our office, where we run mail, DHCP, and DNS servers. I would like to read the contents of each (ASCII) file in each subdirectory of a given directory. My goal is to look for IP addresses, email addresses, and subjects in the headers that are common to many messages. When one is found, list the file, its location, and the matching lines. This way I can see whether I have a real problem with a particular email/IP address.

So, starting from the root of /var/mail/mess:
Search all files in:
/var/mail/mess/0
/var/mail/mess/1
/var/mail/mess/2
etc...

Any ideas on the best way to approach this? I am a noob and still getting familiar with Perl and Ruby. Thanks!
tonyd
  #2 (permalink)  
Old 03-28-2008
Registered User
 

Join Date: Jun 2006
Posts: 154
You don't give enough detail and you haven't done any work at all that we can help you with. That's probably why you haven't gotten any replies yet.

Having said that, start by picking a language. Then read its docs to figure out how to walk a directory tree. Then write code that walks the tree and lists each file. Once you get that far, you shouldn't have too much trouble opening each file for reading so you can get to the next step.
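
As a minimal sketch of that first step in Ruby, assuming the standard `find` library is acceptable: the method name `list_files` is made up for illustration, and the path in the usage comment is the one from the first post.

```ruby
require 'find'

# Return every regular file below root, sorted by path.
# Find.find also yields the directories themselves, so those are filtered out.
def list_files(root)
  files = []
  Find.find(root) do |path|
    files << path if File.file?(path)
  end
  files.sort
end

# Hypothetical usage: print each queued message file.
# list_files("/var/mail/mess").each { |path| puts path }
```

Once this prints the expected files, opening each one for reading is a small next step.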

Once you've done all that, you'll have some half-working steaming pile of code. At that point, you'll have more specific questions and we can provide more specific answers. None of the above should be difficult if you just look at a basic tutorial or two on your chosen language.

Have fun!

ShawnMilo
  #3 (permalink)  
Old 03-28-2008
Registered User
 

Join Date: Mar 2008
Location: Bangalore
Posts: 12
It is not clear what you want. The plain ASCII text files you are talking about are all email messages. If you look at the headers of those files, you will see that they usually contain FQDNs rather than IP addresses (someone please correct me if I am wrong), and there can be multiple such entries depending on the route the email has taken. Which one do you want? Let me tell you, there is no easy way to figure this out.

Again, there can be multiple email addresses in each file if the email was addressed to more than one recipient.

If you already know a particular IP address, email address, or subject line and simply want to find out which files contain it, GNU grep can search the tree recursively (replace the quoted placeholder with the string you are looking for):

grep -r 'IP_OR_EMAIL_OR_SUBJECT' /var/mail/mess
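
Since the original poster wants to learn Ruby, here is a rough Ruby counterpart of that recursive search. It is only a sketch: `grep_tree` is a hypothetical helper name, and it matches a literal string rather than a regular expression.

```ruby
require 'find'

# Return [path, line_number, line] for every line under root that
# contains the literal string needle (no regex interpretation).
def grep_tree(root, needle)
  hits = []
  Find.find(root) do |path|
    next unless File.file?(path) # skip directories
    File.foreach(path).with_index(1) do |line, number|
      hits << [path, number, line.chomp] if line.include?(needle)
    end
  end
  hits
end

# Hypothetical usage:
# grep_tree("/var/mail/mess", "spammer@example.com").each do |path, number, line|
#   puts "#{path}:#{number}: #{line}"
# end
```

This reports the file, the line number, and the matching line, which is roughly what the first post asks for.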
  #4 (permalink)  
Old 03-28-2008
era
Herder of Useless Cats
 

Join Date: Mar 2008
Location: /there/is/only/bin/sh
Posts: 3,111
Actually the Received: headers have plenty of IP addresses. I would assume the task would be to find them all and figure out which ones exist in large enough quantities to signal that there is more than an occasional problem. Of course, spammers know you are going to do this, so they often try specifically to spread out their activities in order to be able to fly below the radar. But really, Shawn already posted a reasonable plan. Let's see your first cut at the code.
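
To illustrate the counting idea, here is a minimal sketch that tallies the IPv4 addresses found in Received: headers. The names `LOOSE_IPV4` and `count_received_ips` are invented for this example, and it assumes each message is already available as one string; it is not the original poster's code.

```ruby
# Loose IPv4 pattern: four dot-separated groups of 1-3 digits.
LOOSE_IPV4 = /\b(?:\d{1,3}\.){3}\d{1,3}\b/

# Count how often each IPv4 address appears in the Received: headers
# of the given messages. Returns [ip, count] pairs, most frequent first.
def count_received_ips(messages)
  counts = Hash.new(0)
  messages.each do |message|
    # Headers end at the first empty line
    headers = message.split(/\r?\n\r?\n/, 2).first.to_s
    # Join folded continuation lines back onto their header
    unfolded = headers.gsub(/\r?\n[ \t]+/, " ")
    unfolded.each_line do |line|
      next unless line =~ /\AReceived:/i
      # Count an address once per Received: header
      line.scan(LOOSE_IPV4).uniq.each { |ip| counts[ip] += 1 }
    end
  end
  counts.sort_by { |ip, n| [-n, ip] }
end
```

Addresses that show up far more often than the rest are the ones worth a closer look.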
  #5 (permalink)  
Old 03-28-2008
Registered User
 

Join Date: Mar 2008
Posts: 3

You're right, I didn't give you much to go on. Here's what I came up with. I'm open to any suggestions based on your experience. Thanks!

tonyd
Code:
#!/usr/local/bin/ruby -w
require 'find'

@results = Array.new

# Iterate through the child directories & call the parse file method
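def scan_dirs
	root = "/var/qmail/queue/mess"
	Find.find(root) do |file|
		# Find.find also yields directories; only parse regular files
		next unless File.file?(file)
		parse_file(file)
	end
	# Sort descending on the second element (the source) so that
	# messages from the same source end up next to each other
	@results.sort! {|x, y| y[1] <=> x[1]}
	print_results
end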

# Parse each file for the information we want
def parse_file(path)
	file = File.basename(path) # Message number (the file name in the queue)
	sourceip = ""
	email = ""
	subject = ""
	line_no = 0

	# File.foreach closes the file for us, even when we break out early
	File.foreach(path) do |line|
		line = line.strip # Remove leading/trailing whitespace, including \r\n
		line_no += 1

		if line_no == 1 and line.match("invoked for bounce")
			# Internal Bounce Msg
			sourceip = "SMTP"
		end

		if line_no == 2 and sourceip.empty?
			if line.match("webmail.internet.net")
				sourceip = "Webmail"
			else
				# Keep the first IPv4-looking string on the line (nil.to_s is "")
				sourceip = line[/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/].to_s
				sourceip = "No Source IP**" if sourceip.empty?
			end
		end

		if (line.match("SquirrelMail") and sourceip == "Webmail") or
			 (line.match("From:") and sourceip != "Webmail")
			# get_email may return a list of matches; keep the first address
			email = Array(get_email(line)).first.to_s if email.empty?
		end

		if line.match("Subject:") and subject.empty?
			subject = truncate(line, 50)
		end

		break if line_no == 20 # Nothing more we want to read in the file
	end

	# Record the message even when it has fewer than 20 lines
	@results << [file, sourceip, email, subject]
end

# Truncate subject line
def truncate(string, width)
  if string.length <= width
    string
  else
    string[0, width-3] + "..."
  end
end

# Print out results
def print_results
	print "\e[2J\e[f" # ANSI escapes: clear the screen, move the cursor home

	print "Mess#".ljust(10)
	print "Source".ljust(18)
	print "Email Address".ljust(30)
	puts "Subject".ljust(50)
	puts "-" * 111

	@results.each do |line|
		print line[0].ljust(10)
		print line[1].ljust(18)
		print line[2].ljust(30)
		puts line[3].ljust(50)
	end
end

# Get email address from line/string
def get_email(line_to_parse)
	# Pull the first email address from the line ("" when none is found)
	line_to_parse[/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i].to_s
end

# Ok, begin our scan
scan_dirs
exit
  #6 (permalink)  
Old 03-28-2008
era
Herder of Useless Cats
 

Join Date: Mar 2008
Location: /there/is/only/bin/sh
Posts: 3,111
If you plan to do this on a massive scale, it might make sense to parse the messages as they come in, and index the results. The actual search is then kind of trivial, and much faster.

Me, I would make the regexes muuuch tighter, and I guess I would stop the parse loop at the neck (first empty line separates headers from body) rather than arbitrarily scan 20 lines.
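
A minimal sketch of stopping "at the neck", assuming one message per file: `read_headers` is a hypothetical helper, not part of the script above. It reads until the first empty line and joins folded continuation lines onto the header they belong to.

```ruby
# Read only the header section of a message file. The first empty line
# separates the headers from the body, so reading stops there.
def read_headers(path)
  headers = []
  File.foreach(path) do |line|
    line = line.chomp
    break if line.strip.empty?
    if line =~ /\A[ \t]/ && !headers.empty?
      # A line starting with whitespace continues the previous header
      headers[-1] += " " + line.strip
    else
      headers << line
    end
  end
  headers
end

# Hypothetical usage:
# read_headers("/var/qmail/queue/mess/0/1234567").each { |header| puts header }
```

Because each returned element is one complete header, later matching can anchor on the header name at the start of the string instead of searching anywhere in the line.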
  #7 (permalink)  
Old 03-29-2008
Registered User
 

Join Date: Mar 2008
Posts: 3

@era, thanks for your reply. My goal with this script/utility is to be able to do a quick scan of the mail queue when we get an alert from Nagios that the SMTP queue has reached a warning threshold. It's not so much about doing anything in real time. The queue can change every second, so anything indexed would quickly become invalid. Any message hanging out in the queue for more than a few seconds is usually the result of mail not being delivered due to an invalid address (not always, but as a general rule). Spammers blast emails, so often when I look at the queue I can see 50/100/200 emails from the same IP/email address. With qmHandle -l I get a list, but it's the entire header of each email. That's mostly useless if you want a quick visual to see a pattern. A sorted list with just source IP, email, and subject can give you a quick heads-up.
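
One possible way to make such a pattern jump out is to count the messages per source/sender pair instead of listing every message. This is only a sketch: `summarize` is a made-up name, and it assumes rows shaped like the `[file, source, email, subject]` arrays the script collects.

```ruby
# Count how many queued messages share the same source and sender.
# rows is assumed to be a list of [file, source, email, subject] arrays.
# Returns [count, source, email] triples, largest count first.
def summarize(rows)
  counts = Hash.new(0)
  rows.each { |_file, source, email, _subject| counts[[source, email]] += 1 }
  counts.sort_by { |(source, email), n| [-n, source, email] }
        .map { |(source, email), n| [n, source, email] }
end

# Hypothetical usage with the script's results:
# summarize(@results).each { |n, source, email| puts "#{n.to_s.rjust(5)}  #{source}  #{email}" }
```

With 200 messages from one address, that address would appear as a single line at the top rather than 200 separate rows.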

Can you give me an example of how you would tighten up the regular expressions? I'm not too knowledgeable about regular expressions; I'm still on the learning curve. I appreciate any feedback, as I've not done this before. Thanks!

tonyd
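
One way to read "tighter" is sketched below. It is not era's answer; all constant and method names are hypothetical. The ideas: anchor header matches to the start of the line so that "From:" inside another header or the body is ignored, and only accept octets in the range 0-255. Lines are assumed to be stripped already.

```ruby
# One IPv4 octet, 0-255
OCTET = /(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/
# Four octets, not glued to further digits or dots on either side
STRICT_IPV4 = /(?<![\d.])#{OCTET}(?:\.#{OCTET}){3}(?![\d.])/
# Address in a From: header only (the header name must start the line)
FROM_HEADER = /\AFrom:.*?([A-Z0-9._%+-]+@[A-Z0-9-]+(?:\.[A-Z0-9-]+)+)/i
# Text of a Subject: header only
SUBJECT_HEADER = /\ASubject:\s*(.*)/i

# Each helper returns the matched text, or nil when the line does not match.
def extract_ip(line)
  line[STRICT_IPV4]
end

def extract_sender(line)
  line[FROM_HEADER, 1]
end

def extract_subject(line)
  line[SUBJECT_HEADER, 1]
end
```

For example, `extract_sender` ignores an `X-Original-From:` line, whereas a plain `line.match("From:")` accepts it.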
Tags
regex, regular expressions
