Scan Multiple Dir/Files

# 1  
Old 03-27-2008
Hi gang,

I have a project I would like to work on as I learn Perl and Ruby scripting. It may be a lot to bite off at first, but that's how I like to learn: by attacking a real-world problem.

I would like to improve our response to spam attacks here at our office, where we run mail, DHCP, and DNS servers. I would like to read the contents of each (ASCII) file in each subdirectory of a given directory. My goal is to look in the headers for an IP address, email address, or subject that is common to many messages. When one is found, the script should list the file, the location of the file, and the matching lines. This way I can see whether I have a real problem with a particular email or IP address.

So, starting from the root of /var/mail/mess:
Search all files in:
/var/mail/mess/0
/var/mail/mess/1
/var/mail/mess/2
etc...

Any ideas on the best way to approach this? I am a noob, still getting familiar with Perl and Ruby. Thanks!
tonyd
# 2  
Old 03-28-2008
You don't give enough detail and you haven't done any work at all that we can help you with. That's probably why you haven't gotten any replies yet.

Having said that, start by picking a language. Then read its docs to figure out how to walk a directory tree. Then write code that walks the tree and lists each file. Once you get that far, you shouldn't have too much trouble opening each file for reading so you can get to the next step.
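As a rough illustration of those first two steps in Ruby, here is a minimal sketch using the standard `find` library. The helper names `list_files` and `first_line` are made up for this example, and the root directory is whatever you pass in.

```ruby
require 'find'

# Return every regular file under root, sorted by path.
# Find.find also yields directories, so those are filtered out.
def list_files(root)
  files = []
  Find.find(root) do |path|
    files << path if File.file?(path)
  end
  files.sort
end

# Read just the first line of a file, to show that it can be opened.
# Returns "" for an empty file.
def first_line(path)
  File.open(path, 'rb') { |f| f.gets.to_s.chomp }
end
```

Calling `list_files("/var/mail/mess").each { |p| puts "#{p}: #{first_line(p)}" }` would then print one line per message file, which is enough to confirm the walk works before adding any parsing.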

Once you've done all that, you'll have some half-working steaming pile of code. At that point, you'll have more specific questions and we can provide more specific answers. None of the above should be difficult if you just look at a basic tutorial or two on your chosen language.

Have fun!

ShawnMilo
# 3  
Old 03-28-2008
It is not clear what you want. The plain ASCII text files you are talking about are all email messages. If you look at the headers of these files, you will see that they usually contain an FQDN rather than an IP address (someone please correct me if I am wrong), and there can be multiple such entries depending on the route the email has taken. Which one do you want? Let me tell you, there is no easy way to figure this out.

Again, there can be multiple email addresses in each file if the email was addressed to more than one recipient.

If you already know a particular IP address, email address, or subject line and simply want to find out which files contain it, you can use GNU grep to search recursively:

grep -r <ip|email|subject> /var/mail/mess/*
# 4  
Old 03-28-2008
Actually the Received: headers have plenty of IP addresses. I would assume the task would be to find them all and figure out which ones exist in large enough quantities to signal that there is more than an occasional problem. Of course, spammers know you are going to do this, so they often try specifically to spread out their activities in order to be able to fly below the radar. But really, Shawn already posted a reasonable plan. Let's see your first cut at the code.
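A minimal Ruby sketch of that idea, assuming each message is available as a string of text: collect the IPv4-looking strings from Received: headers (including folded continuation lines) and count how often each one appears. The names `LOOSE_IPV4`, `received_ips`, and `tally_ips` are invented for this example, and the pattern is deliberately loose.

```ruby
# Loose pattern: four dot-separated groups of 1-3 digits
LOOSE_IPV4 = /\b\d{1,3}(?:\.\d{1,3}){3}\b/

# Collect IP-looking strings from the Received: headers of one message.
def received_ips(message_text)
  ips = []
  in_received = false
  message_text.each_line do |line|
    break if line.strip.empty?        # blank line ends the headers
    if line =~ /\AReceived:/i
      in_received = true
    elsif line !~ /\A[ \t]/
      in_received = false             # a different header starts here
    end
    ips.concat(line.scan(LOOSE_IPV4)) if in_received
  end
  ips
end

# Count IPs across many messages; most frequent first.
def tally_ips(message_texts)
  counts = Hash.new(0)
  message_texts.each do |text|
    received_ips(text).each { |ip| counts[ip] += 1 }
  end
  counts.sort_by { |ip, n| [-n, ip] }
end
```

The result is an array of `[ip, count]` pairs, so the addresses that show up in unusually many queued messages end up at the top.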
# 5  
Old 03-29-2008
You're right, I didn't give you much to go on. Here's what I came up with. I'm open to any suggestions based on your experience. Thanks!

tonyd
Code:
#!/usr/local/bin/ruby -w
require 'find'

@results = Array.new

# Iterate through the child directories & call the parse file method
def scan_dirs
	root = "/var/qmail/queue/mess"
	Find.find(root) do |file|
		next unless File.file?(file) # Find.find also yields directories; skip them
		parse_file(file)
	end
	# Sort on the second element in our array
	@results.sort! {|x, y| y[1] <=> x[1]}
	print_results
end

# Parse each file for the information we want
def parse_file(path)
	
	file = File.basename(path) # File name without its directory, whatever its length
	sourceip = ""
	email = ""
	subject = ""
	line_no = 0

	File.foreach(path) do |line| # foreach closes the file when the block exits
		
		line = line.strip # Remove any \n\r nil, etc
		line_no += 1
		
		if line_no == 1
			if line.match("invoked for bounce")
				# Internal Bounce Msg
				sourceip = "SMTP"
			end
		end
		
		if (line_no == 2 and sourceip.empty?)
			if line.match("webmail.internet.net")
				sourceip = "Webmail"
			else
				# String#[] with a regex returns the first match as a String (or nil);
				# scan would return a nested Array, which prints and sorts badly
				sourceip = line[/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/] || "No Source IP**"
			end
		end

		if (line.match("SquirrelMail") and sourceip == "Webmail") or
			 (line.match("From:") and sourceip != "Webmail")
			 if email.empty?
			 	  email = get_email(line)
			 end
		end

		if line.match("Subject:") and subject.empty? 
			subject = truncate(line,50)
		end

		break if line_no == 20 # Nothing more we want to read in the file
	end

	# Record the row after the loop so files shorter than 20 lines are kept too
	@results << [file, sourceip.to_s, email.to_s, subject]
end

# Truncate subject line
def truncate(string, width)
  if string.length <= width
    string
  else
    string[0, width-3] + "..."
  end
end

# Print out results
def print_results
	print "\e[2J\e[f"
	
	print "Mess#".ljust(10," ")
	print "Source".ljust(18," ")
	print "Email Address".ljust(30, " ")
	print "Subject".ljust(50, " ")
	puts
	puts "-" * 111
	
	@results.each do |line|
		print line[0].ljust(10," ")
		print line[1].ljust(18," ")
		print line[2].ljust(30, " ")
		print line[3].ljust(50, " ")
	
		puts
	end
end

# Get email address from line/string
def get_email(line_to_parse)
	# Pull the first email address from the line as a String ("" if none is found)
	line_to_parse[/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i].to_s
end

# Ok, begin our scan
scan_dirs
exit

# 6  
Old 03-29-2008
If you plan to do this on a massive scale, it might make sense to parse the messages as they come in, and index the results. The actual search is then kind of trivial, and much faster.

Me, I would make the regexes muuuch tighter, and I guess I would stop the parse loop at the neck (first empty line separates headers from body) rather than arbitrarily scan 20 lines.
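To make that concrete, here is one possible sketch in Ruby, not the only way to do it. It anchors the header names to the start of the line, range-checks each octet of the IP address, and stops at the first empty line. All constant and method names (`OCTET`, `STRICT_IPV4`, `ADDR_RE`, `parse_headers`) are invented for this example, and the hash it returns is just an assumed shape.

```ruby
# Each octet must be 0-255, and the match may not touch other digits or dots
OCTET       = /(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/
STRICT_IPV4 = /(?<![\d.])#{OCTET}(?:\.#{OCTET}){3}(?![\d.])/

# Header names are anchored with \A so "X-From:" or a "From:" inside
# a subject line no longer counts
FROM_RE    = /\AFrom:\s*(.*)/i
SUBJECT_RE = /\ASubject:\s*(.*)/i
ADDR_RE    = /[A-Z0-9._%+-]+@[A-Z0-9-]+(?:\.[A-Z0-9-]+)+/i

# Accepts anything with each_line (a String or an open File).
def parse_headers(io)
  info = { :email => "", :subject => "", :ips => [] }
  io.each_line do |line|
    line = line.chomp
    break if line.empty?   # the "neck": headers end at the first empty line
    if info[:email].empty? && (m = FROM_RE.match(line))
      info[:email] = m[1][ADDR_RE].to_s
    elsif info[:subject].empty? && (m = SUBJECT_RE.match(line))
      info[:subject] = m[1]
    elsif line =~ /\AReceived:/i
      info[:ips].concat(line.scan(STRICT_IPV4))
    end
  end
  info
end
```

For a file on disk, something like `File.open(path) { |f| parse_headers(f) }` would do. Note that this sketch ignores folded Received: continuation lines to stay short.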
# 7  
Old 03-29-2008
@era, thanks for your reply. My goal with this script is to be able to do a quick scan of the mail queue when we get an alert from Nagios that the SMTP queue has reached its warning threshold, not to do anything in real time. The queue can change every second, so anything indexed would quickly become invalid. Messages that hang around in the queue for more than a few seconds are usually there because they could not be delivered to an invalid address (not always, but as a general rule). Spammers blast emails, so when I look at the queue I can often see 50, 100, or 200 emails from the same IP or email address. With qmHandle -l I get a list, but it contains the entire header of each email, which is mostly useless if you want a quick visual check for a pattern. A sorted list with just source IP, email, and subject can give you a quick heads-up.

Can you give me an example of how you would tighten up the regular expressions? I'm not too knowledgeable about them and am still on the learning curve. I appreciate any feedback, as I've not done this before. Thanks!
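One way such a "quick heads-up" list could be condensed further is to group rows by source and email and show a count. This is only a sketch: it assumes rows shaped like `[file, source, email, subject]` with every element a String, and `summarize` and `print_summary` are names made up for this example.

```ruby
# Group rows by (source, email) and return [count, source, email, subject]
# entries, largest group first. The subject shown is from the first row seen.
def summarize(rows)
  groups = rows.group_by { |row| [row[1], row[2]] }
  summary = groups.map do |(source, email), members|
    [members.length, source, email, members.first[3]]
  end
  summary.sort_by { |count, source, email, _subject| [-count, source, email] }
end

# Print the summary as fixed-width columns.
def print_summary(rows)
  puts "Count".ljust(7) + "Source".ljust(18) + "Email Address".ljust(30) + "Subject"
  summarize(rows).each do |count, source, email, subject|
    puts count.to_s.ljust(7) + source.ljust(18) + email.ljust(30) + subject
  end
end
```

With this, 200 queued messages from one sender collapse into a single line with a count of 200 at the top of the list.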

tonyd