Dear unix forum members,
I'm working on a script that will parse a mail machine's logs and print a list of email addresses in this format:
sender@domain,recipient@domain
The logs look something like this:
06:50:04 0048317AC863: client=localhost.com[127.0.0.1]
06:50:04 0048317AC863: message-id=<user@domain>
06:50:04 0048317AC863: from=<user@domain>,
06:50:04 0048317AC863: to=<user@domain>,
06:50:06 0048317AC863: to=<user@domain>,
06:50:18 0048317AC863: to=<user@domain>,
06:50:18 0048317AC863: to=<user@domain>,
06:50:18 0048317AC863: removed
The "from" and "to" entries are on different lines, and there is another challenge: the results should be limited to messages that have 5 or fewer recipients.
I thought it would be easy enough, and I wrote a script that first gets a list of the tag numbers (e.g. 0048317AC863) which belong to messages with 5 or fewer recipients:
#!/bin/sh
grep "to=<" /data/log/maillog | grep postfix | grep -vi noqueue | awk '{print $6}' | sort |uniq -c > all_ids
cat all_ids |awk '{print " "$1, $2}' | egrep " 1 | 2 | 3 | 4 | 5 " | cut -f 3 -d " " > ids
Very crude and spaghetti-like... and even worse is the for loop that follows, which involves grepping through the entire 4000 MB maillog file 33,000 times in order to print the sender and recipient addresses.
Needless to say, it's not an efficient script; there must be a better way. Please help! Any responses are appreciated. Maybe someone can point me in the right direction?
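One direction that might work is a single pass with awk instead of one grep per ID: remember the sender and recipients for each tag number as the lines go by, and print the pairs when the "removed" line for that tag shows up. The sketch below is only a rough idea under some assumptions: the function name pairs and the variable max are made up for illustration, the tag number is assumed to be the field (ending in a colon) right before from=<...> or to=<...>, and every to=< line is counted as one recipient, so a recipient that is retried and logged twice would be counted twice.

```shell
#!/bin/sh
# Hypothetical helper: read mail log lines (files or stdin) and print
# "sender,recipient" for every message with at most `max` recipients.
pairs() {
    awk -v max=5 '
    # Print the pairs for one tag number if it is within the limit.
    function emit(id,   j) {
        if ((id in sender) && n[id] <= max)
            for (j = 1; j <= n[id]; j++)
                print sender[id] "," rcpt[id, j]
    }
    # Drop what was stored for a finished tag number to keep memory small.
    function forget(id,   j) {
        for (j = 1; j <= n[id]; j++) delete rcpt[id, j]
        delete sender[id]
        delete n[id]
    }
    /NOQUEUE/ { next }                      # same filter as grep -vi noqueue
    $NF == "removed" {                      # message is done: print and forget
        id = $(NF - 1); sub(/:$/, "", id)
        emit(id); forget(id); next
    }
    {
        for (i = 2; i <= NF; i++) {
            # Assumption: the tag number is the field just before from=/to=.
            if ($i ~ /^(from|to)=</ && $(i - 1) ~ /:$/) {
                id = $(i - 1); sub(/:$/, "", id)
                addr = $i
                sub(/^[^<]*</, "", addr)    # strip everything up to "<"
                sub(/>.*$/, "", addr)       # strip ">" and whatever follows
                if ($i ~ /^from=/) sender[id] = addr
                else rcpt[id, ++n[id]] = addr
                break
            }
        }
    }
    # Messages with no "removed" line yet are printed at the end.
    END { for (id in n) emit(id) }
    ' "$@"
}
```

It would be called as something like pairs /data/log/maillog > pairs.csv, so the log is read once rather than 33,000 times. Because each message is dropped from memory once its "removed" line is seen, the arrays should stay small even on a large log, as long as that line is actually logged for most messages.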
Thanks,
JJ