How to monitor 25 different Digital UNIX and Sun Solaris machines


 
# 1  
Old 12-24-2002

Hi

I am a new junior system administrator. Our team currently has around 25 machines, a mix of Sun Solaris and Digital UNIX, and every morning we telnet into each system to check that:
1) all daemons are working fine
2) all CPUs are up
3) the memory is okay

But telnetting into 25 machines is a pain. I am planning to write a shell script that will automatically send me a mail in case there is a problem with any of the above. Any ideas on how to start? I just need a pointer or an idea, which I would then turn into a shell script. Or is there a better way to do it?

I tried swatch, but unfortunately some Perl modules it needs will not compile for me because of gcc problems on Sun Solaris.

Does anybody have a precompiled version of swatch for Sun Solaris?

regards
Hrishy

Last edited by xiamin; 12-24-2002 at 09:49 AM..
# 2  
Old 12-24-2002
First of all, there are lots of products that do this: Patrol, OpenView, Big Brother, etc.

But here is an approach that I once used:

First, I wrote a shell script that would check for various things. It wrote to an output file called something like status.`hostname`. The script would touch this file to make sure that it existed. And if it found anything that it didn't like, it would add an error line to the file.
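A rough sketch of what such a per-host check could look like, in POSIX sh. This is not the script described above. The function names, the daemon list passed in, and the assumption that `psrinfo` prints one line per processor with the state in the second field are all illustrative. A memory check is left out because the commands differ between the two systems.

```shell
#!/bin/sh
# Sketch of a per-host status check. Nothing here comes from the original
# script: function names, daemon names and the psrinfo format are assumptions.

# Print an error line for each expected daemon absent from the process list.
# $1    = space-separated daemon names (hypothetical list)
# stdin = one running command name per line, e.g. from `ps -e -o comm=`
missing_daemons() {
    plist=$(cat)
    for d in $1; do
        printf '%s\n' "$plist" | grep -x "$d" >/dev/null ||
            echo "ERROR: daemon $d is not running"
    done
}

# Print an error line for each processor not reported as on-line.
# stdin = psrinfo-style output, assumed to be "<id> <state> since <date>"
offline_cpus() {
    awk '$2 != "on-line" { print "ERROR: cpu " $1 " is " $2 }'
}

# Make sure the status file exists, then append any error lines to it.
# $1 = status file path, $2 = daemon names
write_status() {
    touch "$1"
    ps -e -o comm= | missing_daemons "$2" >> "$1"
    # If psrinfo is not installed, this check silently reports nothing.
    psrinfo 2>/dev/null | offline_cpus >> "$1"
}
```

A cron entry could then call something like `write_status /tmp/status.$(hostname) "cron inetd syslogd"`. The daemon names must match exactly what `ps` prints on that system (some systems print the full path), and on older Solaris releases the XPG4 versions of `grep` and `sh` may be needed for `grep -x` and `$(...)`.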

I ported the script to all of our systems and a cron job would run it once an hour on the hour.

At 15 minutes after the hour, one system would run another script. This was an automated ftp job that would retrieve (and delete from the original host) all of the status files.

It is hard to detect errors during an automated ftp job. But after it runs, it is easy to see if the script obtained a status.`hostname` file for each hostname. If not, that is an error.

Also if the status files had any lines, that was an error.

The script would page me if it detected any errors.
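The two collector-side rules (a missing status file is an error, and any line in a status file is an error) can be sketched roughly as follows. The function name `check_reports` and its arguments are hypothetical, and the fetch step itself (ftp in the setup described here) is assumed to have already run.

```shell
#!/bin/sh
# Sketch of the collector-side check. Names and layout are assumptions.

# Print one problem report per host whose status file is missing or non-empty.
# $1         = directory holding the fetched status.<hostname> files
# remaining  = expected hostnames
# Returns 0 when every host reported cleanly, 1 otherwise.
check_reports() {
    dir=$1
    shift
    rc=0
    for h in "$@"; do
        f="$dir/status.$h"
        if [ ! -f "$f" ]; then
            # The fetch did not bring back a file for this host.
            echo "$h: no status file received"
            rc=1
        elif [ -s "$f" ]; then
            # A non-empty file means the host logged at least one error.
            echo "$h: errors reported"
            sed 's/^/    /' "$f"
            rc=1
        fi
    done
    return $rc
}
```

The output is empty when everything is fine, so the calling script only needs to send a page or mail when `check_reports` returns non-zero, for example by piping the captured output to `mailx -s "status problems"` with whatever address applies.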

This was a little clunky, but I had a monitoring system up and running in less than half a day.
# 3  
Old 12-24-2002
For constant monitoring at no cost, check out Big Brother. You can monitor processes, CPU, disk... fantastic product.

You would still need something to check some other things (you could implement them in BB, but that's up to you).

This was written back in 1995 and has not changed much since. It gives you a quick snapshot of what has happened in the last 24 hours (it runs once every 24 hours). It could be improved, but one does not always have the time! Sorry it's in csh, but it's more for knowledge than use. All the servers (over 60) send the snapshot report to one server, where a cron job collects all the info; if it doesn't find a report from a server in its list, it reports that too.
This is the script that runs on each server. If nothing is wrong it sends a zero-byte file (which proves the network connection is working). If you want the other script, post back. This works on Solaris 2.6 and does not have to run as root. I also have one for HP.

#!/bin/csh -f
# Created 09/21/95 HOG A script file to gather info from all Unix Systems
# ========SET UP SYMBOLS===========================
set defdir="/tmp"
set node="`hostname`"
set today="`date '+%m%d%y'`"
set theday="`date +'%d'`"
set thedate="`date +'%b %e'`"
set themonth="`date +'%m'`"
set theyear="`date +'%Y'`"
set tmpfile = "$defdir/SI$node.$today"
set y2kfile = "/opt/Y2K/sunscan.$node-$theyear.$themonth.$theday-*/README.$node"
set dailycopy = "oven:/usr/local/sysconfigs/daily"
set monthcopy = "oven:/usr/local/sysconfigs"
set fsmin = "5000"
/usr/bin/rm $defdir/SI$node.*
/usr/bin/touch $tmpfile
#
# ========RUN FOLLOWING COMMANDS ON ALL SYSTEMS====
# Check uptime
set lastboot = `/usr/bin/who -b | awk '{print $4" "$5}'`
if ("$lastboot" == "$thedate") echo "`/usr/bin/who -b`" >> $tmpfile
# Check space on local filesystems
set filesys = `df -bl |grep dsk|grep -v vol|/usr/bin/awk '{print $1}'`
foreach fs ($filesys)
set fs1 = `/usr/bin/df -kl $fs|grep dsk|/usr/bin/awk '{print $4}'`
set fson = `/usr/bin/df -kl $fs|grep dsk|/usr/bin/awk '{print $6}'`
if ($fs1 < $fsmin) then
echo "$fson is at $fs1 kilobytes" >> $tmpfile
endif
end
# Check for OV status
if (-e /opt/OV/bin/ovstatus) then
if ("$node" == "casc-nms128") then
# do nothing - loaded but not running
else
set ovstat = `/opt/OV/bin/ovstatus |/usr/bin/grep -c RUNNING`
if ($ovstat < 5) echo "Only $ovstat OV processes running. Please check." >> $tmpfile
endif
endif
# Check on meta disks
if (-e /usr/opt/SUNWmd/sbin/metastat) then
set mdstat = `/usr/opt/SUNWmd/sbin/metastat|/usr/bin/grep "State:"|awk '{print $2}'|/usr/bin/grep -cv "Okay"`
if ($mdstat > 0) then
/usr/bin/echo "$mdstat errors found in metastat" >> $tmpfile
endif
endif
# Check on volume manager disks - normal user can't run vxdisk
if (-e /usr/sbin/vxprint) then
set vxstat = `/usr/sbin/vxprint |grep -ic "recover"`
if ($vxstat > 0) then
/usr/bin/echo "$vxstat errors found in vxprint" >> $tmpfile
endif
endif
# Check prtdiag for errors
if (-e /usr/platform/`uname -i`/sbin/prtdiag) then
set prtdiag = "/usr/platform/`uname -i`/sbin/prtdiag"
set prtdiagstat = `$prtdiag | grep -c "No failures found in System"`
if ($prtdiagstat < 1) then
/usr/bin/echo "Prtdiag shows system errors" >> $tmpfile
endif
endif
#
/usr/bin/rcp $tmpfile $dailycopy
if ("$theday" == "01" && "$node" != "oven") then
if (-e /opt/Y2K) then
/usr/bin/rcp $y2kfile $monthcopy
endif
endif
if (-e /tmp/$node.all) then
/usr/bin/rcp /tmp/$node.all $monthcopy
/usr/bin/mv /tmp/$node.all /tmp/$node.old
endif
# ==================================================
exit
# 4  
Old 12-25-2002
Hi

Thank you very, very much. I am also looking at Nagios and am awaiting your reviews of it. And thank you for that cool script, and merry Xmas to all the wonderful people here ;-D

regards
Hrishy
# 5  
Old 12-27-2002
Quote:
Thank you very, very much. I am also looking at Nagios
I was going to suggest the same thing. We used Netsaint at my last job, written by the same person that is maintaining Nagios. It was very detailed and made looking after boxes around the country very easy.