Display which test in the if/else is failing


 
# 1  
Old 01-21-2011

So I have a script that monitors my drives (/dev/sda and /dev/sdb) using smartctl (smartmontools). I'm by no means an expert in scripting, so this was my attempt at creating a way to email me if one of the values in smartctl output goes above a set threshold.

My question is, I'm trying to edit the "Subject" line of the email that it sends, so that I can tell which 'test' is failing. Is it temperature_celsius, etc? Is there a way to name each one and then use a variable like $failedtest in the subject line?

I'm at a loss for how to write this..
Code:
#!/bin/sh

export PATH=/ffp/bin:/ffp/sbin:$PATH
/ffp/bin/touch /ffp/tmp/smartmessage1
/ffp/bin/touch /ffp/tmp/smartmessage2
SMARTMESSAGE1=/ffp/tmp/smartmessage1
SMARTMESSAGE2=/ffp/tmp/smartmessage2
MAILMESSAGE=/ffp/tmp/mailmessage
FROMADDR=fromemail@gmail.com
SUBJECT="[ALERT!] SMART Monitoring as of [`date`]"
TO_EMAIL_ADDR=toemail@gmail.com
DEVA=/dev/sda
DEVB=/dev/sdb
LOG=/mnt/HD_a2/logs/smartctl_mail.log

echo [`date`] SMART Monitoring script started, variables set... >> $LOG


/ffp/sbin/smartctl -d marvell -a $DEVA > $SMARTMESSAGE1
echo [`date`] /dev/sda scanned. >> $LOG
/ffp/sbin/smartctl -d marvell -a $DEVB > $SMARTMESSAGE2
echo [`date`] /dev/sdb scanned. >> $LOG

cat $SMARTMESSAGE1 > $MAILMESSAGE
cat $SMARTMESSAGE2 >> $MAILMESSAGE

if [ `cat $SMARTMESSAGE1 | grep Raw_Read_Error_Rate | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE1 | grep Reallocated_Sector_Ct | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE1 | grep Seek_Error_Rate | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE1 | grep Spin_Retry_Count | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE1 | grep Calibration_Retry_Count | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE1 | grep Temperature_Celsius | /ffp/bin/awk '{print $10}'` -gt 40 \
        -o `cat $SMARTMESSAGE1 | grep Reallocated_Event_Count | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE1 | grep Current_Pending_Sector | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE1 | grep Offline_Uncorrectable | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE1 | grep UDMA_CRC_Error_Count | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE1 | grep Multi_Zone_Error_Rate | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE2 | grep Raw_Read_Error_Rate | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE2 | grep Reallocated_Sector_Ct | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE2 | grep Seek_Error_Rate | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE2 | grep Spin_Retry_Count | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE2 | grep Calibration_Retry_Count | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE2 | grep Temperature_Celsius | /ffp/bin/awk '{print $10}'` -gt 40 \
        -o `cat $SMARTMESSAGE2 | grep Reallocated_Event_Count | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE2 | grep Current_Pending_Sector | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE2 | grep Offline_Uncorrectable | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE2 | grep UDMA_CRC_Error_Count | /ffp/bin/awk '{print $10}'` -gt 0 \
        -o `cat $SMARTMESSAGE2 | grep Multi_Zone_Error_Rate | /ffp/bin/awk '{print $10}'` -gt 0 ] ;

then

echo [`date`] Problems found...sending mail. >>$LOG

/ffp/bin/mailx -s "$SUBJECT" \
-S smtp-use-starttls \
-S ssl-verify=ignore \
-S smtp-auth=login \
-S smtp=smtp://smtp.gmail.com:587 \
-S from="$FROMADDR" \
-S smtp-auth-user=useremail@gmail.com \
-S smtp-auth-password=pw \
-S ssl-verify=ignore \
$TO_EMAIL_ADDR < $MAILMESSAGE

echo [`date`] Message Sent! >> $LOG

else

echo [`date`] No problems found...not sending mail. >>$LOG

fi

rm $SMARTMESSAGE1 $SMARTMESSAGE2 $MAILMESSAGE

exit 0

In case it matters, here is an output of `smartctl -d marvell -a /dev/sda`:


Code:
smartctl 5.39.1 2010-01-28 r3054 [arm-unknown-linux-uclibc] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA family
Device Model:     WDC WD6400AAKS-00H2B0
Serial Number:    WD-WMASY7478202
Firmware Version: 07.04C07
User Capacity:    640,135,028,736 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan 21 09:55:36 2011 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (12360) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 145) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   166   165   021    Pre-fail  Always       -       4675
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       848
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       2082
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       27
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       26
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       848
194 Temperature_Celsius     0x0022   109   100   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1322         -
# 2  Short offline       Completed without error       00%      1321         -
# 3  Short offline       Completed without error       00%      1319         -
# 4  Short offline       Aborted by host               10%      1318         -
# 5  Short offline       Aborted by host               10%      1318         -
# 6  Conveyance offline  Completed without error       00%       328         -
# 7  Short offline       Completed without error       00%       328         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Any cleanup/shorter command suggestions are always welcome too, this was just the best I could come up with.

Thanks

Last edited by bound4h; 01-21-2011 at 10:59 AM..
# 2  
Old 01-21-2011
I parse those data lines and compare the values.
I also create a variable whose name is taken from the ATTRIBUTE_NAME column, so it is possible to use those variables in comparisons. That part is only an example, in case you need it.
Code:
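# ... set variables and run commands, see the original script
# note: arrays and (( )) need ksh93 or bash, not plain sh

msg=/var/tmp/$$.msg.txt
> "$msg"
err=""

for f in "$SMARTMESSAGE1" "$SMARTMESSAGE2"
do
    data=0
    # redirect the file into the loop instead of "cat | while",
    # so that err is still set after the loop in bash too
    while read -r line
    do
        # only the lines of the attribute table include values ...
        case "$line" in
                ID*ATTRIBUTE_NAME*) data=1 ; continue ;;
                SMART*Error*Log*Ver*) data=0 ; continue ;;
                "") continue ;;
        esac
        ((data==0)) && continue

        flds=($line)  # parse to array
        f2="${flds[1]}"            # ATTRIBUTE_NAME column
        nflds=${#flds[*]}
        ((nflds=nflds-1))
        lastfld="${flds[$nflds]}"  # RAW_VALUE column
        # extra lines, example to create variable and set value
        f2=${f2//-/_}  # a variable name can't include "-", so change it to "_"
        eval $f2=\""$lastfld"\"
        # extra end
        case "$f2" in
                # counters that the original script did not check
                Spin_Up_Time|Start_Stop_Count|Power_On_Hours|Power_Cycle_Count|Power_Off_Retract_Count|Load_Cycle_Count) ;;
                Temperature_Celsius) ((lastfld > 40 )) && echo "$f $f2 $lastfld" >> "$msg"  && err="$err $f2";;
                *) ((lastfld > 0 )) && echo "$f $f2 $lastfld" >> "$msg"  && err="$err $f2" ;;
        esac
    done < "$f"
done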

SUBJECT="$SUBJECT $err"
lines=$(wc -l < "$msg")
# if the msg file contains lines, we have something to report
if ((lines < 1 ))
then
        # no problem
        echo "$(date) no problem" >> $LOG
        rm -f "$msg" 2>/dev/null
        exit 0
fi

# Houston, we have a problem
# send mail
echo "Some problem, sending mail " >&2
date >> "$msg"
# sendmail here

# remove the tmp file
rm -f "$msg" 2>/dev/null
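
If the script has to stay on plain /bin/sh (the shebang of the original script suggests it might), the array parsing above can be approximated with positional parameters. This is only a sketch under that assumption: the names line, name, raw and varname are made up for the example, and the sample line is copied from the smartctl output in the first post.

```shell
#!/bin/sh
# One attribute line copied from the smartctl output in the first post.
line='192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       26'

set -f             # no filename expansion while the line is split
set -- $line       # split the line into positional parameters
set +f

name=$2            # ATTRIBUTE_NAME column
eval "raw=\${$#}"  # last field is the RAW_VALUE column

# "-" is not allowed in a variable name, so map it to "_"
# (plain sh has no ${var//-/_}, hence tr)
varname=$(printf '%s\n' "$name" | tr '-' '_')
eval "$varname=\$raw"

echo "$varname=$Power_Off_Retract_Count"
```

The eval lines only make sense for trusted input such as smartctl output, because eval executes whatever ends up in the variable name.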

# 3  
Old 01-21-2011
Thank you for taking the time to write that code, however I am trying to understand it.

Sorry to sound so dumb, but how do I 'implement' that into my code? What does it do and what parts of my code need to be taken out to make this work?

Thanks
# 4  
Old 01-22-2011
The code given as the solution does nothing but read the output files from /tmp. It is an individual script that just uses the variables of the original script, so you can put it at the end of your original script.

# 5  
Old 01-22-2011
Keep everything from the original script that comes before these lines:
Code:
cat $SMARTMESSAGE1 > $MAILMESSAGE
cat $SMARTMESSAGE2 >> $MAILMESSAGE

and keep the mail sending part as well.
# 6  
Old 01-22-2011
You could use a function to log the messages...
Code:
logmessage(){
  echo "[`date`] $@" >> "$LOG"
}
logmessage "SMART Monitoring script started, variables set..."
logmessage "bla bla"

Also, all those pipelines are expensive: each of the 22 tests starts a cat, a grep and an awk process, which slows the script down. You might want to consider using a case statement instead. Further, you could use a list of devices, so that the script becomes generic and it is easier to log which device an error belongs to and what the error message was. For example (not tested):

Code:
PROBLEM=false
DEVS="/dev/sda /dev/sdb"
for dev in $DEVS; do
  # Save the output to a file first: a while loop at the end of a pipeline
  # runs in a subshell in most shells, so PROBLEM=true would be lost.
  smartctl -d marvell -a "$dev" > "$SMARTMESSAGE1"   # temp file from the original script
  cat "$SMARTMESSAGE1"                               # full report goes into the mail
  while read -r line; do
    case $line in 
      *Calibration_Retry_Count*|\
      *Current_Pending_Sector* |\
      *Multi_Zone_Error_Rate*  |\
      *Offline_Uncorrectable*  |\
      *Raw_Read_Error_Rate*    |\
      *Reallocated_Event_Count*|\
      *Reallocated_Sector_Ct*  |\
      *Seek_Error_Rate*        |\
      *Spin_Retry_Count*       |\
      *UDMA_CRC_Error_Count*)
         if [ "${line##* }" -gt 0 ]; then                 # {line##* } is the last word on the line
           printf "%s: %s\n" "$dev" "$line"
           PROBLEM=true
         fi ;;
      *Temperature_Celsius*)
         if [ "${line##* }" -gt 40 ]; then
           printf "%s: %s\n" "$dev" "$line"
           PROBLEM=true
         fi ;;
    esac
  done < "$SMARTMESSAGE1"
  logmessage "$dev scanned."
done >> "$MAILMESSAGE"

The variable PROBLEM indicates if a problem occurred. You can test for it like this:
Code:
if $PROBLEM; then ...
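
To get the names of the failing tests into the subject line, as asked in the first post, the same loop idea can collect them in a variable. What follows is a minimal sketch, not a tested drop-in: check_attrs, FAILED and the sample file are hypothetical names, and the sample lines are adapted from the smartctl output above with two raw values changed so that something fails.

```shell
#!/bin/sh
FAILED=""

# check_attrs DEVICE FILE
# Appends "DEVICE:ATTRIBUTE" to FAILED for every monitored attribute
# whose raw value (last word on the line) is above its limit.
check_attrs() {
  dev=$1
  file=$2
  while read -r id name rest; do
    case $name in
      Temperature_Celsius) limit=40 ;;
      Raw_Read_Error_Rate|Reallocated_Sector_Ct|Seek_Error_Rate|\
      Spin_Retry_Count|Calibration_Retry_Count|Reallocated_Event_Count|\
      Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count|\
      Multi_Zone_Error_Rate) limit=0 ;;
      *) continue ;;               # header and unmonitored attributes
    esac
    raw=${rest##* }                # last word of the line
    if [ "$raw" -gt "$limit" ]; then
      FAILED="$FAILED $dev:$name"
    fi
  done < "$file"
}

# Sample data standing in for real smartctl output (values invented).
sample=${TMPDIR:-/tmp}/smart_sample.$$
cat > "$sample" <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       3
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       2082
194 Temperature_Celsius     0x0022   109   100   000    Old_age   Always       -       45
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
EOF

check_attrs /dev/sda "$sample"
rm -f "$sample"

SUBJECT="[ALERT!] SMART:$FAILED"
echo "$SUBJECT"
```

In the real script the function would be called once per device with the file that holds that device's smartctl output, and an empty FAILED would mean that no mail needs to be sent.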


Last edited by Scrutinizer; 01-22-2011 at 06:16 AM..