Failure rate of a node / Data center


    #1  
Old 1 Week Ago
chercheur111
Registered User
 
Hi,

I have a history of the state of each node in my data center, i.e. a record of the failures in my cluster (UN: node up, DN: node down).
Here are some lines of the history:



Code:
08:51:36 UN 127.0.0.1    08:51:36 UN 127.0.0.2    08:51:36 UN 127.0.0.3    08:53:50 DN 127.0.0.1    08:53:50 DN 127.0.0.2    08:53:50 DN 127.0.0.3


From this history, I'd like to deduce the failure rate of each node. How can I do that? For example, do I need AI/ML techniques, or can I simply count the UN readings for each node and divide by the number of lines?
Thank you very much for your help. Kind regards.
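
The counting idea in the question needs no ML; it can be sketched in a few lines of awk. The sketch below is only an illustration under assumptions: the function name dn_ratio is made up here, the input is taken to be whitespace-separated "HH:MM:SS STATUS HOST" triples (several per line are allowed), and it reports the share of DN readings rather than UN readings.

```shell
# dn_ratio (hypothetical name): read "HH:MM:SS STATUS HOST" triples on stdin
# and print, per host, how many of its readings were DN.
dn_ratio() {
  awk '
  {
    # A line may carry several triples, so walk the fields three at a time.
    for (i = 1; i + 2 <= NF; i += 3) {
      host = $(i + 2)
      total[host]++
      if ($(i + 1) == "DN") down[host]++
    }
  }
  END {
    # Output format: host down/total percentage
    for (host in total)
      printf "%s %d/%d %.1f%%\n", host, down[host], total[host], 100 * down[host] / total[host]
  }' | sort
}
```

Run as `dn_ratio < history_file`. For the six sample readings above, every node has one UN and one DN reading, so each line reads like `127.0.0.1 1/2 50.0%`. Note that this is a share of readings, not a rate over time, so it is only meaningful if the readings are taken at regular intervals.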
    #2  
Old 1 Week Ago
Don Cragun (Forum Staff)
Administrator
 
What operating system are you using?

What shell are you using?

How do you expect to deduce a failure rate from a single point in time? Are you instead maybe looking for a percentage of network node failures at this point in time?

What output are you hoping to produce from the sample input you have provided?

What have you tried on your own to get the output you want?
    #3  
Old 1 Week Ago
chercheur111
Registered User
 
What operating system are you using?


Linux OS (Ubuntu distribution)


What shell are you using?

bash

How do you expect to deduce a failure rate from a single point in time? Are you instead maybe looking for a percentage of network node failures at this point in time?


This is only a simple example. I will generate a history covering several days.


What output are you hoping to produce from the sample input you have provided?

The MTBF (mean-time-between-failures) of each node.


What have you tried on your own to get the output you want?
    #4  
Old 1 Week Ago
Chubler_XL (Forum Staff)
Moderator
 
How about this:

Code:
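awk '
{
  # Each input line holds one or more "HH:MM:SS STATUS HOST" triples.
  for (i = 1; i < NF - 1; i += 3) {
    split($i, tm, ":")
    now = tm[1] * 3600 + tm[2] * 60 + tm[3]   # seconds since midnight
    status = $(i + 1)
    host = $(i + 2)
    # Add the time elapsed since the previous reading of this host.
    # "in" is used so that a 00:00:00 timestamp is not mistaken for "unset".
    if (host in lastTime)
        totalTime[host] += now - lastTime[host]
    lastTime[host] = now
    if (status == "DN") Failure[host]++        # counts DN readings
    Reading[host]++
  }
}
END {
    for (host in lastTime) {
        if (!Failure[host])
            print host " = No Failures"
        else if (Failure[host] == Reading[host])
            print host " = 0"                  # host was down at every reading
        else
            print host " = " totalTime[host] / Failure[host]
    }
}' infile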


Infile:
Code:
08:51:36 DN 127.0.0.1 08:51:36 UN 127.0.0.2 08:51:36 UN 127.0.0.3 08:53:50 DN 127.0.0.1 08:53:50 DN 127.0.0.2 08:53:50 UN 127.0.0.3
08:58:36 DN 127.0.0.1 08:58:36 DN 127.0.0.2 08:58:36 UN 127.0.0.2

Result:
Code:
127.0.0.1 = 0
127.0.0.2 = 210
127.0.0.3 = No Failures
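
The script above divides the total observed time by the number of DN readings. A stricter variant, sketched here only as an alternative, counts a failure only when a host goes from UN to DN and divides the accumulated up time by that count. The function name mtbf_transitions is hypothetical, and the sketch assumes that all timestamps fall within a single day and that a host keeps its last reported state until the next reading.

```shell
# mtbf_transitions (hypothetical name): read "HH:MM:SS STATUS HOST" triples
# on stdin and print up time per UN->DN transition for each host.
mtbf_transitions() {
  awk '
  {
    for (i = 1; i + 2 <= NF; i += 3) {
      split($i, tm, ":")
      now = tm[1] * 3600 + tm[2] * 60 + tm[3]   # seconds since midnight
      status = $(i + 1)
      host = $(i + 2)
      # Only intervals that start in the UN state count as up time, and
      # only a UN reading followed by a DN reading counts as a failure.
      if ((host in last) && prev[host] == "UN") {
        up[host] += now - last[host]
        if (status == "DN") fails[host]++
      }
      last[host] = now
      prev[host] = status
    }
  }
  END {
    for (host in last) {
      if (fails[host]) {
        print host " = " up[host] / fails[host]
      } else {
        print host " = no UN->DN transition observed"
      }
    }
  }' | sort
}
```

With the same infile, `mtbf_transitions < infile` reports 134 for 127.0.0.2 (up from 08:51:36 to 08:53:50, one transition) and no transition for the other two hosts, since 127.0.0.1 is never seen up and 127.0.0.3 is never seen down.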


Last edited by Chubler_XL; 1 Week Ago at 07:58 PM.. Reason: Host always down should have zero
    #5  
Old 1 Week Ago
chercheur111
Registered User
 
Thank you very much for your help.
Kind regards.
    #6  
Old 2 Days Ago
chercheur111
Registered User
 
Could you please explain why we get this value?

127.0.0.1 = 0
    #7  
Old 1 Day Ago
Don Cragun (Forum Staff)
Administrator
 
You said you want the MTBF for each node. The node 127.0.0.1 was always down (for all three times it appeared in the data in post #4 and for both times it appeared in the data in post #1).

If a node is never up, isn't the mean time between failures zero? What value were you expecting?
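
To see where each value in post #4 comes from, it can help to print the intermediate counters instead of the final ratio. The following is a rough sketch, not part of the original script; the function name mtbf_trace and the output labels are made up for illustration, and timestamps are assumed to fall within a single day.

```shell
# mtbf_trace (hypothetical name): print, per host, the number of readings,
# the number of DN readings, and the seconds between first and last reading.
mtbf_trace() {
  awk '
  {
    for (i = 1; i + 2 <= NF; i += 3) {
      split($i, tm, ":")
      now = tm[1] * 3600 + tm[2] * 60 + tm[3]   # seconds since midnight
      host = $(i + 2)
      if (host in last) elapsed[host] += now - last[host]
      last[host] = now
      readings[host]++
      if ($(i + 1) == "DN") down[host]++
    }
  }
  END {
    for (host in last)
      printf "%s readings=%d down=%d elapsed=%d\n", host, readings[host], down[host], elapsed[host]
  }' | sort
}
```

For the infile in post #4, `mtbf_trace < infile` prints:

```
127.0.0.1 readings=3 down=3 elapsed=420
127.0.0.2 readings=4 down=2 elapsed=420
127.0.0.3 readings=2 down=0 elapsed=134
```

127.0.0.1 is down in all three of its readings, which is the case the script reports as 0. For 127.0.0.2 the script divides 420 seconds by 2 DN readings, giving 210.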