Failure rate of a node / Data center


 
# 1  
Old 07-11-2018

Hi,

I have a history of the state of each node in my data center, recording the failures in my cluster (UN: node up, DN: node down).
Here are some lines from the history:



Code:
08:51:36 UN 127.0.0.1    08:51:36 UN 127.0.0.2    08:51:36 UN 127.0.0.3    08:53:50 DN 127.0.0.1    08:53:50 DN 127.0.0.2    08:53:50 DN 127.0.0.3


From this history, I'd like to deduce the failure rate of each node. How can I do that? For example, do I have to use AI/ML techniques, or should I sum the UN entries for each node and divide by the number of lines?
Thank you so much for your help. Kind regards.
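
The simple counting idea mentioned above (count readings per node and divide) can be sketched in awk without any ML. This is only an illustration: it counts DN rather than UN readings, so the result is the fraction of readings in which a node was down. The function name `down_fraction` is hypothetical, and it assumes the history consists of "time status host" triples, one or more per line, as in the sample.

```shell
# Sketch: fraction of DN readings per host.
# Assumes one or more "time status host" triples per line.
down_fraction() {
  LC_ALL=C awk '
    {
      for (i = 1; i + 2 <= NF; i += 3) {
        total[$(i + 2)]++                       # readings seen for this host
        if ($(i + 1) == "DN") down[$(i + 2)]++  # readings where it was down
      }
    }
    END {
      for (h in total) printf "%s %.2f\n", h, down[h] / total[h]
    }' "$@" | sort
}

# Usage with the sample line from the question:
printf '%s\n' '08:51:36 UN 127.0.0.1    08:51:36 UN 127.0.0.2    08:51:36 UN 127.0.0.3    08:53:50 DN 127.0.0.1    08:53:50 DN 127.0.0.2    08:53:50 DN 127.0.0.3' | down_fraction
```

In the sample each node has one UN and one DN reading, so each node prints 0.50. This is a proportion of readings, not a rate over time, so it is only meaningful if the readings are taken at regular intervals.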
# 2  
Old 07-11-2018
What operating system are you using?

What shell are you using?

How do you expect to deduce a failure rate from a single point in time? Are you instead maybe looking for a percentage of network node failures at this point in time?

What output are you hoping to produce from the sample input you have provided?

What have you tried on your own to get the output you want?
# 3  
Old 07-12-2018
What operating system are you using?


Linux OS (Ubuntu distribution)


What shell are you using?

Bash

How do you expect to deduce a failure rate from a single point in time? Are you instead maybe looking for a percentage of network node failures at this point in time?


This is only a simple example. I will generate a history covering several days.


What output are you hoping to produce from the sample input you have provided?

The MTBF (mean-time-between-failures) of each node.


What have you tried on your own to get the output you want?
# 4  
Old 07-12-2018
How about this:

Code:
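awk '
{
  # Each line holds one or more "time status host" triples.
  for (i = 1; i + 2 <= NF; i += 3) {
    split($i, tm, ":")
    now = tm[1] * 3600 + tm[2] * 60 + tm[3]   # seconds since midnight
    status = $(i + 1)
    host = $(i + 2)
    # Add the time elapsed since the previous reading for this host.
    # Testing (host in lastTime) also works for a reading at 00:00:00.
    if (host in lastTime)
      totalTime[host] += now - lastTime[host]
    lastTime[host] = now
    if (status == "DN") Failure[host]++        # every DN reading counts as a failure
    Reading[host]++
  }
}
END {
  for (host in lastTime) {
    if (!Failure[host])
      print host " = No Failures"
    else if (Failure[host] == Reading[host])
      print host " = 0"                        # host was down in every reading
    else
      print host " = " totalTime[host] / Failure[host]
  }
}' infile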


Infile:
Code:
08:51:36 DN 127.0.0.1 08:51:36 UN 127.0.0.2 08:51:36 UN 127.0.0.3 08:53:50 DN 127.0.0.1 08:53:50 DN 127.0.0.2 08:53:50 UN 127.0.0.3
08:58:36 DN 127.0.0.1 08:58:36 DN 127.0.0.2 08:58:36 UN 127.0.0.2

Result:
Code:
127.0.0.1 = 0
127.0.0.2 = 210
127.0.0.3 = No Failures


Last edited by Chubler_XL; 07-12-2018 at 08:58 PM.. Reason: Host always down should have zero
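
Note that the script above counts every DN reading as a failure and divides the whole observed time span by that count. As a hedged alternative, here is a sketch that counts only UN to DN transitions as failures and divides the accumulated up time by that count. The function name `mtbf_transitions` is hypothetical, and the sketch assumes the same input layout and that all readings fall within one day (no midnight rollover).

```shell
# Sketch: MTBF as (time spent up) / (number of UN -> DN transitions).
# Assumes "time status host" triples and timestamps within a single day.
mtbf_transitions() {
  LC_ALL=C awk '
    {
      for (i = 1; i + 2 <= NF; i += 3) {
        split($i, tm, ":")
        now = tm[1] * 3600 + tm[2] * 60 + tm[3]
        status = $(i + 1)
        host = $(i + 2)
        if ((host in last) && last[host] == "UN") {
          up[host] += now - seen[host]          # host was up since the previous reading
          if (status == "DN") fails[host]++     # UN -> DN is one failure
        }
        last[host] = status
        seen[host] = now
      }
    }
    END {
      for (host in last)
        if (fails[host])
          print host " = " up[host] / fails[host]
        else
          print host " = no UN->DN transition observed"
    }' "$@" | sort
}

# Usage with the Infile shown above:
printf '%s\n' \
  '08:51:36 DN 127.0.0.1 08:51:36 UN 127.0.0.2 08:51:36 UN 127.0.0.3 08:53:50 DN 127.0.0.1 08:53:50 DN 127.0.0.2 08:53:50 UN 127.0.0.3' \
  '08:58:36 DN 127.0.0.1 08:58:36 DN 127.0.0.2 08:58:36 UN 127.0.0.2' | mtbf_transitions
```

With this definition, 127.0.0.2 gets 134 seconds (it was up from 08:51:36 to 08:53:50 and failed once) rather than 210, and the other two hosts report no observed transition. Which definition is appropriate depends on what the readings represent.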
# 5  
Old 07-13-2018
Thank you so much for your help.
Kind regards.
# 6  
Old 07-21-2018
Can you please explain why we get this value?

127.0.0.1 = 0
# 7  
Old 07-21-2018
You said you want the MTBF for each node. In the data in post #4, node 127.0.0.1 was down all three times it appeared.

If a node is never up, isn't the mean time between failures zero? What value were you expecting?
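
For comparison, the 210 reported for 127.0.0.2 comes from the same script in post #4: it sums the gaps between that host's four consecutive readings and divides by its two DN readings. A worked check, with times converted to seconds (the symbol $T$ is just a label for this sketch):

```latex
\begin{aligned}
T_{127.0.0.2} &= (08{:}53{:}50 - 08{:}51{:}36) + (08{:}58{:}36 - 08{:}53{:}50) + (08{:}58{:}36 - 08{:}58{:}36) \\
              &= 134 + 286 + 0 = 420\ \text{s} \\
\text{output}_{127.0.0.2} &= \frac{T_{127.0.0.2}}{\text{number of DN readings}} = \frac{420}{2} = 210\ \text{s}
\end{aligned}
```

For 127.0.0.1, all three readings are DN, so `Failure[host] == Reading[host]` is true and the script prints 0 without performing the division.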