Failure rate of a node / Data center

# 1  
Old 07-11-2018

Hi,

I have a history of the state of each node in my data center, i.e. a record of the failures in my cluster (UN: node up, DN: node down).
Here are some lines from the history:



Code:
08:51:36 UN 127.0.0.1    08:51:36 UN 127.0.0.2    08:51:36 UN 127.0.0.3    08:53:50 DN 127.0.0.1    08:53:50 DN 127.0.0.2    08:53:50 DN 127.0.0.3


From this history, I'd like to deduce the failure rate of each node. How can I do that, please? For example, do I have to use AI/ML techniques, or should I sum the UN entries for each node and divide by the number of lines?
Thank you so much for your help. Kind regards.
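
The second option in the question (counting readings per node) can be sketched in awk. This is only a minimal illustration, assuming the input consists of whitespace-separated "time status host" triples as in the sample above; the function name dn_fraction is hypothetical. It prints the share of readings in which each node was DN, which is a point-in-time failure fraction rather than an MTBF.

```shell
# Hypothetical helper: per host, print the fraction of readings that were DN.
# Reads "time status host" triples (one or more per line) from stdin or files.
dn_fraction() {
  awk '
  {
    for (i = 1; i + 2 <= NF; i += 3) {
      host = $(i + 2)
      readings[host]++
      if ($(i + 1) == "DN") down[host]++
    }
  }
  END {
    # Iteration order of "for (x in array)" is unspecified; pipe through sort if needed
    for (host in readings)
      printf "%s %.2f\n", host, down[host] / readings[host]
  }' "$@"
}

# Usage (infile is an assumed file name):
#   dn_fraction infile | sort
```

On the sample line above, every node has one UN and one DN reading, so each would be reported as 0.50.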
# 2  
Old 07-11-2018
What operating system are you using?

What shell are you using?

How do you expect to deduce a failure rate from a single point in time? Are you instead maybe looking for a percentage of network node failures at this point in time?

What output are you hoping to produce from the sample input you have provided?

What have you tried on your own to get the output you want?
# 3  
Old 07-12-2018
Quote:
What operating system are you using?

Linux (Ubuntu distribution).

Quote:
What shell are you using?

bash.

Quote:
How do you expect to deduce a failure rate from a single point in time? Are you instead maybe looking for a percentage of network node failures at this point in time?

This is only a simple example. I will generate a history covering several days.

Quote:
What output are you hoping to produce from the sample input you have provided?

The MTBF (mean time between failures) of each node.

Quote:
What have you tried on your own to get the output you want?
# 4  
Old 07-12-2018
How about this:

Code:
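awk '
{
  # Each line holds one or more "time status host" triples
  for (i = 1; i + 2 <= NF; i += 3) {
    split($i, tm, ":")
    now = tm[1]*3600 + tm[2]*60 + tm[3]    # seconds since midnight
    status = $(i+1)
    host = $(i+2)
    # "in" rather than a truth test, so a 00:00:00 reading still counts
    if (host in lastTime)
        totalTime[host] += now - lastTime[host]
    lastTime[host] = now
    if (status == "DN") Failure[host]++    # every DN reading counts as a failure
    Reading[host]++
  }
}
END {
    for (host in lastTime)
       if (Failure[host])
           if (Failure[host] == Reading[host])
               print host " = 0"           # host was never seen up
           else
               print host " = " totalTime[host] / Failure[host]
       else
           print host " = No Failures"
}' infile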


Infile:
Code:
08:51:36 DN 127.0.0.1 08:51:36 UN 127.0.0.2 08:51:36 UN 127.0.0.3 08:53:50 DN 127.0.0.1 08:53:50 DN 127.0.0.2 08:53:50 UN 127.0.0.3
08:58:36 DN 127.0.0.1 08:58:36 DN 127.0.0.2 08:58:36 UN 127.0.0.2

Result:
Code:
127.0.0.1 = 0
127.0.0.2 = 210
127.0.0.3 = No Failures
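
To see where the 210 for 127.0.0.2 comes from, the arithmetic can be replayed by hand. This is just a worked check of the script above, using the four readings for that host in the infile (08:51:36 UN, 08:53:50 DN, 08:58:36 DN, 08:58:36 UN); the variable names are illustrative. The script counts each DN reading as one failure and divides the total elapsed time between readings by that count.

```shell
# Timestamps of the readings for 127.0.0.2, as seconds since midnight
t1=$((8 * 3600 + 51 * 60 + 36))   # 08:51:36 -> 31896
t2=$((8 * 3600 + 53 * 60 + 50))   # 08:53:50 -> 32030
t3=$((8 * 3600 + 58 * 60 + 36))   # 08:58:36 -> 32316 (seen twice: DN, then UN)

# Elapsed time between consecutive readings: 134 + 286 + 0 = 420 seconds
total=$(( (t2 - t1) + (t3 - t2) + (t3 - t3) ))

# Two of the four readings are DN
failures=2

mtbf=$((total / failures))
echo "127.0.0.2 = $mtbf"          # prints: 127.0.0.2 = 210
```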


Last edited by Chubler_XL; 07-12-2018 at 07:58 PM.. Reason: Host always down should have zero
# 5  
Old 07-13-2018
Thank you so much for your help.
Kind regards.
# 6  
Old 4 Weeks Ago
Could you please explain why we get this value?

127.0.0.1 = 0
# 7  
Old 4 Weeks Ago
You said you want the MTBF for each node. The node 127.0.0.1 was down every time it appeared in the data in post #4 (all three readings).

If a node is never up, isn't the mean time between failures zero? What value were you expecting?
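
The script in post #4 treats every DN reading as a failure and divides all elapsed time by that count, which is why an always-down node ends up at 0. A stricter reading of MTBF would count only UN to DN transitions and only the time a node was up. The following is a minimal sketch of that alternative, not the method used in this thread; the function name mtbf_transitions and the arrays lastStatus and upTime are assumptions introduced for illustration, and like the original it does not handle timestamps that wrap past midnight.

```shell
# Hypothetical helper: MTBF per host as (time spent UN) / (number of UN -> DN transitions).
# Reads "time status host" triples (one or more per line) from stdin or files.
mtbf_transitions() {
  awk '
  {
    for (i = 1; i + 2 <= NF; i += 3) {
      split($i, tm, ":")
      now = tm[1] * 3600 + tm[2] * 60 + tm[3]
      status = $(i + 1)
      host = $(i + 2)
      # The interval since the previous reading counts as uptime
      # only if the host was UN at that previous reading
      if ((host in lastStatus) && lastStatus[host] == "UN") {
        upTime[host] += now - lastTime[host]
        if (status == "DN") failures[host]++
      }
      lastTime[host] = now
      lastStatus[host] = status
    }
  }
  END {
    for (host in lastStatus)
      if (failures[host])
        print host " = " upTime[host] / failures[host]
      else
        print host " = no UN->DN transition observed"
  }' "$@"
}

# Usage (infile is an assumed file name):
#   mtbf_transitions infile | sort
```

With this definition a node that is never seen up has neither uptime nor a transition, so it is reported as having no observed transition instead of 0. Which of the two conventions is appropriate depends on what the monitoring data is meant to show.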
Unix & Linux Forums Content Copyright©1993-2018. All Rights Reserved.