Dependability state model

A dependability state diagram is a method for modelling a system as a Markov chain. It is used in reliability engineering for availability and reliability analysis.{{cite book | author = Bjarne E. Helvik | title = Dependable Computing Systems and Communication Networks | publisher = Gnist Tapir | year = 2007}}

File:Dep-state-model.png

It consists of creating a finite-state machine that represents the different states a system may be in. Transitions between states happen as a result of events from underlying Poisson processes with different intensities.

Example

File:Dep-state-model-example.png

A redundant computer system consists of two identical compute nodes, each of which fails with an intensity of \lambda. Failed nodes are repaired one at a time by a single repairman, with negative exponentially distributed repair times with expectation \mu^{-1}.

  • state 0: 0 failed units, normal state of the system.
  • state 1: 1 failed unit, system operational.
  • state 2: 2 failed units, system not operational.

The intensity from state 0 to state 1 is 2\lambda, since each of the two working compute nodes has a failure intensity of \lambda. The intensity from state 1 to state 2 is \lambda, since only one working node remains.

Transitions from state 2 to state 1 and from state 1 to state 0 represent the repairs of the compute nodes and have the intensity \mu, since only a single unit is repaired at a time.

= Availability =

The asymptotic availability, i.e. the availability over a long period, of the system is equal to the probability that the model is in state 0 or state 1, the two states in which the system is operational.

This is calculated by setting up a set of linear balance equations from the state transitions and solving the resulting linear system.

The matrix is constructed with a row for each state. In each row, the intensity of every transition into that state is entered, with a negative sign, in the column of the state the transition leaves.

: \mathbf{A_0} = \begin{bmatrix} 0 & -\mu & 0 \\ -2\lambda & 0 & -\mu \\ 0 & -\lambda & 0 \end{bmatrix}.

The diagonal cells are then filled in so that each column sums to 0:

: \mathbf{A_1} = \begin{bmatrix} (2\lambda) & -\mu & 0 \\ -2\lambda & (\lambda+\mu) & -\mu \\ 0 & -\lambda & (\mu) \end{bmatrix}.
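The construction can be sketched in a few lines of Python. This is a minimal illustration, assuming the transition intensities of the two-node example (2\lambda from state 0 to state 1, \lambda from state 1 to state 2, \mu for each repair); the numeric values of `lam` and `mu` and the helper name `build_balance_matrix` are hypothetical and not taken from the source.

```python
# Transition intensities of the two-node example, keyed by (from_state, to_state).
# lam and mu are illustrative values only (assumption, not from the source).
lam, mu = 0.001, 0.1

rates = {
    (0, 1): 2 * lam,  # either of the two working nodes fails
    (1, 2): lam,      # the one remaining node fails
    (1, 0): mu,       # the single repairman restores a node
    (2, 1): mu,
}


def build_balance_matrix(rates, n_states):
    """Return (A0, A1) as lists of rows, following the construction in the text."""
    # A0: row = destination state, column = origin state, entry = -intensity.
    a0 = [[0.0] * n_states for _ in range(n_states)]
    for (src, dst), rate in rates.items():
        a0[dst][src] = -rate
    # A1: fill each diagonal cell so that its column sums to zero.
    a1 = [row[:] for row in a0]
    for col in range(n_states):
        a1[col][col] = -sum(a0[row][col] for row in range(n_states))
    return a0, a1


A0, A1 = build_balance_matrix(rates, 3)
for row in A1:
    print(row)
```

Keeping the intensities in a dictionary keyed by state pairs means a larger model only needs more entries, not a different construction.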

In addition, the normalization condition must be taken into account:

: \sum_n P_n = 1.

By solving \mathbf{A_1} \mathbf{P} = 0 together with this condition, where \mathbf{P} is the vector of state probabilities P_n, the probability of being in state 0 or state 1 can be found, which is equal to the long-term availability of the service.
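As a rough sketch of this calculation, the following Python code solves the balance equations of the two-node example numerically. It assumes the intensities 2\lambda, \lambda and \mu from the example and treats states 0 and 1 as the operational states; the function names and the numeric values are illustrative assumptions. Because the balance equations are linearly dependent, one of them is replaced by the normalization condition.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def steady_state_availability(lam, mu):
    """Return (state probabilities, availability) for the two-node example."""
    a1 = [
        [2 * lam, -mu, 0.0],
        [-2 * lam, lam + mu, -mu],
        [0.0, -lam, mu],
    ]
    # Replace the last (redundant) balance equation by P0 + P1 + P2 = 1.
    a1[2] = [1.0, 1.0, 1.0]
    p = solve_linear(a1, [0.0, 0.0, 1.0])
    return p, p[0] + p[1]  # operational states: 0 and 1


# Illustrative values only (assumption): lam = 0.001, mu = 0.1 per hour.
probs, availability = steady_state_availability(lam=0.001, mu=0.1)
print(probs, availability)
```

For a chain this small the result can be checked by hand: the balance equations give P_1 = (2\lambda/\mu) P_0 and P_2 = (\lambda/\mu) P_1, so the availability is (\mu^2 + 2\lambda\mu) / (\mu^2 + 2\lambda\mu + 2\lambda^2).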

= Reliability =

The reliability of the system is found by making the failure states absorbing, i.e. removing all outgoing state transitions.

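For this system, with both nodes working at time 0 and state 2 made absorbing, the function is:

: R(t) = \frac{s_1 e^{s_2 t} - s_2 e^{s_1 t}}{s_1 - s_2}, \qquad s_{1,2} = \frac{-(3\lambda+\mu) \pm \sqrt{\lambda^2 + 6\lambda\mu + \mu^2}}{2}.

By comparison, a single node without redundancy has R(t) = e^{-\lambda t}.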
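The absorbing-state approach can be illustrated for the two-node example with a short Python sketch. It assumes both nodes are working at time 0, makes state 2 absorbing, and integrates the resulting differential equations for states 0 and 1 with a classical Runge-Kutta scheme. The closed-form expression used as a cross-check is derived here from those two equations rather than quoted from the source, and all function names and parameter values are illustrative assumptions.

```python
import math


def reliability_closed_form(t, lam, mu):
    """R(t) with state 2 absorbing, starting in state 0 (derived, see lead-in)."""
    root = math.sqrt(lam**2 + 6 * lam * mu + mu**2)
    s1 = (-(3 * lam + mu) + root) / 2
    s2 = (-(3 * lam + mu) - root) / 2
    return (s1 * math.exp(s2 * t) - s2 * math.exp(s1 * t)) / (s1 - s2)


def reliability_numeric(t, lam, mu, steps=20000):
    """Integrate the equations for P0 and P1 with RK4; R(t) = P0(t) + P1(t)."""

    def deriv(p0, p1):
        # State 2 is absorbing, so no probability flows back from it.
        return (-2 * lam * p0 + mu * p1, 2 * lam * p0 - (lam + mu) * p1)

    h = t / steps
    p0, p1 = 1.0, 0.0
    for _ in range(steps):
        k1 = deriv(p0, p1)
        k2 = deriv(p0 + h / 2 * k1[0], p1 + h / 2 * k1[1])
        k3 = deriv(p0 + h / 2 * k2[0], p1 + h / 2 * k2[1])
        k4 = deriv(p0 + h * k3[0], p1 + h * k3[1])
        p0 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p1 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return p0 + p1


# Illustrative values only (assumption): lam = 0.001, mu = 0.1, t = 1000.
print(reliability_closed_form(1000.0, 0.001, 0.1))
print(reliability_numeric(1000.0, 0.001, 0.1))
```

With \mu = 0 (no repair) the closed form reduces to 2e^{-\lambda t} - e^{-2\lambda t}, the familiar reliability of two identical units in parallel, which gives a quick sanity check.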

Criticism

Finite-state models of systems are subject to state explosion: a realistic model of a system often ends up with so many states that it is infeasible to solve or draw.
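The growth can be illustrated with a small Python sketch. It assumes, purely for illustration, that each component is distinguishable and has only two states (up or down), so that a system of n components has 2^n states; the function name is hypothetical. Models that lump identical components together, as the two-node example does, grow more slowly.

```python
from itertools import product


def enumerate_states(n_components):
    """All up/down combinations of n distinguishable two-state components."""
    return list(product(("up", "down"), repeat=n_components))


# Small systems can still be enumerated explicitly.
print(len(enumerate_states(3)))

# For larger systems only the count is practical to compute.
for n in (10, 20, 40):
    print(n, 2 ** n)
```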

References