Failure Model

Basic idea

Classification of how processes and channels can fail, used to decide what an algorithm must tolerate. Three top-level classes: omission (didn’t do the thing), arbitrary/Byzantine (did the wrong thing), and timing (did the right thing at the wrong time).

Key facts

The failures in processes and channels are presented using the following taxonomy:

Omission Failures

Omission failures are cases where a process or a communication channel fails to perform an action it is expected to perform.

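A communication omission failure can occur at three points: at the sender, before the message reaches the outgoing buffer (send-omission); in the channel, between the outgoing and incoming buffers (omission); or at the receiver, after the message reaches the incoming buffer (receive-omission).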

Arbitrary Failures (Byzantine failure)

Arbitrary (Byzantine) failures cover any type of error that can occur in a system: the process or channel does the wrong thing rather than merely failing to act.

Omission and arbitrary failures

Class of failure | Affects | Description
Fail-stop | Process | Process halts and remains halted. Other processes may detect this state.
Crash | Process | Process halts and remains halted. Other processes may not be able to detect this state.
Omission | Channel | A message inserted in an outgoing message buffer never arrives at the other end's incoming message buffer.
Send-omission | Process | Process attempts a send, but the message is not placed in its outgoing message buffer.
Receive-omission | Process | A message is placed in the process's incoming message buffer, but the process does not receive it.
Arbitrary | Process or channel | Process or channel behaves arbitrarily: it may send or transmit arbitrary messages at arbitrary times, commit omissions, or take an incorrect step.
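
The three communication omission classes in the table differ only in where along the message path the message is lost. A minimal sketch of that path, assuming a one-way link with one outgoing and one incoming buffer; the names `Fault`, `Link` and `deliver` are hypothetical and chosen for illustration only:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Fault(Enum):
    NONE = "none"
    SEND_OMISSION = "send-omission"        # lost before the outgoing buffer
    CHANNEL_OMISSION = "omission"          # lost between the two buffers
    RECEIVE_OMISSION = "receive-omission"  # lost after the incoming buffer


@dataclass
class Link:
    """One-way path: sender -> outgoing buffer -> channel -> incoming buffer -> receiver."""
    fault: Fault = Fault.NONE
    outgoing: List[str] = field(default_factory=list)
    incoming: List[str] = field(default_factory=list)

    def send(self, msg: str) -> None:
        # Send-omission: the process attempts the send, but nothing is buffered.
        if self.fault is not Fault.SEND_OMISSION:
            self.outgoing.append(msg)

    def transmit(self) -> None:
        # Channel omission: buffered messages never reach the incoming buffer.
        pending, self.outgoing = self.outgoing, []
        if self.fault is not Fault.CHANNEL_OMISSION:
            self.incoming.extend(pending)

    def receive(self) -> Optional[str]:
        # Receive-omission: the message is buffered, but the process never gets it.
        if not self.incoming:
            return None
        msg = self.incoming.pop(0)
        return None if self.fault is Fault.RECEIVE_OMISSION else msg


def deliver(fault: Fault, msg: str = "hello") -> Optional[str]:
    """Push one message through the whole path and return what the receiver sees."""
    link = Link(fault)
    link.send(msg)
    link.transmit()
    return link.receive()
```

In this sketch `deliver(Fault.NONE)` returns the message, while each of the three omission faults returns `None`. From the receiver's side the three cases look identical, which is why the taxonomy distinguishes them by the component affected rather than by the observed symptom.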

Timing Failures

Timing failures occur when the time limits set on process execution time, message delivery time, or clock drift rate are exceeded. They are particularly relevant to synchronous systems, which define such bounds, and less relevant to asynchronous systems, which usually place no bounds, or only looser ones, on timing.

Class of failure | Affects | Description
Clock | Process | Process's local clock exceeds the bounds on its rate of drift from real time.
Performance | Process | Process exceeds the bounds on the interval between two steps.
Performance | Channel | A message's transmission takes longer than the stated bound.
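
Each timing failure class amounts to one measured quantity exceeding one stated bound. A minimal sketch of that check, assuming the bounds are given in seconds and the drift rate is measured as clock drift per second of real time; `Bounds`, `timing_failures` and all parameter names are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bounds:
    max_drift_rate: float     # allowed clock drift, in seconds per second of real time
    max_step_interval: float  # allowed time between two process steps, in seconds
    max_transmission: float   # allowed message transmission time, in seconds


def timing_failures(
    bounds: Bounds,
    clock_elapsed: float,  # time elapsed according to the process's local clock
    real_elapsed: float,   # real time elapsed over the same period
    step_interval: float,  # observed interval between two steps
    transmission: float,   # observed transmission time of one message
) -> List[str]:
    """Return the timing failure classes whose bound was exceeded."""
    if real_elapsed <= 0:
        raise ValueError("real_elapsed must be positive")
    failures = []
    drift_rate = abs(clock_elapsed - real_elapsed) / real_elapsed
    if drift_rate > bounds.max_drift_rate:
        failures.append("clock (process)")
    if step_interval > bounds.max_step_interval:
        failures.append("performance (process)")
    if transmission > bounds.max_transmission:
        failures.append("performance (channel)")
    return failures
```

With `Bounds(1e-5, 0.5, 0.1)`, a local clock that advances 100.01 s during 100 s of real time drifts at roughly 1e-4 s/s and is reported as a clock failure. In an asynchronous system no such `Bounds` exist, so there is nothing to compare the observations against.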
