









An overview of fault tolerant systems, including definitions, types of faults and errors, and key ingredients for error processing and fault treatment. It covers both hardware and software faults, and discusses the need for fault tolerance in critical applications, harsh environments, and complex systems.
Typology: Lecture notes










Instructor: Engr. Nabiha Faisal
Terminology of Fault Tolerance
Fault -> (causes) -> Error -> (results in) -> Failure
Fault: a defect within the system.
Error: observed as a deviation from the expected behaviour of the system.
Failure: occurs when the system can no longer perform as required (does not meet its specification).
Fault tolerance: the ability of a system to provide a service, even in the presence of errors.
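The fault -> error -> failure chain can be illustrated with a small sketch. Everything here is hypothetical and chosen only for illustration: `mean_faulty` contains a deliberate defect (the fault), `mean_reference` stands in for a redundant correct implementation, and `mean_tolerant` shows one possible way to keep delivering the service despite the error.

```python
def mean_faulty(values):
    # FAULT: integer division is a defect in the code. It stays dormant
    # whenever the sum happens to be divisible by the number of values.
    return sum(values) // len(values)


def mean_reference(values):
    # Hypothetical redundant implementation without the defect.
    return sum(values) / len(values)


def mean_tolerant(values):
    result = mean_faulty(values)
    # ERROR detection: the result deviates from what the inputs imply.
    if abs(result * len(values) - sum(values)) > 1e-9:
        # Mask the error so that it never becomes a FAILURE of the service.
        return mean_reference(values)
    return result


if __name__ == "__main__":
    print(mean_faulty([2, 4]))    # fault dormant, result is correct
    print(mean_faulty([1, 2]))    # fault activated, erroneous result
    print(mean_tolerant([1, 2]))  # error detected and masked
```

If the erroneous value from `mean_faulty([1, 2])` were delivered to the user, the service would no longer meet its specification, which is the failure. `mean_tolerant` detects the error first and substitutes a correct result.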
Types of Fault (w.r.t. attributes)

Type of failure: Description
Crash failure: A server halts, but is working correctly until it halts.
  Amnesia crash: All history is lost; the server must be rebooted.
  Pause crash: The server still remembers its state before the crash and can be recovered.
  Halting crash: Hardware failure; the server must be replaced or re-installed.
Omission failure: A server fails to respond to incoming requests.
  Receive omission: A server fails to receive incoming messages.
  Send omission: A server fails to send messages.
Timing failure: A server's response lies outside the specified time interval.
Response failure: The server's response is incorrect.
  Value failure: The value of the response is wrong.
  State transition failure: The server deviates from the correct flow of control.
Arbitrary failure: A server may produce arbitrary responses at arbitrary times.
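As a rough sketch of how a client might map a single observation onto these categories, consider the classifier below. The names `FailureType` and `classify`, and the idea of comparing against a known expected value, are assumptions made for illustration. A single missing reply cannot distinguish a crash from an omission, and arbitrary failures cannot be recognised this way at all.

```python
from enum import Enum
from typing import Optional, Tuple


class FailureType(Enum):
    NONE = "no failure"
    OMISSION = "omission failure"
    TIMING = "timing failure"
    RESPONSE = "response failure"


def classify(reply: Optional[int],
             latency_s: float,
             window_s: Tuple[float, float],
             expected: int) -> FailureType:
    """Classify one observed reply (hypothetical helper)."""
    if reply is None:
        # No reply at all: seen from outside this is an omission
        # (it could equally be the first symptom of a crash).
        return FailureType.OMISSION
    earliest, latest = window_s
    if not (earliest <= latency_s <= latest):
        # The reply arrived outside the specified time interval.
        return FailureType.TIMING
    if reply != expected:
        # The reply arrived in time but its value is wrong.
        return FailureType.RESPONSE
    return FailureType.NONE


if __name__ == "__main__":
    print(classify(None, 0.0, (0.0, 1.0), 42).value)
    print(classify(42, 2.5, (0.0, 1.0), 42).value)
    print(classify(41, 0.2, (0.0, 1.0), 42).value)
```

The order of the checks is a design choice: a late reply is reported as a timing failure even if its value is also wrong.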
FAULT TOLERANCE
Error processing: removing errors before a failure occurs.
Fault treatment: preventing the fault(s) from being activated again.
ERROR PROCESSING
Error detection: identification of erroneous state(s).
Error diagnosis: damage assessment.
Error recovery: an error-free state is substituted for the erroneous state.
  Backward recovery: the system is brought back to a state visited before the error occurred, using recovery points (checkpoints).
  Forward recovery: the erroneous state is discarded and a correct one is determined, without losing any computation.
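Backward recovery with a recovery point can be sketched as follows. This is a minimal toy example, not a prescribed design: the class `CheckpointedCounter`, its invariant (values must never be negative), and the policy of taking a checkpoint after every successful update are all assumptions made for illustration.

```python
import copy


class CheckpointedCounter:
    """Toy stateful component with backward recovery (hypothetical)."""

    def __init__(self):
        self.state = {"total": 0, "count": 0}
        # Initial recovery point.
        self._checkpoint = copy.deepcopy(self.state)

    def checkpoint(self):
        # Save a recovery point: a copy of a state known to be error-free.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Backward recovery: return to the state visited before the error.
        self.state = copy.deepcopy(self._checkpoint)

    def detect_error(self):
        # Error detection: the assumed invariant is that no value is negative.
        return self.state["total"] < 0 or self.state["count"] < 0

    def add(self, value):
        self.state["total"] += value
        self.state["count"] += 1
        if self.detect_error():
            self.rollback()
            return False
        self.checkpoint()
        return True


if __name__ == "__main__":
    c = CheckpointedCounter()
    c.add(5)
    c.add(-10)      # drives the state into an erroneous condition
    print(c.state)  # state restored from the last checkpoint
```

The update that produced the erroneous state is lost on rollback. That lost work is the cost forward recovery tries to avoid by constructing a correct state directly instead of returning to an earlier one.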