Fault Tolerance - Advanced Operating Systems - Lecture Slides

Islamic University of Science & Technology Advanced Operating Systems

Main points of this lecture are: Fault Tolerance, Dependable Systems, Types of Failures, Failure Models, Arbitrary Failure, Omission Failure, Failure Masking by Redundancy, Triple Modular Redundancy, Hierarchical Groups, Agreement in Faulty Systems

Typology: Slides

2012/2013

Uploaded on 04/23/2013 by atasi

CIS 620

Advanced Operating Systems

Lecture 11 – Fault Tolerance

Fault Tolerance

Dependable systems have the following requirements:

Availability
Reliability
Safety
Maintainability

Failure Models

Different types of failures.

- Crash failure: A server halts, but is working correctly until it halts.
- Omission failure: A server fails to respond to incoming requests.
  - Receive omission: A server fails to receive incoming messages.
  - Send omission: A server fails to send messages.
- Timing failure: A server's response lies outside the specified time interval.
- Response failure: The server's response is incorrect.
  - Value failure: The value of the response is wrong.
  - State transition failure: The server deviates from the correct flow of control.
- Arbitrary failure: A server may produce arbitrary responses at arbitrary times.

Failure Masking by Redundancy

Triple modular redundancy.
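Triple modular redundancy replicates each component three times and passes the three outputs through a voter, so the output of any single faulty replica is outvoted by the other two. The sketch below models a component as a Python function; `vote`, `tmr_stage`, `good`, and `faulty` are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def vote(outputs: Sequence[T]) -> T:
    """Return the value produced by a strict majority of the replicas.

    With three replicas, one faulty output is always outvoted. If no value
    has a strict majority (e.g. all three replicas disagree), raise rather
    than guess.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among replica outputs")
    return value


def tmr_stage(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Feed the same input to three replicas of a component and vote on the outputs."""
    if len(replicas) != 3:
        raise ValueError("triple modular redundancy uses exactly three replicas")
    return vote([replica(x) for replica in replicas])


# Hypothetical component that doubles its input; the third copy is faulty.
def good(x: int) -> int:
    return 2 * x


def faulty(x: int) -> int:
    return 2 * x + 1


print(tmr_stage([good, good, faulty], 21))  # 42: the faulty replica is outvoted
```

A full TMR circuit usually triplicates the voters as well, so that a faulty voter is also masked; a single voter keeps this sketch short.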

Agreement in Faulty Systems

The Byzantine generals problem for 3 loyal generals and 1 traitor.
a) The generals announce their troop strengths (in units of 1,000 soldiers).
b) The vectors that each general assembles based on (a).
c) The vectors that each general receives in step 3.
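The steps behind this slide can be simulated directly: each general reports its strength, assembles a vector, forwards that vector to everyone, and finally keeps, for each element, the value held by a strict majority of the vectors it received (otherwise UNKNOWN). This is a sketch under stated assumptions: generals are numbered from 0 (so general 2 here is the third general), a traitor's lies come from a caller-supplied function, and `None` stands for UNKNOWN. The names `byzantine_agreement` and `lie` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Set

UNKNOWN = None  # marks a vector element for which no majority exists

# lie(sender, receiver, element) -> value a traitor reports instead of the truth.
# element is None in step 1 (own strength) and the vector index in step 3.
Lie = Callable[[int, int, Optional[int]], object]


def byzantine_agreement(strengths: List[int], traitors: Set[int], lie: Lie) -> Dict[int, List[object]]:
    n = len(strengths)

    # Steps 1-2: each general sends its strength to every other general;
    # vectors[i][j] is the value general i received from general j.
    vectors = [
        [strengths[j] if j == i or j not in traitors else lie(j, i, None) for j in range(n)]
        for i in range(n)
    ]

    # Step 3: each general forwards its whole vector to every other general.
    # A traitor may send arbitrary values instead.
    received = {
        i: [
            [lie(j, i, k) for k in range(n)] if j in traitors else vectors[j]
            for j in range(n)
            if j != i
        ]
        for i in range(n)
    }

    # Step 4: for each element k, a loyal general keeps the value held by a
    # strict majority of the vectors it received, otherwise UNKNOWN.
    results: Dict[int, List[object]] = {}
    for i in range(n):
        if i in traitors:
            continue
        result: List[object] = []
        for k in range(n):
            column = [vec[k] for vec in received[i]]
            value, count = Counter(column).most_common(1)[0]
            result.append(value if count * 2 > len(column) else UNKNOWN)
        results[i] = result
    return results


# Hypothetical traitor behaviour: a different made-up value for every message.
def lie(sender: int, receiver: int, element: Optional[int]) -> str:
    return f"lie:{sender}->{receiver}:{element}"


# General 2 is the traitor; every loyal general ends up with [1, 2, None, 4].
for general, vector in byzantine_agreement([1, 2, 3, 4], {2}, lie).items():
    print(general, vector)
```

With three generals and one traitor, `byzantine_agreement([1, 2, 3], {2}, lie)` gives both loyal generals `[None, None, None]`: no element reaches a majority, which is the situation on the next slide.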

Agreement in Faulty Systems

The same as in the previous slide, except now with 2 loyal generals and 1 traitor.
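The general condition behind these two slides (due to Lamport, Shostak, and Pease, for the oral-message model used here) is that the loyal generals can reach agreement only if the total number of generals exceeds three times the number of traitors, i.e. at least $2m + 1$ of them must be loyal:

```latex
n \;\ge\; 3m + 1 \qquad \text{($n$ generals in total, at most $m$ of them traitors)}
```

With $m = 1$, four generals satisfy $4 \ge 4$ and agreement succeeds, while three generals give $3 < 4$ and it fails.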

RPC Failures

Lost request message.
- This is easy to handle if known, that is, if we are sure the request was lost.
- It is also easy if the request is idempotent and we think it might be lost.
  - Simply retransmit the request.
  - This assumes the client still has a copy of the request.
Lost reply message.
- If it is known that the reply was lost, have the server retransmit it.

RPC Failures

This assumes the server still has the reply.
How long should the server hold the reply?
- Wait forever for the reply to be acknowledged? No!
- Discard it after "enough" time.
- Discard it after we receive another request from this client.
- Ask the client whether the reply was received.
- Keep resending the reply.
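One of the options above, discarding a saved reply once the next request from the same client arrives, can be sketched as a per-client reply cache. This is an illustrative sketch only: it assumes each client numbers its requests with increasing sequence numbers and has at most one outstanding request, and `ReplyCachingServer` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class ReplyCachingServer:
    """Keeps the last reply for each client so that a retransmitted request can
    be answered again without being re-executed."""

    handler: Callable[[str], str]
    # client id -> (sequence number of that client's last request, its reply)
    last_reply: Dict[str, Tuple[int, str]] = field(default_factory=dict)
    executions: int = 0

    def handle(self, client: str, seq: int, request: str) -> str:
        cached = self.last_reply.get(client)
        if cached is not None:
            last_seq, last_reply = cached
            if seq == last_seq:
                # Duplicate: the client probably lost our reply, so resend it.
                return last_reply
            if seq < last_seq:
                raise ValueError(f"stale request {seq} from {client}")
        # A newer request implies the client got the previous reply, so the
        # old entry is discarded by overwriting it below.
        reply = self.handler(request)
        self.executions += 1
        self.last_reply[client] = (seq, reply)
        return reply


server = ReplyCachingServer(handler=str.upper)  # hypothetical handler
print(server.handle("c1", 1, "debit 10"))  # DEBIT 10 (executed)
print(server.handle("c1", 1, "debit 10"))  # DEBIT 10 (resent from the cache)
print(server.handle("c1", 2, "debit 5"))   # DEBIT 5 (executed, replaces old entry)
print(server.executions)                   # 2
```

The one-outstanding-request assumption is what makes a newer request a safe signal that the older reply arrived; without it the server would need explicit acknowledgements.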
What if we are not sure whether the request or the reply was lost?
If the server is stateless, it doesn't know, and the client can't tell!
If the request is idempotent, simply retransmit it.

RPC Failures

From databases, we get the idea of transactions and commits.
- This really does solve the problem, but it is not cheap.
It is fairly easy to get “at least once” semantics (try the request again if the timer expires) or “at most once” semantics (give up if the timer expires). It is hard to get “exactly once” without transactions.
To be more precise: a transaction either happens exactly once or not at all (which sounds like at most once), and the client knows which.
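The difference between the two easy semantics shows up when the same lossy exchange is replayed under each client policy. The sketch below is a hypothetical simulation, not a real RPC library: the network is a scripted list of what happens to each attempt, the server runs a non-idempotent increment with no duplicate filtering, and `CounterServer` and `call` are illustrative names.

```python
from typing import List


class CounterServer:
    """Executes a non-idempotent operation: every execution bumps the counter."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> int:
        self.value += 1
        return self.value


def call(server: CounterServer, network: List[str], max_attempts: int) -> bool:
    """Send an increment request, retrying until a reply arrives or
    max_attempts is used up. Returns True if the client saw a reply.

    `network` scripts what happens to each attempt, in order:
    "request" = request lost, "reply" = executed but reply lost, "ok" = success.
    Once the script runs out, every further attempt succeeds.
    """
    for _ in range(max_attempts):
        outcome = network.pop(0) if network else "ok"
        if outcome == "request":
            continue  # server never saw the request; client timer expires
        server.increment()  # request reached the server and was executed
        if outcome == "reply":
            continue  # reply lost; timer expires and the client cannot tell why
        return True
    return False


# At least once: keep retrying. A lost reply makes the operation run twice.
s1 = CounterServer()
print(call(s1, ["reply", "ok"], max_attempts=5), s1.value)  # True 2

# At most once: give up when the first timer expires.
s2 = CounterServer()
print(call(s2, ["request"], max_attempts=1), s2.value)  # False 0
s3 = CounterServer()
print(call(s3, ["reply"], max_attempts=1), s3.value)    # False 1
```

The two at-most-once runs look identical to the client (no reply), yet the operation ran zero times in one and once in the other; a transaction is what lets the client know which.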

RPC Failures

Client crashes
- Orphan computations exist (computations whose client is no longer waiting for the result).
- Again, transactions work but are expensive.
- We can have the rebooted client start another epoch: all computations of the previous epoch are killed, and clients resubmit.
  - It is better to let old computations whose owners can still be found continue.
- This isn't a great solution.
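The epoch scheme above can be sketched as follows, assuming each remote computation is tagged with its client's epoch number when it starts and that the rebooted client's new-epoch announcement reaches the server. `OrphanKillingServer`, `Computation`, and the optional `owner_alive` callback are hypothetical names; the callback models the refinement of letting computations continue when their owner can still be found.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Computation:
    client: str
    epoch: int  # epoch of the client when it started this computation
    name: str


class OrphanKillingServer:
    def __init__(self) -> None:
        self.running: List[Computation] = []
        self.current_epoch: Dict[str, int] = {}

    def start(self, client: str, name: str) -> None:
        epoch = self.current_epoch.setdefault(client, 0)
        self.running.append(Computation(client, epoch, name))

    def new_epoch(
        self,
        client: str,
        owner_alive: Optional[Callable[[Computation], bool]] = None,
    ) -> List[str]:
        """Handle a rebooted client's announcement of a new epoch.

        Computations from the client's earlier epochs are killed, unless
        owner_alive reports that their owner can still be found.
        Returns the names of the killed computations.
        """
        epoch = self.current_epoch.get(client, 0) + 1
        self.current_epoch[client] = epoch
        killed: List[str] = []
        kept: List[Computation] = []
        for comp in self.running:
            from_old_epoch = comp.client == client and comp.epoch < epoch
            if from_old_epoch and not (owner_alive is not None and owner_alive(comp)):
                killed.append(comp.name)  # a real server would abort it here
            else:
                kept.append(comp)
        self.running = kept
        return killed


server = OrphanKillingServer()
server.start("c1", "query-a")
server.start("c2", "query-b")
print(server.new_epoch("c1"))            # ['query-a']
print([c.name for c in server.running])  # ['query-b']
```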

Basic Reliable-Multicasting Schemes

A simple solution to reliable multicasting when all receivers are known and are assumed not to fail

a) Message transmission
b) Reporting feedback
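A scheme of this kind can be sketched with sequence numbers: the sender keeps each message in a history buffer, each receiver tracks the last sequence number it has delivered, and a receiver that notices a gap reports the missing numbers so the sender can retransmit them from history. As on the slide, receivers are assumed known and never to fail; the in-process `lose` set standing in for the network, and the `Sender`/`Receiver` classes, are hypothetical.

```python
from typing import AbstractSet, Dict, List, Tuple


class Receiver:
    def __init__(self, name: str) -> None:
        self.name = name
        self.last_delivered = 0            # highest sequence number delivered in order
        self.pending: Dict[int, str] = {}  # received out of order, not yet delivered
        self.delivered: List[str] = []

    def receive(self, seq: int, payload: str) -> List[int]:
        """Accept a message and return the sequence numbers still missing."""
        if seq > self.last_delivered:
            self.pending[seq] = payload  # duplicates of delivered messages are ignored
        while self.last_delivered + 1 in self.pending:
            self.last_delivered += 1
            self.delivered.append(self.pending.pop(self.last_delivered))
        highest = max(self.pending, default=self.last_delivered)
        return [s for s in range(self.last_delivered + 1, highest) if s not in self.pending]


class Sender:
    def __init__(self, receivers: List[Receiver]) -> None:
        self.receivers = receivers
        self.next_seq = 1
        self.history: Dict[int, str] = {}  # kept until every receiver has it

    def multicast(self, payload: str, lose: AbstractSet[Tuple[str, int]] = frozenset()) -> None:
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = payload
        for r in self.receivers:
            if (r.name, seq) in lose:
                continue  # simulated loss on the link to this receiver
            # Feedback: the receiver reports gaps; retransmit each from history.
            for missing in r.receive(seq, payload):
                r.receive(missing, self.history[missing])
        # Reading last_delivered directly stands in for the receivers' ACKs.
        acked = min(r.last_delivered for r in self.receivers)
        for old in [s for s in self.history if s <= acked]:
            del self.history[old]


a, b = Receiver("A"), Receiver("B")
sender = Sender([a, b])
sender.multicast("m1", lose={("B", 1)})  # B misses m1 ...
sender.multicast("m2")                   # ... notices the gap on m2 and reports it
print(b.delivered)                       # ['m1', 'm2']
print(sender.history)                    # {}
```

A gap is only noticed when a later message arrives, so a lost final message goes undetected until the next multicast (or, in practice, a timeout).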

Nonhierarchical Feedback Control

Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others.
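The suppression rule can be simulated in a few lines. This sketch assumes a single missing message, the same one-way latency between every pair of receivers, and timers measured from when the loss was detected; `feedback_suppression` and the receiver names are hypothetical. The receiver whose timer fires first multicasts its request, and any receiver whose own timer has not fired by the time that request reaches it cancels its own.

```python
import random
from typing import Dict, List


def feedback_suppression(timers: Dict[str, float], latency: float) -> List[str]:
    """Return the receivers that actually multicast a retransmission request.

    timers[r] is the time at which receiver r's request is scheduled to fire.
    The first request is multicast to the whole group and reaches the other
    receivers `latency` time units later; a receiver whose timer has not fired
    by then cancels (suppresses) its own request.
    """
    order = sorted(timers, key=lambda r: timers[r])
    first = order[0]
    heard_at = timers[first] + latency
    return [first] + [r for r in order[1:] if timers[r] < heard_at]


# Fixed timers so the outcome is easy to follow: R3's timer fires before
# R2's request reaches it, so R3 sends a duplicate; R1 and R4 stay quiet.
print(feedback_suppression({"R1": 3.0, "R2": 1.0, "R3": 1.2, "R4": 7.5}, latency=0.5))
# ['R2', 'R3']

# In practice each receiver draws its delay at random (seeded here for repeatability).
rng = random.Random(620)
timers = {f"R{i}": rng.uniform(0.0, 10.0) for i in range(1, 6)}
print(feedback_suppression(timers, latency=0.5))
```

Spreading the random delays over a wider interval makes duplicates rarer but delays the first request, which is the trade-off this scheme tunes.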

Virtual Synchrony

The logical organization of a distributed system to distinguish between message receipt and message delivery
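The receipt/delivery split can be illustrated with a middleware layer that accepts messages from the network in any order but hands them to the application only when a delivery condition holds. Here the condition is plain FIFO order on per-sender sequence numbers, chosen only to make the distinction concrete; virtual synchrony uses a stronger, view-based condition. `Middleware` and its fields are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Tuple


class Middleware:
    """Separates message receipt (from the network) from delivery (to the application)."""

    def __init__(self, deliver: Callable[[str, str], None]) -> None:
        self.deliver = deliver  # application callback
        self.next_expected: DefaultDict[str, int] = defaultdict(lambda: 1)
        self.holdback: Dict[Tuple[str, int], str] = {}  # received but not yet delivered

    def on_receive(self, sender: str, seq: int, payload: str) -> None:
        if seq < self.next_expected[sender]:
            return  # duplicate of a message that was already delivered
        # Receipt: the message is at this machine but only buffered.
        self.holdback[(sender, seq)] = payload
        # Delivery: pass messages up for as long as the delivery condition holds.
        while (sender, self.next_expected[sender]) in self.holdback:
            ready = self.holdback.pop((sender, self.next_expected[sender]))
            self.next_expected[sender] += 1
            self.deliver(sender, ready)


app_log = []
mw = Middleware(lambda sender, msg: app_log.append(msg))
mw.on_receive("P", 2, "second")  # received, held back: message 1 is missing
print(app_log)                   # []
mw.on_receive("P", 1, "first")   # received; now both can be delivered
print(app_log)                   # ['first', 'second']
```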

Virtual Synchrony

The principle of virtual synchronous multicast.
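The principle is that a multicast sent in a group view is delivered either to every non-faulty member of that view or to none of them, before the next view is installed. One common way to enforce this (used, for example, in the Isis toolkit) is a flush step at each view change: survivors exchange the messages they received in the old view but cannot yet be sure everyone has (unstable messages), so all of them deliver the same set. The sketch below is deliberately simplified: it ignores failures during the flush, message ordering, and how the new view is agreed on, and `view_change` is a hypothetical name.

```python
from typing import Dict, List, Set


def view_change(
    group: Set[str], crashed: Set[str], unstable: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Flush step run when a new group view is installed.

    unstable[p] holds the messages that process p received in the old view
    but cannot yet be sure every member has. Each survivor forwards its
    unstable messages to all other survivors, so every survivor delivers the
    same set before the new view starts. A message that only crashed
    processes held is delivered by nobody.
    """
    survivors = group - crashed
    to_deliver: Set[str] = set()
    for p in survivors:
        to_deliver |= unstable.get(p, set())
    # Sorted only to give every survivor the same, repeatable order.
    return {p: sorted(to_deliver) for p in sorted(survivors)}


# P crashes while multicasting: m1 reached Q before the crash, m2 reached nobody.
group = {"P", "Q", "R"}
unstable = {"P": {"m1", "m2"}, "Q": {"m1"}, "R": set()}
print(view_change(group, crashed={"P"}, unstable=unstable))
# {'Q': ['m1'], 'R': ['m1']}
```

Here m1 reaches all survivors even though the sender crashed, and m2 reaches none of them, which is exactly the all-or-nothing guarantee.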
