
Fault Tolerance in Distributed Systems
Pankaj Jalote (Author)
Prentice Hall (Publisher)
Published on 1 May 1994
Book
Paperback/Softback
448 pages
978-0-13-301367-2 (ISBN)
Description
Covers software fault tolerance with emphasis on distributed systems. Key topics covered include fail stop processors, stable storage, reliable communication, synchronized clocks and failure detection.
More details
Language
English
Place of publication
Upper Saddle River
United States
Publishing group
Pearson Education (US)
Target group
College/higher education
Dimensions
Height: 241 mm
Width: 184 mm
Thickness: 22 mm
Weight
830 g
ISBN-13
978-0-13-301367-2 (9780133013672)
Content
1. Introduction.
Basic Concepts and Definitions. Phases in Fault Tolerance. Overview of Hardware Fault Tolerance. Reliability and Availability. Summary.
2. Distributed Systems.
System Model. Interprocess Communication. Ordering of Events and Logical Clocks. Execution Model and System State. Summary.
3. Basic Building Blocks.
Byzantine Agreement. Synchronized Clocks. Stable Storage. Fail Stop Processors. Failure Detection and Fault Diagnosis. Reliable Message Delivery. Summary.
4. Reliable, Atomic, and Causal Broadcast.
Reliable Broadcast. Atomic Broadcast. Causal Broadcast.
5. Recovering a Consistent State.
Asynchronous Checkpointing and Rollback. Distributed Checkpointing. Summary.
6. Atomic Actions.
Atomic Actions and Serializability. Atomic Actions in a Centralized System. Commit Protocols. Atomic Actions on Decentralized Data. Summary.
7. Data Replication and Resiliency.
Optimistic Approaches. Primary Site Approach. Resiliency with Active Replicas. Voting. Degree of Replication. Summary.
8. Process Resiliency.
Resilient Remote Procedure Call. Resiliency with Asynchronous Communication. Resiliency with Synchronous Message Passing. Total Failure and Last Process to Fail. Summary.
9. Software Design Faults.
Approaches for Uniprocess Software. Backward Recovery in Concurrent Systems. Forward Recovery in Concurrent Systems. Summary.
Bibliography.