Distributed Systems

Name: Distributed Systems | Theory and Applications
Brand: Wiley
Price: 84.99 EUR
Availability: OnlineOnly

Theory and Applications

Ratan K. Ghosh, Hiranmay Ghosh (Authors)

Wiley (Publisher)

1st edition

Published on 7 February 2023

560 pages

E-Book

ePUB with Adobe DRM

978-1-119-82595-1 (ISBN)

84.99 € incl. 7% VAT

E-book single license

Available for download

1 Introduction

A distributed system consists of many independent units, each performing a different function. The units work in coordination with each other to realize the system's goals. We find many examples of distributed systems in nature. For instance, a human body consists of several autonomous components such as eyes and ears, hands and legs, and other internal organs. Yet, coordinated by the brain, it behaves as a single coherent entity. Some distributed systems have hierarchical organizations. For example, the coordinated interaction among human beings performing various roles realizes the goals of human society. We find such well-orchestrated activities in lower forms of animals too. For example, in a beehive, an ensemble of bees exhibits coordinated and consistent social behavior that fulfills its goal of foraging.

Inspired by nature, researchers have developed a distributed systems paradigm for solving complex multi-dimensional computation problems. This book aims to provide a narrative for the various aspects of distributed systems and the computational models for interactions at multiple levels of abstraction. We also describe the application of such models in realizing practical distributed systems. In our journey through the book, we begin with the low-level interaction of the system components to achieve performance through parallelism and concurrency. We progressively ascend to higher levels of abstraction to address the issues of knowledge, autonomy, and trust, which are essential for large distributed systems spanning multiple administrative domains.

1.1 Advantages of Distributed Systems

A distributed system offers many advantages. Let us illustrate them with a simple example. Figure 1.1 depicts a distributed system for evaluating simple arithmetic expressions. The expression evaluator divides the problem into smaller tasks of multiplication and addition and engages other modules, namely a set of adders and multipliers, to solve them. It schedules the activities of those modules and communicates the final result to the user. The modules can be hosted on different computers connected over a network. Even this trivial example reveals several advantages of distributed computing:

Figure 1.1 Illustrating distributed computing.

Performance enhancement: The system may engage multiple components to perform subtasks, e.g., multiplications, in parallel, resulting in improved performance. However, distributing the components over multiple hardware elements increases communication overhead. So, the gain from parallel computation must be weighed against the cost of communication.
Specialization and autonomy: Each module may be designed independently for performing a specific task, e.g., addition or multiplication. A component can implement any specific algorithm irrespective of the type of algorithms deployed in the other modules. So, localization of task-dependent knowledge and the local optimization of the modules for performance enhancements are possible. It simplifies the design of the system. The modules can even be implemented on disparate hardware and in different programming environments by various developers. A change in one module does not affect others, so long as the interfaces remain unchanged.
Geographic distribution and transparency: It is possible to locate the components on machines at various geographical locations and in various administrative domains. The geographical distribution of the components is generally transparent to the applications, which allows flexible, dynamic redistribution. For example, a piece of computation can be scheduled on the computing node that has the least load at a given point in time, and can be shifted to another node in case of a failure. This results in reuse and optimal utilization of the resources. As another example, the replicas of a storage system can be distributed across multiple geographical locations to guard against accidental data loss.
Dynamic binding and optimization: A distributed system can have a pool of similar computational resources, such as adders and multipliers. These resources may be dynamically associated with different computing problems at different points in time. Further, even similar resources, like the multipliers, may have different performance metrics, like speed and accuracy. The system can choose an optimal set of modules in a specific problem context. Such optimum and dynamic binding of the resources leads to improvement of overall system performance.
Fault tolerance: The availability of a pool of similar resources aids in fault tolerance in the system. If one of the system components fails, then the task can migrate to another component. The system can experience a graceful performance degradation in such cases, rather than a system failure.
Openness, scalability, and dynamic reconfigurability: A distributed system can be designed as an open system, where individual components interact through a set of standard protocols. This facilitates the independent design of the components. Loose coupling between the system components helps in scalability. Further, we can replace deprecated components with new ones without shutting down the system.
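
The following is a minimal sketch of the expression evaluator described above, written in Python. It is an illustration under stated assumptions, not the design shown in Figure 1.1: local threads stand in for modules hosted on networked computers, and the names `multiplier`, `adder`, `flaky_multiplier`, `multiply_with_failover`, and `evaluate_sum_of_products` are hypothetical. It shows products being computed in parallel and a task migrating to another multiplier in the pool when one fails.

```python
from concurrent.futures import ThreadPoolExecutor


def multiplier(a, b):
    """Stand-in for a remote multiplier module (assumption: a local call)."""
    return a * b


def flaky_multiplier(a, b):
    """A multiplier that always fails, simulating a crashed component."""
    raise ConnectionError("multiplier unavailable")


def adder(values):
    """Stand-in for a remote adder module."""
    return sum(values)


def multiply_with_failover(pool, a, b):
    """Try each multiplier in the pool in turn until one succeeds."""
    for module in pool:
        try:
            return module(a, b)
        except ConnectionError:
            continue  # migrate the task to the next component in the pool
    raise RuntimeError("no multiplier available")


def evaluate_sum_of_products(pairs, pool):
    """Evaluate a1*b1 + a2*b2 + ... with the products computed in parallel."""
    with ThreadPoolExecutor(max_workers=len(pairs)) as executor:
        products = list(
            executor.map(lambda pair: multiply_with_failover(pool, *pair), pairs)
        )
    return adder(products)


# 2*3 + 4*5 + 6*7, with the first multiplier in the pool failing every time
result = evaluate_sum_of_products(
    [(2, 3), (4, 5), (6, 7)], [flaky_multiplier, multiplier]
)
print(result)  # 68
```

In this sketch the evaluator depends only on the call interface of the modules, so a multiplier can be replaced without touching the evaluator. With real network calls, each subtask would also pay a communication cost, which is the trade-off noted under performance enhancement.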

1.2 Defining Distributed Systems

Leslie Lamport's seminal work [Lamport 2019] laid the theoretical foundations of time, clocks, and event ordering in distributed systems. Lamport realized that the concepts of sequential time and system state do not carry over to distributed systems. Failure is one of the hardest problems to understand in a distributed system, because it is meaningful only in the context of time: a failed computer or link is indistinguishable from one that is merely responding unusually late. Lamport captured the importance of failure detection and recovery in a distributed system in the following famous quip [Malkh 2013]:

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."

Understandably, fault tolerance [Neiger and Toueg 1988, Xiong et al. 2009], which includes detection of failures and recovery from faults, is a dominant area of research in distributed systems.
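
To make the point about failures and late responses concrete, here is a minimal Python sketch of a timeout-based failure detector. It is an illustration only and is not taken from the cited works; the class `HeartbeatMonitor`, its methods, and the node name `node-a` are hypothetical. Timestamps are passed in explicitly so that the example is deterministic.

```python
import time


class HeartbeatMonitor:
    """Suspects a node when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of the last heartbeat

    def heartbeat(self, node, now=None):
        """Record a heartbeat; `now` defaults to the local monotonic clock."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def suspected(self, node, now=None):
        """Return True if the node is unknown or its heartbeat is overdue."""
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(node)
        return last is None or now - last > self.timeout


monitor = HeartbeatMonitor(timeout=2.0)
monitor.heartbeat("node-a", now=0.0)
print(monitor.suspected("node-a", now=1.0))  # False: heartbeat is recent
print(monitor.suspected("node-a", now=3.0))  # True: silent for 3 seconds
monitor.heartbeat("node-a", now=3.5)         # the node was only slow
print(monitor.suspected("node-a", now=4.0))  # False: suspicion withdrawn
```

At time 3.0 the monitor cannot tell whether `node-a` has crashed or is merely slow. The heartbeat at 3.5 shows that the suspicion was wrong, which is why a timeout can only ever suspect a failure, never prove one.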

There are many technical-sounding definitions, but all seem to converge on the importance of fault tolerance in distributed systems. We discuss fault tolerance later in this book. However, to get a flavor of the different ways of defining a distributed system, let us examine a few of those found in the literature [Kshemkalyani and Singhal 2011].

Definition 1.1 (Collection and coordination): A distributed system is a collection of computers that do not share a common memory or a common physical clock, that communicate by messages over a communication network, and where each computer has its own memory and runs its own OS. Typically, the computers are semi-autonomous and loosely coupled while they cooperate to address a problem collectively.

Definition 1.2 (Single system view): A collection of independent computers that appear to the users of the system as a single coherent computer.

Definition 1.3 (Collection): A term used to describe a wide range of computer systems, from weakly coupled systems such as a wide area network, to strongly coupled systems such as a local area network, to very strongly coupled multiprocessor systems.

The common idea behind the three definitions above is to capture certain basic characteristics of a distributed system, namely:

There is no common clock in a distributed system.
It consists of several networked autonomous computers, each having its own clock, memory, and OS.
It does not have a shared memory.
The computers of a distributed system communicate and coordinate through message passing over network links.
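
Without a common clock, processes can still agree on an order of events by exchanging counters in their messages. The Python sketch below illustrates this with a logical clock in the style usually attributed to Lamport. It is a simplified illustration rather than an algorithm given in this text, and the class `Process` and its methods are hypothetical names.

```python
class Process:
    """A process with its own logical clock and no shared memory."""

    def __init__(self, name):
        self.name = name
        self.clock = 0

    def local_event(self):
        """Advance the clock for an event internal to this process."""
        self.clock += 1
        return self.clock

    def send(self):
        """Advance the clock and return the timestamp carried by the message."""
        self.clock += 1
        return self.clock

    def receive(self, timestamp):
        """Move the clock past both the local time and the message timestamp."""
        self.clock = max(self.clock, timestamp) + 1
        return self.clock


p, q = Process("p"), Process("q")
p.local_event()         # p's clock: 1
p.local_event()         # p's clock: 2
msg = p.send()          # p's clock: 3, message carries timestamp 3
q.local_event()         # q's clock: 1
print(q.receive(msg))   # 4, i.e. max(1, 3) + 1
```

The receive event in `q` gets a larger timestamp than the send event in `p`, so the timestamps respect the order in which the message was sent and received, even though the two processes never read a shared clock.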

However, we feel that these definitions are still inadequate because they miss two key aspects of Lamport's observation of a distributed system. We propose the following new definition.

Definition 1.4 (Proposed definition): A distributed system consists of several independent, geographically dispersed, and networked computing elements such as computers, smartphones, sensors, actuators, and embedded electronic devices. These devices communicate among themselves through message passing to coordinate and cooperate in satisfying common computing goals, notwithstanding the occasional failures of a few links or devices.

The proposed definition covers the basic characteristics of a collection of networked computing devices. It indicates that a collection of independent components integrated as a unified system is a distributed system. The definition

Subsumes Definitions 1.3 and 1.2,
Covers the coordination aspect as in Definition 1.1,
Includes the fault tolerance and message passing aspects of Lamport's observation.

1.3 Challenges of a Distributed System

Some of the well-understood bottlenecks for implementing a distributed system are the following:

Centralized algorithms: A single computer is responsible for program control decisions. These algorithms are suitable for the client-server model of computation, where a server may be overwhelmed by many...
