Resource Management on Distributed Systems

Name: Resource Management on Distributed Systems | Principles and Techniques
Brand: Wiley-Blackwell
Price: 103.99 EUR
Availability: OnlineOnly

Principles and Techniques

Shikharesh Majumdar (Author)

Wiley-Blackwell (Publisher)

1st edition

Published on 6 September 2024

520 pages

E-Book

ePUB with Adobe DRM

978-1-119-91295-8 (ISBN)

103.99 € incl. 7% VAT

E-book single license

Available for download

Contents

Preface

The availability of processors, memory, and high-speed interconnection networks at a reasonable cost is continuously increasing the use of parallel and distributed systems. Appropriate management of resources is crucial, however, for effectively harnessing the power of the underlying resource pool. While resource management techniques for conventional single-processor systems are covered in many standard operating systems books, existing books give comparatively less coverage to resource management on parallel and distributed systems. This book aims to address this gap by describing algorithms and techniques for managing resources on parallel and distributed systems, including grids, clouds, smart facilities, and parallel processing-based platforms for big data analytics.

The book focuses on resource management on distributed systems, which is of interest to students, researchers, and industrial practitioners who work with systems comprising multiple resources that may range from clusters to clouds to smart facilities. In addition to a discussion of existing knowledge in the area, the book includes material based on research results that have made significant contributions to the state of the art. The book both describes key concepts and summarizes research results. Its key features include the following:

An introduction to five general principles of resource management that will be adapted by the techniques described in the following chapters.
A description of the resource management techniques and algorithms with a discussion of their impact on system performance, energy usage, and cost. When applicable, attention is paid to the trade-offs among these three characteristics. Special techniques for achieving system scalability are discussed.
Results of applying the techniques on simulated as well as real systems (including clouds and big data processing platforms).
In addition to data processing on cloud environments, the following topics are discussed:
1. Big data platforms and frameworks, e.g., MapReduce.
2. Sensor-based smart systems that are becoming an important component of a smart society.
Research results that include the description of the different experiments and pointers to research papers that provide supporting documentation for the research described. Insights into system behavior and performance resulting from these research results are provided.

The organization of the book is influenced by many years of teaching graduate courses in distributed systems in general and resource management in particular, as well as my research performed in the area. Examples and exercises are included in appropriate sections of the book.

Book Contents

This book concerns resource management in distributed systems that can be classified into two categories: computing-intensive systems and data-intensive systems. The book has two parts, each focusing on a particular type of distributed system. Part 1 (Chapters 2-5) focuses on issues underlying distributed computing-intensive systems, whereas Part 2 (Chapters 6-10) is concerned with distributed data-intensive systems.

Basics: Introductory material, including definitions of basic units of distributed computations and their characteristics, is discussed in Chapters 1 and 2.

Chapter 1 describes the evolution of distributed systems from nodes communicating with one another using basic communication mechanisms such as remote procedure calls (RPCs) to clusters, grids, and clouds, including edge computing systems and smart facilities. The basic units of computation used by various applications, such as threads and processes, are introduced. Three types of resource management operations performed on these computation units, including allocation and scheduling, are described. General principles underlying resource management, which form the backbone of a number of resource management techniques described in the later chapters, are introduced.

Chapter 2 focuses on the characterization of parallelism in applications. Both graph-based and single-point characteristics are introduced. These application attributes are closely related to the execution behavior and performance of the system running the application and are thus important in the context of resource management. Performance metrics that can be used for analyzing the performance of resource management algorithms are described in this chapter. Energy consumed by computation and data-intensive applications is often of critical concern. The later part of the chapter introduces energy-related metrics and characteristics and discusses their interrelationship. The trade-off between energy consumption and performance is discussed. Energy-aware resource management is the subject of discussion in a later chapter.
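To make the distinction between graph-based and single-point characteristics concrete, the following is a minimal sketch rather than material taken from the book. It assumes a hypothetical four-task graph with made-up execution times, and it uses average parallelism (total work divided by critical path length) as one commonly used single-point characteristic. The names `graph` and `characterize` are illustrative only.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: task -> (execution time, set of predecessor tasks)
graph = {
    "load": (2, set()),
    "left": (3, {"load"}),
    "right": (5, {"load"}),
    "merge": (1, {"left", "right"}),
}


def characterize(graph):
    """Return (total work, critical path length, average parallelism)."""
    deps = {task: preds for task, (_, preds) in graph.items()}
    finish = {}
    # Visit tasks so that every predecessor is processed before its successors
    for task in TopologicalSorter(deps).static_order():
        cost, preds = graph[task]
        # Earliest finish time given unlimited processors
        finish[task] = cost + max((finish[p] for p in preds), default=0)
    work = sum(cost for cost, _ in graph.values())
    critical_path = max(finish.values())
    return work, critical_path, work / critical_path


print(characterize(graph))
```

The graph itself is the graph-based characteristic. The three returned numbers condense it into single values that a resource manager could compare across applications.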

Allocation and Scheduling: Resource allocation and scheduling are two important resource management operations that are discussed in Chapter 3. The resource allocator maps the application work units to processing resources and determines which process/thread will execute on which processor. Well-known results in the area are discussed. The chapter discusses both optimal algorithms and methods for devising heuristic resource allocation techniques. The discussion of resource allocation is followed by a discussion of process/task scheduling. The task scheduler decides the order in which computation units allocated to a processor will execute. Scheduling of tasks for an application with service level agreements that include deadlines for completion is considered. Scheduling algorithms for both single-processor systems and systems with multiple processors are described. Analyses of the performance of these algorithms are presented. The chapter ends with a discussion of scheduling techniques for client-server systems.
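The division of labor between allocator and scheduler can be illustrated with a small sketch. It is not taken from the book: it assumes two well-known textbook heuristics, greedy longest-task-first allocation to the least-loaded processor and earliest-deadline-first ordering on each processor. The task names, execution times, and deadlines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    exec_time: float  # estimated execution time (hypothetical units)
    deadline: float   # completion deadline from the SLA (same units)


def allocate(tasks, num_processors):
    """Greedy allocator: longest tasks first, each to the least-loaded processor."""
    queues = [[] for _ in range(num_processors)]
    loads = [0.0] * num_processors
    for task in sorted(tasks, key=lambda t: t.exec_time, reverse=True):
        target = loads.index(min(loads))  # least-loaded processor so far
        queues[target].append(task)
        loads[target] += task.exec_time
    return queues


def schedule_edf(queue):
    """Order one processor's tasks by earliest deadline and list deadline misses."""
    ordered = sorted(queue, key=lambda t: t.deadline)
    clock, missed = 0.0, []
    for task in ordered:
        clock += task.exec_time  # task runs to completion, no preemption
        if clock > task.deadline:
            missed.append(task.name)
    return [t.name for t in ordered], missed


tasks = [Task("a", 4, 10), Task("b", 3, 5), Task("c", 2, 4), Task("d", 1, 3)]
plan = [schedule_edf(queue) for queue in allocate(tasks, 2)]
print(plan)
```

Here `allocate` answers the "which processor" question and `schedule_edf` answers the "in what order" question. Keeping them separate mirrors the two operations described above.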

Handling uncertainties in allocation and scheduling: Real systems are often characterized by uncertainties associated with system and workload parameters. Chapter 4 focuses on systems with such uncertainties and describes how to build robustness into resource management techniques to mitigate their adverse impact on system performance. Two types of uncertainties are discussed. The first results from the errors associated with user-predicted task execution times that are often specified as part of a service level agreement (SLA). Techniques for handling both underestimation and overestimation of task execution times are discussed. Analyses of the performance of the techniques are presented. A cloud data center often comprises hundreds or thousands of computing resources that are susceptible to changes with time, and the exact local scheduling policy used by each resource may not always be available a priori to the resource manager for the data center. Techniques for handling this second type of uncertainty, which concerns knowledge of the local scheduling policies used at the various resources, are described. Performance analyses of these techniques, referred to as techniques for resource management "in the dark", are presented.

Handling changes in workload in resource allocation: Determining the number of resources to provision for a given workload is a complex undertaking. The problem is further complicated by dynamic changes in workload, which are typical of a distributed system used by multiple users. Chapter 5 addresses the problem of dynamically controlling the number of CPU resources by using resource auto-scaling. System capacity is increased or decreased automatically by the auto-scaling algorithm so that clients remain satisfied while the cost of resource usage is kept under control. Two concepts are introduced: vertical auto-scaling, in which the CPU power and memory capacity of a given resource are controlled in accordance with the system load, and horizontal auto-scaling, which increases or decreases the number of computing resources in accordance with the change in system workload. Most of the chapter concerns horizontal auto-scaling techniques. Three types of horizontal auto-scaling are discussed: (i) reactive auto-scaling, in which the number of resources is changed after a change in the system workload has occurred; (ii) proactive auto-scaling, in which the future system workload is predicted and the required change in the number of resources is computed before the change in workload intensity occurs; and (iii) hybrid auto-scaling, which is a combination of reactive and proactive auto-scaling. Performance analysis for each technique is reported.
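The three kinds of horizontal auto-scaling can be contrasted in a few lines. The sketch below is illustrative and makes several assumptions that do not come from the book: workload is measured in requests per second, each node serves a fixed hypothetical capacity, the proactive variant predicts by naive linear extrapolation of the last two observations, and the hybrid variant simply takes the larger of the two suggestions.

```python
import math


def nodes_needed(rate, per_node_capacity):
    """Smallest node count able to serve `rate` requests/s (at least one)."""
    return max(1, math.ceil(rate / per_node_capacity))


def reactive(history, per_node_capacity):
    """Size the pool for the most recently observed workload."""
    return nodes_needed(history[-1], per_node_capacity)


def proactive(history, per_node_capacity):
    """Size the pool for a predicted workload (naive linear extrapolation)."""
    predicted = history[-1] + (history[-1] - history[-2])
    return nodes_needed(max(predicted, 0), per_node_capacity)


def hybrid(history, per_node_capacity):
    """Assumed combination rule: never provision less than either suggests."""
    return max(reactive(history, per_node_capacity),
               proactive(history, per_node_capacity))


rising = [200, 300, 450]   # hypothetical requests/s in successive intervals
falling = [450, 300, 200]

for name, history in (("rising", rising), ("falling", falling)):
    print(name, reactive(history, 100), proactive(history, 100),
          hybrid(history, 100))
```

With a rising workload the proactive rule provisions ahead of demand, whereas with a falling workload it releases nodes sooner than the reactive rule. A hybrid policy trades cost against the risk of under-provisioning, and the `max` rule used here is only one conservative way to combine the two.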

Data-Intensive Systems and MapReduce: Part 2 of the book, which concerns data-intensive distributed systems, starts with Chapter 6. This chapter focuses on platforms running MapReduce jobs, which are used in big data analytics as well as in other data-intensive applications. The chapter describes techniques for allocation and scheduling for MapReduce jobs associated with SLAs that include job completion deadlines. Two resource management algorithms, a budget-based algorithm and a constraint programming-based algorithm, are discussed. The SLA associated with a job includes user estimates of task execution times that are often subject to error. Two techniques for handling such errors and increasing the robustness of resource management are described. The chapter includes a thorough discussion of the performance of the various...
