Resource Management on Distributed Systems

Name: Resource Management on Distributed Systems | Principles and Techniques
Brand: Wiley
Price: 107.99 EUR
Availability: OnlineOnly

Principles and Techniques

Shikharesh Majumdar (Author)

Wiley (Publisher)

1st Edition

Published on 6 September 2024

520 pages

E-Book

ePUB with Adobe-DRM

978-1-119-91295-8 (ISBN)

€107.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

Comprehensive guide to the principles, algorithms, and techniques underlying resource management for clouds, big data, and sensor-based systems

Resource Management on Distributed Systems provides helpful guidance by describing algorithms and techniques for managing resources on parallel and distributed systems, including grids, clouds, and parallel processing-based platforms for big data analytics.

The book focuses on four general principles of resource management and their impact on system performance, energy usage, and cost, and includes end-of-chapter exercises. The text includes chapters on sensors, autoscaling on clouds, complex event processing for streaming data, and data filtering techniques for big data systems.

The book also covers results of applying the discussed techniques on simulated as well as real systems (including clouds and big data processing platforms), and techniques for handling errors associated with user-predicted task execution times.

Written by a highly qualified academic with significant research experience in the field, Resource Management on Distributed Systems includes information on sample topics such as:

Attributes of parallel/distributed applications that have an intimate relationship with system behavior and performance, plus their related performance metrics.
Handling a lack of a priori knowledge of local operating systems on individual nodes in a large system.
Detection and management of complex events (that correspond to the occurrence of multiple raw events) on a platform for streaming analytics.
Techniques for reducing data latency for multiple operator-based queries in an environment processing large textual documents.

Covering the core topics of the field in a single publication, Resource Management on Distributed Systems is a comprehensive guide to resource management and an essential read for professionals, researchers, and students working with distributed systems.

Content

About the Author

Preface

Acknowledgments

1 Introduction

1.1 Introduction to Distributed and Parallel Computing

1.2 Types of Computing Environments

1.3 Units of Computation

1.4 Principles Underlying Resource Management

1.5 Evolution of Distributed Systems

1.6 Summary

2 Characterization of Parallelism in Applications

2.1 Introduction

2.2 The Precedence Graph Model

2.3 Graph-Based Characteristics

2.4 Single-Point Characteristics

2.5 Performance Metrics

2.6 Impact of Parallelism Characteristics on Performance

2.7 Energy Performance Trade-Off

2.8 Summary

3 Resource Management Techniques for Distributed Computing Systems

3.1 Resource Allocation

3.2 Task/Process Scheduling

3.3 Grid Scheduling with Deadlines

3.4 Scheduling on Client-Server Systems

3.5 Summary

4 Resource Management on Systems Subjected to Uncertainties Associated with Workload and System Parameters

4.1 Introduction

4.2 Handling Errors Associated with User Estimates of Job Execution Times

4.3 Underestimation of Job Execution Times

4.4 Handling Uncertainties Associated with the Local Scheduling Policy

4.5 Any Schedulability Criterion

4.6 Matchmaking in the Dark: AS Criterion-Based Matchmaking

4.7 Soft Advance Reservation Requests

4.8 Summary

5 Resource Auto-Scaling

5.1 Introduction

5.2 Request Characteristics

5.3 Horizontal Auto-Scaling

5.4 Hybrid Auto-Scaling

5.5 Summary

6 Resource Management for Systems Running MapReduce Jobs

6.1 Introduction

6.2 MapReduce

6.3 Resource Management Techniques for MapReduce Job Requests to be Satisfied on a Best Effort Basis

6.4 Resource Management Techniques for MapReduce Job Requests with Service Level Agreements

6.5 The Constraint Programming-Based MapReduce Resource Management Technique

6.6 Errors Associated with User Estimates of Task Execution Times

6.7 Summary

7 Energy Aware Resource Management

7.1 Introduction

7.2 DVFS-Based Resource Management Techniques

7.3 The EAMR-RM Algorithm

7.4 Configurable Resource Manager for Processing a Batch of MapReduce Jobs

7.5 Performance Analysis of CRM

7.6 Reducing the Number of Active Servers

7.7 Summary

8 Streaming Data and Complex Event Processing

8.1 Introduction

8.2 Management of Streaming Data

8.3 Dynamic Priority-Based Scheduling

8.4 Data-Driven Priority Scheduler (DDPS)

8.5 Multitenant Systems

8.6 Complex Event Processing

8.7 Remote Patient Monitoring System

8.8 Summary

9 Data Indexing and Filtering Techniques for Big Data Systems

9.1 Introduction

9.2 Harnessing Big Data

9.3 Data Indexing

9.4 Inverted Index

9.5 Graph-Based Indexing

9.6 Boolean AND Queries

9.7 Performance Analysis

9.8 Data Filtering

9.9 Parallel Processing Platforms

9.10 Motivations for Data Reduction

9.11 Data Filtering

9.12 Performance Analysis

9.13 Streaming Data

9.14 Handling User Preferences Comprising Keywords Connected by Boolean Operators

9.15 Summary

10 Sensor-Based Systems

10.1 Introduction

10.2 Middleware Services

10.3 Sensor-Based Bridge Management

10.4 Research Collaboration Platform for Management of Sensor-Based Smart Facilities

10.5 Resource Management on Wireless Sensor Networks

10.6 Scheduling on WSNs

10.7 Sensor Allocation

10.8 Summary

11 Summary

11.1 Chapter Entitled Introduction

11.2 Chapter Entitled Characterization of Parallelism in Applications

11.2.1 Graph-Based Characteristics

11.3 Chapter Entitled Resource Management Techniques for Distributed Computing Systems

11.4 Chapter Entitled Resource Management on Systems Subjected to Uncertainties Associated with Workload and System Parameters

11.5 Chapter Entitled Resource Auto-Scaling

11.6 Chapter Entitled Resource Management on Systems Running MapReduce Jobs

11.7 Chapter Entitled Energy-Aware Resource Management

11.8 Chapter Entitled Streaming Data and Complex Event Processing

11.9 Chapter Entitled Data Indexing and Filtering Techniques for Big Data Systems

11.10 Chapter Entitled Sensor-Based Systems

Index

Preface

The availability of processors, memory, and high-speed interconnection networks at a reasonable cost is continuously increasing the use of parallel and distributed systems. Appropriate management of resources is crucial, however, for effectively harnessing the power of the underlying resource pool. While resource management techniques for conventional single-processor systems are covered in many standard operating systems books, existing books give comparatively less coverage to resource management on parallel and distributed systems. This book aims to address this gap by describing algorithms and techniques for managing resources on parallel and distributed systems, including grids, clouds, smart facilities, and parallel processing-based platforms for big data analytics.

The book focuses on resource management on distributed systems, which is of interest to students, researchers, and industrial practitioners who work with systems comprising multiple resources that may range from clusters to clouds to smart facilities. In addition to a discussion of existing knowledge in the area, the book includes material based on research results that have made significant contributions to the state of the art. The book describes key concepts and summarizes research results. Its key features include the following:

An introduction to four general principles of resource management that will be adapted by the techniques described in the following chapters.
A description of the resource management techniques and algorithms with a discussion of their impact on system performance, energy usage, and cost. When applicable, attention is paid to the trade-offs among these three characteristics. Special techniques for achieving system scalability are discussed.
Results of applying the techniques on simulated as well as real systems (including clouds and big data processing platforms).
In addition to data processing in cloud environments, the following topics are discussed:
1. Big data platforms and frameworks, e.g. MapReduce.
2. Sensor-based smart systems that are becoming an important component of a smart society.
Research results that include the description of the different experiments and pointers to research papers that provide supporting documentation for the research described. Insights into system behavior and performance resulting from these research results are provided.

The organization of the book is influenced by many years of teaching graduate courses in distributed systems in general and resource management in particular, as well as my research performed in the area. Examples and exercises are included in appropriate sections of the book.

Book Contents

This book concerns resource management in distributed systems that can be classified into two categories: computing-intensive systems and data-intensive systems. The book has two parts, each focusing on one type of distributed system. Part 1 (Chapters 2-5) focuses on issues underlying distributed computing-intensive systems, whereas Part 2 (Chapters 6-10) is concerned with distributed data-intensive systems.

Basics: Introductory material, including definitions of basic units of distributed computations and their characteristics, is discussed in Chapters 1 and 2.

Chapter 1 describes the evolution of distributed systems from nodes communicating with one another using basic communication mechanisms such as remote procedure calls (RPCs) to clusters, grids, and clouds, including edge computing systems and smart facilities. The basic units of computation used by various applications, such as threads and processes, are introduced. Three types of resource management operations performed on these computation units, including allocation and scheduling, are described. General principles underlying resource management that form the backbone of a number of resource management techniques described in the later chapters are introduced.

Chapter 2 focuses on the characterization of parallelism in applications. Both graph-based and single-point characteristics are introduced. These application attributes have an intimate relationship with the execution behavior and performance of the system running the application and are thus important in the context of resource management. Performance metrics that can be used for analyzing the performance of resource management algorithms are described in this chapter. Energy consumed by computation- and data-intensive applications is often of critical concern. The later part of the chapter introduces energy-related metrics and characteristics and discusses their interrelationship. The trade-off between energy consumption and performance is discussed. Energy-aware resource management is the subject of discussion in a later chapter.
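To make the idea of graph-based characteristics concrete, the following is a minimal sketch, not taken from the book, that computes three quantities commonly derived from a precedence graph: total work, critical path length, and their ratio (often called average parallelism). The function name `parallelism_profile`, the dictionary-based graph encoding, and the example task times are all illustrative assumptions.

```python
from functools import lru_cache


def parallelism_profile(exec_time, preds):
    """Return (total work, critical path length, average parallelism).

    exec_time: dict mapping task name -> execution time (hypothetical input format)
    preds:     dict mapping task name -> list of predecessor task names
    Assumes the precedence graph is acyclic.
    """

    @lru_cache(maxsize=None)
    def finish(task):
        # Earliest finish time with unlimited processors:
        # a task starts once all of its predecessors have finished.
        start = max((finish(p) for p in preds.get(task, ())), default=0.0)
        return start + exec_time[task]

    total_work = sum(exec_time.values())
    critical_path = max(finish(t) for t in exec_time)
    return total_work, critical_path, total_work / critical_path


# Illustrative fork-join graph: A precedes B and C, which both precede D.
exec_time = {"A": 2, "B": 3, "C": 4, "D": 1}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
work, span, avg_parallelism = parallelism_profile(exec_time, preds)
```

In this example the total work is 10 time units and the longest chain (A, C, D) takes 7, so adding processors cannot bring the completion time below 7 regardless of how many are available.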

Allocation and Scheduling: Resource allocation and scheduling are two important resource management operations that are discussed in Chapter 3. The resource allocator maps the application work units to processing resources and determines which process/thread will be executing on which processor. Well-known results in the area are discussed. The chapter discusses both optimal algorithms and approaches for devising heuristic resource allocation techniques. The discussion of resource allocation is followed by a discussion on process/task scheduling. The task scheduler decides the order in which computation units allocated to a processor will execute. Scheduling of tasks for an application with service level agreements that include deadlines for completion is considered. Scheduling algorithms for both single-processor systems and systems with multiple processors are described. Analyses of performance of these algorithms are presented. The chapter ends with a discussion of scheduling techniques for client-server systems.
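As a rough illustration of deadline-driven task scheduling on a single processor, the sketch below orders jobs by earliest deadline first (EDF), a well-known policy, and reports which jobs would miss their deadlines. It is not presented as one of the algorithms the chapter describes. The function name `edf_schedule`, the tuple layout, and the assumption that all jobs are available at time 0 and run without preemption are illustrative choices.

```python
def edf_schedule(jobs):
    """Order jobs by earliest deadline and flag deadline misses.

    jobs: list of (name, exec_time, deadline) tuples (hypothetical format).
    Assumes one processor, no preemption, and all jobs released at time 0.
    Returns (execution order, names of jobs that finish after their deadline).
    """
    clock = 0
    order, missed = [], []
    for name, exec_time, deadline in sorted(jobs, key=lambda job: job[2]):
        clock += exec_time  # the job runs to completion
        order.append(name)
        if clock > deadline:
            missed.append(name)
    return order, missed


# Illustrative job set: (name, execution time, deadline).
jobs = [("j1", 3, 10), ("j2", 2, 4), ("j3", 4, 9)]
order, missed = edf_schedule(jobs)
```

Here the jobs run in the order j2, j3, j1 and finish at times 2, 6, and 9, so every deadline is met.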

Handling uncertainties in allocation and scheduling: Real systems are often characterized by uncertainties associated with system and workload parameters. Chapter 4 focuses on systems with such uncertainties and describes how to build robustness into resource management techniques to mitigate their adverse impact on system performance. Two types of uncertainties are discussed. The first results from the errors associated with user-predicted task execution times that are often specified as part of a service level agreement (SLA). Techniques for handling both underestimation and overestimation of task execution times are discussed. Analyses of performance of the techniques are presented. A cloud data center often comprises hundreds or thousands of computing resources that are susceptible to changes with time, and the exact local scheduling policy used by each resource may not always be available a priori to the resource manager for the data center. Techniques for resource management to handle this second type of uncertainty, associated with the knowledge of the local scheduling policies used at the various resources, are described. Performance analyses of these techniques, referred to as techniques for resource management "in the dark," are presented.

Handling changes in workload in resource allocation: Determining the number of resources to provision for a given workload is a complex undertaking. The problem is further complicated by dynamic changes in workload, which are typical of a distributed system used by multiple users. Chapter 5 addresses the problem of dynamically controlling the number of CPU resources by using resource auto-scaling. System capacity is increased or decreased automatically by the auto-scaling algorithm so that client satisfaction is met while keeping the cost of resource usage under control. Two concepts are introduced: vertical auto-scaling, in which the CPU power and memory capacity of a given resource are controlled in accordance with the system load, and horizontal auto-scaling, which increases or decreases the number of computing resources in accordance with the change in system workload. Most of the chapter concerns horizontal auto-scaling techniques. Three types of horizontal auto-scaling are discussed: (i) reactive auto-scaling, for which a change to the number of resources is made after a change has occurred to the system workload; (ii) proactive auto-scaling, for which the future system workload is predicted and the change in the number of resources needed to handle this future workload is computed before the said change in workload intensity occurs; and (iii) hybrid auto-scaling, which is a combination of reactive and proactive auto-scaling. Performance analysis for each technique is reported.
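The reactive variant of horizontal auto-scaling can be sketched as a simple threshold rule: measure utilization after the workload has changed, then add or remove one server. This is a generic illustration under stated assumptions, not the algorithm analyzed in the chapter. The function name `reactive_step`, the threshold values, the per-server capacity, and the load trace are all hypothetical.

```python
def reactive_step(servers, load, capacity,
                  upper=0.8, lower=0.3, min_servers=1, max_servers=10):
    """Return the server count after one reactive scaling decision.

    servers:  number of servers currently running
    load:     observed request rate in the last interval (requests/s)
    capacity: request rate a single server can sustain (requests/s)
    All thresholds and limits are illustrative assumptions.
    """
    utilization = load / (servers * capacity)
    if utilization > upper and servers < max_servers:
        return servers + 1  # scale out after the load has risen
    if utilization < lower and servers > min_servers:
        return servers - 1  # scale in after the load has fallen
    return servers


# Replay a hypothetical load trace, one decision per interval.
servers = 1
history = []
for load in [50, 180, 260, 260, 60, 20]:
    servers = reactive_step(servers, load, capacity=100)
    history.append(servers)
```

Because the rule reacts only after utilization crosses a threshold, capacity lags behind a rising workload by at least one interval; a proactive scaler would instead feed a predicted load into the same decision before the change occurs.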

Data-Intensive Systems and MapReduce: Part 2 of the book, which concerns data-intensive distributed systems, starts with Chapter 6. This chapter focuses on platforms running MapReduce jobs that are used in big data analytics as well as for other data-intensive applications. The chapter describes techniques for allocation and scheduling for MapReduce jobs associated with SLAs that include job completion deadlines. Two resource management algorithms, a budget-based algorithm and a constraint programming-based algorithm, are discussed. The SLA associated with a job includes user estimates of task execution times that are often subject to error. Two techniques for handling such errors and increasing the robustness of resource management are described. The chapter includes a thorough discussion of the performance of the various...
