Model-Based Reinforcement Learning

Name: Model-Based Reinforcement Learning | From Data to Continuous Actions with a Python-based Toolbox
Brand: Wiley
Price: 107.99 EUR
Availability: OnlineOnly

From Data to Continuous Actions with a Python-based Toolbox

Milad Farsi, Jun Liu (Authors)

Wiley (Publisher)

1st Edition

Published on 2 December 2022

272 pages

E-Book

ePUB with Adobe-DRM

978-1-119-80859-6 (ISBN)

€107.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

Model-Based Reinforcement Learning

Explore a comprehensive and practical approach to reinforcement learning

Reinforcement learning is an essential paradigm of machine learning, wherein an intelligent agent performs actions that ensure optimal behavior from devices. While this paradigm of machine learning has gained tremendous success and popularity in recent years, previous scholarship has focused either on theory (optimal control and dynamic programming) or on algorithms, most of which are simulation-based.

Model-Based Reinforcement Learning provides a model-based framework to bridge these two aspects, thereby creating a holistic treatment of the topic of model-based online learning control. In doing so, the authors seek to develop a model-based framework for data-driven control that bridges the topics of systems identification from data, model-based reinforcement learning, and optimal control, as well as the applications of each. This new technique for assessing classical results will allow for a more efficient reinforcement learning system. At its heart, this book is focused on providing an end-to-end framework--from design to application--of a more tractable model-based reinforcement learning technique.

Model-Based Reinforcement Learning readers will also find:

* A useful textbook to use in graduate courses on data-driven and learning-based control that emphasizes modeling and control of dynamical systems from data

* Detailed comparisons of the impact of different techniques, such as basic linear quadratic controller, learning-based model predictive control, model-free reinforcement learning, and structured online learning

* Applications and case studies, one on ground vehicles with nonholonomic dynamics and another on quadrotor helicopters

* An online, Python-based toolbox that accompanies the contents covered in the book, as well as the necessary code and data

Model-Based Reinforcement Learning is a useful reference for senior undergraduate students, graduate students, research assistants, professors, process control engineers, and roboticists.

Content

About the Authors

Preface

Acronyms

Introduction

1 Nonlinear Systems Analysis

1.1 Notation

1.2 Nonlinear Dynamical Systems

1.2.1 Remarks on Existence, Uniqueness, and Continuation of Solutions

1.3 Lyapunov Analysis of Stability

1.4 Stability Analysis of Discrete Time Dynamical Systems

1.5 Summary

Bibliography

2 Optimal Control

2.1 Problem Formulation

2.2 Dynamic Programming

2.2.1 Principle of Optimality

2.2.2 Hamilton-Jacobi-Bellman Equation

2.2.3 A Sufficient Condition for Optimality

2.2.4 Infinite-Horizon Problems

2.3 Linear Quadratic Regulator

2.3.1 Differential Riccati Equation

2.3.2 Algebraic Riccati Equation

2.3.3 Convergence of Solutions to the Differential Riccati Equation

2.3.4 Forward Propagation of the Differential Riccati Equation for Linear Quadratic Regulator

2.4 Summary

Bibliography

3 Reinforcement Learning

3.1 Control-Affine Systems with Quadratic Costs

3.2 Exact Policy Iteration

3.2.1 Linear Quadratic Regulator

3.3 Policy Iteration with Unknown Dynamics and Function Approximations

3.3.1 Linear Quadratic Regulator with Unknown Dynamics

3.4 Summary

Bibliography

4 Learning of Dynamic Models

4.1 Introduction

4.1.1 Autonomous Systems

4.1.2 Control Systems

4.2 Model Selection

4.2.1 Gray-Box vs. Black-Box

4.2.2 Parametric vs. Nonparametric

4.3 Parametric Model

4.3.1 Model in Terms of Bases

4.3.2 Data Collection

4.3.3 Learning of Control Systems

4.4 Parametric Learning Algorithms

4.4.1 Least Squares

4.4.2 Recursive Least Squares

4.4.3 Gradient Descent

4.4.4 Sparse Regression

4.5 Persistence of Excitation

4.6 Python Toolbox

4.6.1 Configurations

4.6.2 Model Update

4.6.3 Model Validation

4.7 Comparison Results

4.7.1 Convergence of Parameters

4.7.2 Error Analysis

4.7.3 Runtime Results

4.8 Summary

Bibliography

5 Structured Online Learning-Based Control of Continuous-Time Nonlinear Systems

5.1 Introduction

5.2 A Structured Approximate Optimal Control Framework

5.3 Local Stability and Optimality Analysis

5.3.1 Linear Quadratic Regulator

5.3.2 SOL Control

5.4 SOL Algorithm

5.4.1 ODE Solver and Control Update

5.4.2 Identified Model Update

5.4.3 Database Update

5.4.4 Limitations and Implementation Considerations

5.4.5 Asymptotic Convergence with Approximate Dynamics

5.5 Simulation Results

5.5.1 Systems Identifiable in Terms of a Given Set of Bases

5.5.2 Systems to Be Approximated by a Given Set of Bases

5.5.3 Comparison Results

5.6 Summary

Bibliography

6 A Structured Online Learning Approach to Nonlinear Tracking with Unknown Dynamics

6.1 Introduction

6.2 A Structured Online Learning for Tracking Control

6.2.1 Stability and Optimality in the Linear Case

6.3 Learning-based Tracking Control Using SOL

6.4 Simulation Results

6.4.1 Tracking Control of the Pendulum

6.4.2 Synchronization of Chaotic Lorenz System

6.5 Summary

Bibliography

7 Piecewise Learning and Control with Stability Guarantees

7.1 Introduction

7.2 Problem Formulation

7.3 The Piecewise Learning and Control Framework

7.3.1 System Identification

7.3.2 Database

7.3.3 Feedback Control

7.4 Analysis of Uncertainty Bounds

7.4.1 Quadratic Programs for Bounding Errors

7.5 Stability Verification for Piecewise-Affine Learning and Control

7.5.1 Piecewise Affine Models

7.5.2 MIQP-based Stability Verification of PWA Systems

7.5.3 Convergence of ACCPM

7.6 Numerical Results

7.6.1 Pendulum System

7.6.2 Dynamic Vehicle System with Skidding

7.6.3 Comparison of Runtime Results

7.7 Summary

Bibliography

8 An Application to Solar Photovoltaic Systems

8.1 Introduction

8.2 Problem Statement

8.2.1 PV Array Model

8.2.2 DC-DC Boost Converter

8.3 Optimal Control of PV Array

8.3.1 Maximum Power Point Tracking Control

8.3.2 Reference Voltage Tracking Control

8.3.3 Piecewise Learning Control

8.4 Application Considerations

8.4.1 Partial Derivative Approximation Procedure

8.4.2 Partial Shading Effect

8.5 Simulation Results

8.5.1 Model and Control Verification

8.5.2 Comparative Results

8.5.3 Model-Free Approach Results

8.5.4 Piecewise Learning Results

8.5.5 Partial Shading Results

8.6 Summary

Bibliography

9 An Application to Low-level Control of Quadrotors

9.1 Introduction

9.2 Quadrotor Model

9.3 Structured Online Learning with RLS Identifier on Quadrotor

9.3.1 Learning Procedure

9.3.2 Asymptotic Convergence with Uncertain Dynamics

9.3.3 Computational Properties

9.4 Numerical Results

9.5 Summary

Bibliography

10 Python Toolbox

10.1 Overview

10.2 User Inputs

10.2.1 Process

10.2.2 Objective

10.3 SOL

10.3.1 Model Update

10.3.2 Database

10.3.3 Library

10.3.4 Control

10.4 Display and Outputs

10.4.1 Graphs and Printouts

10.4.2 3D Simulation

10.5 Summary

Bibliography

A Appendix

A.1 Supplementary Analysis of Remark 5.4

A.2 Supplementary Analysis of Remark 5.5

Index

Introduction

I.1 Background and Motivation

I.1.1 Lack of an Efficient General Nonlinear Optimal Control Technique

Optimal control theory plays an important role in designing effective control systems. For linear systems, a class of optimal control problems is solved successfully under the framework of the Linear Quadratic Regulator (LQR). LQR problems are concerned with minimizing a cost that is quadratic in the control input and the state of a linear system; solving them allows us to regulate both the state and the control input. In control systems applications, this provides an opportunity to shape the behavior of the system by adjusting the weighting coefficients used in the cost functional. However, when it comes to nonlinear dynamical systems, there is no systematic method for efficiently obtaining an optimal feedback control for general nonlinear systems. Thus, many of the techniques available in the literature on linear systems do not apply in general.
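To make the role of the weighting coefficients concrete, the following is a minimal sketch for the scalar case only. It assumes the dynamics dx/dt = a*x + b*u and the cost integral of q*x**2 + r*u**2; the function name `scalar_lqr` and the plant values are illustrative assumptions and are not taken from the book or its toolbox.

```python
import math


def scalar_lqr(a, b, q, r):
    """Scalar continuous-time LQR (illustrative helper, not from the book).

    Dynamics: dx/dt = a*x + b*u, cost: integral of q*x**2 + r*u**2 dt.
    Returns (p, k), where p is the positive root of the algebraic Riccati
    equation 2*a*p - (b*p)**2 / r + q = 0 and k is the gain in u = -k*x.
    """
    if b == 0 or q <= 0 or r <= 0:
        raise ValueError("need b != 0, q > 0, r > 0")
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k


# Hypothetical unstable plant dx/dt = x + u, with three weight choices.
a, b = 1.0, 1.0
for q, r in [(1.0, 1.0), (10.0, 1.0), (1.0, 10.0)]:
    p, k = scalar_lqr(a, b, q, r)
    print(f"q={q:5.1f} r={r:5.1f} gain k={k:.3f} closed-loop pole={a - b * k:.3f}")
```

In this sketch, raising the state weight q yields a larger gain and a faster closed loop, while raising the control weight r yields a smaller gain and a gentler control action.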

Despite the complexity of nonlinear dynamical systems, they have attracted much attention from researchers in recent years. This is mostly because of their practical benefits in establishing a wide variety of applications in engineering, including power electronics, flight control, and robotics, among many others. Considering the control of a general nonlinear dynamical system, optimal control involves finding a control input that minimizes a cost functional that depends on the controlled state trajectory and the control input. While such a problem formulation can cover a wide range of applications, how to efficiently solve such problems remains a topic of active research.

I.1.2 Importance of an Optimal Feedback Control

In general, there exist two well-known approaches to solving such optimal control problems: the maximum (or minimum) principles [Pontryagin, 1987] and the Dynamic Programming (DP) method [Bellman and Dreyfus, 1962]. To solve an optimization problem that involves dynamics, maximum principles require us to solve a two-point boundary value problem, where the solution is not in a feedback form.

There exist plenty of numerical techniques presented in the literature to solve the optimal control problem. Such approaches generally rely on knowledge of the exact model of the system. In the case where such a model exists, the optimal control input is obtained in the open-loop form as a time-dependent signal. Consequently, implementing these approaches in real-world problems often involves many complications that are well known by the control community. This is because model mismatch, noise, and disturbances greatly affect the online solution, causing it to diverge from the preplanned offline solution. Therefore, obtaining a closed-loop solution for the optimal control problem is often preferred in such applications.

The DP approach analytically results in a feedback control for linear systems with a quadratic cost. Moreover, employing the Hamilton-Jacobi-Bellman (HJB) equation with a value function, one might manage to derive an optimal feedback control rule for some real-world applications, provided that the value function can be updated in an efficient manner. This motivates us to consider conditions leading to an optimal feedback control rule that can be efficiently implemented in real-world problems.
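As a reminder of what the HJB equation looks like, the block below writes out its standard steady-state form for one common setting. The setting is an assumption made for illustration: control-affine dynamics \dot{x} = f(x) + g(x)u and an infinite-horizon cost with running cost Q(x) + u^\top R u, where R is positive definite. The symbols f, g, Q, R, and V are notation chosen here, not quoted from the book.

```latex
% Assumed setting: \dot{x} = f(x) + g(x)u, \quad
% J = \int_0^\infty \big( Q(x) + u^\top R u \big)\, dt, \quad R \succ 0.

% Steady-state HJB equation for the value function V:
0 = \min_{u} \Big[ Q(x) + u^\top R u + \nabla V(x)^\top \big( f(x) + g(x) u \big) \Big]

% Setting the gradient with respect to u to zero gives the feedback rule:
u^*(x) = -\tfrac{1}{2} R^{-1} g(x)^\top \nabla V(x)
```

The second line shows why an efficiently updated value function matters: once V is available, the feedback control follows directly from its gradient.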

I.1.3 Limits of Optimal Feedback Control Techniques

Consider an optimal control problem over an infinite horizon involving a nonquadratic performance measure. Using the idea of inverse optimal control, the cost functional can then be evaluated in closed form as long as the running cost depends somehow on an underlying Lyapunov function by which the asymptotic stability of the nonlinear closed-loop system is guaranteed. It can then be shown that the Lyapunov function is indeed the solution of the steady-state HJB equation. Although such a formulation allows an optimal feedback rule to be obtained analytically, choosing the proper performance measure may not be trivial. Moreover, from a practical point of view, the nonlinearity in the performance measure might cause unpredictable behavior.

A well-studied method for solving an optimal control problem online is employing a value function assuming a given policy. Then, for any state, the value function gives a measure of how good the state is by collecting the cost starting from that state while the policy is applied. If such a value function can be obtained, and the system model is known, the optimal policy is actually the one that takes the system in the direction by which the value decreases the most in the space of the states. Such Reinforcement Learning (RL) techniques, which are known as value-based methods, including the Value Iteration (VI) and the Policy Iteration (PI) algorithms, are shown to be effective in finite state and control spaces. However, the computations cannot efficiently scale with the size of the state and control spaces.
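The value-based idea described above can be sketched on a toy problem with finite state and control spaces. The chain world below (states 0 to 4, goal at state 0, unit cost per step, discount factor 0.9) and the function name `value_iteration` are assumptions made purely for illustration.

```python
def value_iteration(n_states=5, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Value Iteration on a hypothetical 1-D chain with the goal at state 0.

    Actions move one cell left (-1) or right (+1). Every step from a
    non-goal state costs 1; the goal is absorbing and costs nothing.
    Returns the cost-to-go list V and the greedy policy.
    """
    actions = (-1, +1)

    def step(s, a):
        """Known model: return (next_state, cost) for state s and action a."""
        if s == 0:
            return 0, 0.0
        return min(max(s + a, 0), n_states - 1), 1.0

    def q_value(V, s, a):
        s_next, cost = step(s, a)
        return cost + gamma * V[s_next]

    V = [0.0] * n_states
    for _ in range(max_iters):
        V_new = [min(q_value(V, s, a) for a in actions) for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(V, V_new))
        V = V_new
        if delta < tol:
            break

    # Greedy policy: pick the action leading toward the lowest cost-to-go.
    policy = [min(actions, key=lambda a, s=s: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy


V, policy = value_iteration()
print("cost-to-go:", [round(v, 3) for v in V])
print("greedy policy:", policy)
```

Each sweep touches every state and every action, which is why this tabular scheme stops scaling once the state and control spaces grow large or become continuous.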

I.1.4 Complexity of Approximate DP Algorithms

One way of facilitating the computations regarding the value updates is employing an approximate scheme. This is done by parameterizing the value function and adjusting the parameters in the training process. Then, the optimal policy given by the value function is also parameterized and approximated accordingly. The complexity of any value update depends directly on the number of parameters employed, so one may try to limit the number of parameters at the expense of optimality. We are instead motivated to obtain a more efficient update rule for the value parameters, rather than limiting the number of parameters. We achieve this by reformulating the problem with a quadratically parameterized value function.

Moreover, the classical VI algorithm does not explicitly use the system model for evaluating the policy. This benefits applications in that full knowledge of the system dynamics is no longer required. However, online training with VI alone may take much longer to converge, since the model only participates implicitly through the future state. Therefore, the learning process can potentially be accelerated by introducing the system model. Furthermore, this creates an opportunity for running a separate identifier unit, where the model obtained can be simulated offline to complete the training or can be used for learning optimal policies for different objectives.

It can be shown that the VI algorithm for linear systems results in a Lyapunov recursion in the policy evaluation step. Such a Lyapunov equation in terms of the system matrices can be efficiently solved. However, for the general nonlinear case, methods for obtaining an equivalent are not amenable to efficient solutions. Hence, we are motivated to investigate the possibility of acquiring an efficient update rule for nonlinear systems.
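For the linear case, the Lyapunov recursion mentioned above can be illustrated with a scalar discrete-time example. This is a minimal sketch under stated assumptions: the system x[t+1] = a*x[t] + b*u[t], the cost sum of q*x**2 + r*u**2, the value function V(x) = p*x**2, and the function name `lqr_value_iteration` are all chosen here for illustration and are not the book's algorithm.

```python
def lqr_value_iteration(a, b, q, r, iters=200):
    """Value Iteration for a scalar discrete-time LQ problem (illustrative).

    System: x[t+1] = a*x[t] + b*u[t], stage cost q*x**2 + r*u**2,
    value function V(x) = p*x**2, policy u = -k*x.
    """
    p = 0.0
    for _ in range(iters):
        # Policy improvement: greedy gain for the current value parameter.
        k = a * b * p / (r + b * b * p)
        # Policy evaluation: one step of the Lyapunov recursion in terms
        # of the closed-loop coefficient a - b*k.
        a_cl = a - b * k
        p = q + r * k * k + a_cl * a_cl * p
    k = a * b * p / (r + b * b * p)
    return p, k


# Hypothetical unstable plant with unit weights.
p, k = lqr_value_iteration(a=1.2, b=1.0, q=1.0, r=1.0)
print(f"p = {p:.6f}, k = {k:.6f}, closed loop = {1.2 - 1.0 * k:.6f}")
```

Here each update is a closed-form expression in the system coefficients, which is the efficiency that has no direct counterpart for general nonlinear systems.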

I.1.5 Importance of Learning-based Tracking Approaches

One of the most common problems in the control of dynamical systems is to track a desired reference trajectory, which is found in a variety of real-world applications. However, designing an efficient tracking controller using conventional methods often necessitates a thorough understanding of the model, as well as computations and considerations for each application. RL approaches, on the other hand, propose a more flexible framework that requires less information about the system dynamics. While this may create additional problems, such as safety or computing limits, there are already effective outcomes from the use of such approaches in real-world situations. Similar to regulation problems, the applications of tracking control can benefit from Model-based Reinforcement Learning (MBRL) that can handle the parameter updates more efficiently.

I.1.6 Opportunities for Obtaining a Real-time Control

In the approximate optimal control technique, employing a limited number of parameters can only yield a local approximation of the model and the value function. However, if an approximation within a larger domain is intended, a considerably higher number of parameters may be needed. As a result, the complexity of the identification and of the controller might be too high for them to be performed online in real-world applications. This motivates us to circumvent this constraint by considering a set of simple local learners instead, in a piecewise approach.

As mentioned, there already exist interesting real-world applications of MBRL. Motivated by this, in this monograph, we aim to introduce automated ways of solving optimal control problems that can replace conventional controllers. Hence, detailed applications of the proposed approaches are included, which are demonstrated with numerical simulations.

I.1.7 Summary

The main motivation for this monograph can be summarized as follows:

* Optimal control is highly favored, while there is no general analytical technique applicable to all nonlinear systems.

* Feedback control techniques are known to be more robust and computationally efficient compared to the numerical techniques, especially in the continuous space.

* The chance of obtaining a feedback control in closed form is low, and the known techniques are limited to...

Schweitzer Fachinformationen
