Robust Adaptive Dynamic Programming

Name: Robust Adaptive Dynamic Programming
Brand: Wiley-IEEE Press
Price: 103.99 EUR
Availability: OnlineOnly

Hao Yu, Zhong-Ping Jiang (Author)

Published on 25 April 2017

216 pages

E-Book

ePUB with Adobe-DRM

978-1-119-13266-0 (ISBN)

€103.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

A comprehensive look at state-of-the-art ADP theory and real-world applications

This book fills a gap in the literature by providing a theoretical framework for integrating techniques from adaptive dynamic programming (ADP) and modern nonlinear control to address data-driven optimal control design challenges arising from both parametric and dynamic uncertainties. Traditional model-based approaches leave much to be desired when addressing the challenges posed by the ever-increasing complexity of real-world engineering systems. One alternative that has received much interest in recent years is the family of biologically inspired approaches, primarily RADP. Despite their growing popularity worldwide, until now books on ADP have focused nearly exclusively on analysis and design, with scant consideration given to how ADP can be applied to address robustness issues, a new challenge arising from dynamic uncertainties encountered in common engineering problems.

Robust Adaptive Dynamic Programming zeros in on the practical concerns of engineers. The authors develop RADP theory from linear systems to partially linear, large-scale, and completely nonlinear systems. They provide in-depth coverage of state-of-the-art applications in power systems, supplemented with numerous real-world examples implemented in MATLAB. They also explore fascinating reverse engineering topics, such as how ADP theory can be applied to the study of the human brain and cognition.

In addition, the book:
* Covers the latest developments in RADP theory and applications for solving a range of systems' complexity problems
* Explores multiple real-world implementations in power systems with illustrative examples backed up by reusable MATLAB code and Simulink block sets
* Provides an overview of nonlinear control, machine learning, and dynamic control
* Features discussions of novel applications for RADP theory, including an entire chapter on how it can be used as a computational mechanism of human movement control

Robust Adaptive Dynamic Programming is both a valuable working resource and an intriguing exploration of contemporary ADP theory and applications for practicing engineers and advanced students in systems theory, control engineering, computer science, and applied mathematics.

Content

ABOUT THE AUTHORS
PREFACE AND ACKNOWLEDGMENTS
ACRONYMS
GLOSSARY
1 INTRODUCTION
  1.1 From RL to RADP
  1.2 Summary of Each Chapter
  References
2 ADAPTIVE DYNAMIC PROGRAMMING FOR UNCERTAIN LINEAR SYSTEMS
  2.1 Problem Formulation and Preliminaries
  2.2 Online Policy Iteration
  2.3 Learning Algorithms
  2.4 Applications
  2.5 Notes
  References
3 SEMI-GLOBAL ADAPTIVE DYNAMIC PROGRAMMING
  3.1 Problem Formulation and Preliminaries
  3.2 Semi-Global Online Policy Iteration
  3.3 Application
  3.4 Notes
  References
4 GLOBAL ADAPTIVE DYNAMIC PROGRAMMING FOR NONLINEAR POLYNOMIAL SYSTEMS
  4.1 Problem Formulation and Preliminaries
  4.2 Relaxed HJB Equation and Suboptimal Control
  4.3 SOS-Based Policy Iteration for Polynomial Systems
  4.4 Global ADP for Uncertain Polynomial Systems
  4.5 Extension for Nonlinear Non-Polynomial Systems
  4.6 Applications
  4.7 Notes
  References
5 ROBUST ADAPTIVE DYNAMIC PROGRAMMING
  5.1 RADP for Partially Linear Composite Systems
  5.2 RADP for Nonlinear Systems
  5.3 Applications
  5.4 Notes
  References
6 ROBUST ADAPTIVE DYNAMIC PROGRAMMING FOR LARGE-SCALE SYSTEMS
  6.1 Stability and Optimality for Large-Scale Systems
  6.2 RADP for Large-Scale Systems
  6.3 Extension for Systems with Unmatched Dynamic Uncertainties
  6.4 Application to a Ten-Machine Power System
  6.5 Notes
  References
7 ROBUST ADAPTIVE DYNAMIC PROGRAMMING AS A THEORY OF SENSORIMOTOR CONTROL
  7.1 ADP for Continuous-Time Stochastic Systems
  7.2 RADP for Continuous-Time Stochastic Systems
  7.3 Numerical Results: ADP-Based Sensorimotor Control
  7.4 Numerical Results: RADP-Based Sensorimotor Control
  7.5 Discussion
  7.6 Notes
  References
A BASIC CONCEPTS IN NONLINEAR SYSTEMS
  A.1 Lyapunov Stability
  A.2 ISS and the Small-Gain Theorem
B SEMIDEFINITE PROGRAMMING AND SUM-OF-SQUARES PROGRAMMING
  B.1 SDP and SOSP
C PROOFS
  C.1 Proof of Theorem 3.1.4
  C.2 Proof of Theorem 3.2.3
  References
INDEX

Chapter 1
Introduction

1.1 From RL to RADP

1.1.1 Introduction to RL

Reinforcement learning (RL) was originally inspired by the learning behavior observed in humans and other mammals. Its definition varies across the literature. Learning a certain task through trial and error, for instance, can be considered an example of RL. In general, an RL problem requires an agent that can interact with some unknown environment by taking actions and receiving a reward from it. Sutton and Barto described RL as learning how to map situations to actions so as to maximize a numerical reward signal [47]. Maximizing a reward is equivalent to minimizing a cost, and the cost formulation is used more frequently in the context of optimal control [32]. In this book, a mapping between situations and actions is called a policy, and the goal of RL is to learn an optimal policy such that a predefined cost is minimized.

As a unique learning approach, RL does not require a supervisor to teach an agent to take the optimal action. Instead, it focuses on how the agent, through interactions with the unknown environment, should modify its own actions toward the optimal one (Figure 1.1). An RL iteration generally contains two major steps. First, the agent evaluates the cost under the current policy, through interacting with the environment. This step is known as policy evaluation. Second, based on the evaluated cost, the agent adopts a new policy aiming at further reducing the cost. This is the step of policy improvement.

Figure 1.1 Illustration of RL. The agent takes an action to interact with the unknown environment, and evaluates the resulting cost, based on which the agent can further improve the action to reduce the cost.
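The two steps described above, policy evaluation followed by policy improvement, can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the four-state chain, its costs, the discount factor, and all function names are hypothetical and do not come from the book. The agent either steps one state to the right (cost 1.0) or jumps two states (cost 1.5) until it reaches a cost-free goal state.

```python
GAMMA = 0.9          # discount factor (assumed for this toy example)
GOAL = 3             # absorbing, cost-free goal state
N_STATES = 4
ACTIONS = (0, 1)     # 0 = step one state right, 1 = jump two states right


def step(state, action):
    """Return (next_state, cost) for the hypothetical chain environment."""
    if state == GOAL:
        return GOAL, 0.0
    if action == 0:
        return min(state + 1, GOAL), 1.0
    return min(state + 2, GOAL), 1.5


def evaluate(policy, tol=1e-12):
    """Policy evaluation: discounted cost of following `policy` from each state."""
    value = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            nxt, cost = step(s, policy[s])
            new = cost + GAMMA * value[nxt]
            delta = max(delta, abs(new - value[s]))
            value[s] = new
        if delta < tol:
            return value


def improve(value):
    """Policy improvement: pick, in each state, the action with the lowest cost."""
    policy = []
    for s in range(N_STATES):
        q = []
        for a in ACTIONS:
            nxt, cost = step(s, a)
            q.append(cost + GAMMA * value[nxt])
        policy.append(q.index(min(q)))
    return policy


def policy_iteration():
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * N_STATES
    while True:
        value = evaluate(policy)
        new_policy = improve(value)
        if new_policy == policy:
            return policy, value
        policy = new_policy
```

On this chain, `policy_iteration()` settles on the policy `[0, 1, 0, 0]`: step from state 0, jump from state 1, step from state 2. Here the environment model is queried directly through `step`; in an RL setting the agent would instead estimate the costs from interaction.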

As an important branch of machine learning theory, RL was brought into the computer science and control science literature in the 1960s as a way to study artificial intelligence [37, 38, 54]. Since then, numerous contributions to RL have been made from a control perspective (see, e.g., [2, 29, 33, 34, 46, 53, 56]). More recently, AlphaGo, a computer program developed by Google DeepMind, improved itself through reinforcement learning and has beaten professional human Go players [44]. It is believed that significant attention will continue to be paid to the study of reinforcement learning, since it is a promising tool for better understanding the true intelligence in human brains.

1.1.2 Introduction to DP

Dynamic programming (DP) [4], on the other hand, offers a theoretical way to solve multistage decision-making problems. However, it suffers from inherent computational complexity, also known as the curse of dimensionality [41]. The need for approximate methods was therefore recognized as early as the late 1950s [3]. In [15], Howard devised an iterative technique called policy iteration (PI) for Markov decision processes (MDPs). Howard also referred to the iterative method developed by Bellman [3, 4] as value iteration (VI). Because it computes the optimal solution through successive approximations, PI is closely related to learning methods. In 1968, Werbos pointed out that PI can be employed to perform RL [58]. Since then, many real-time RL methods for finding online optimal control policies have emerged; they are broadly called approximate/adaptive dynamic programming (ADP) [31, 33, 41, 43, 55, 60-65, 68], or neurodynamic programming [5]. The main feature of ADP [59, 61] is that it employs ideas from RL to achieve online approximation of the value function without using knowledge of the system dynamics.
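To make the contrast with policy iteration concrete, value iteration can be sketched as repeated application of the Bellman backup to a value estimate, with a policy read off only at the end. The sketch below is a minimal illustration, not the book's formulation: the transition table, costs, discount factor, and function names are all assumptions made for the example.

```python
GAMMA = 0.9  # discount factor (assumed for this toy example)

# Hypothetical deterministic chain: TRANSITIONS[state][action] = (next_state, cost).
# State 3 is an absorbing, cost-free goal.
TRANSITIONS = {
    0: {"step": (1, 1.0), "jump": (2, 1.5)},
    1: {"step": (2, 1.0), "jump": (3, 1.5)},
    2: {"step": (3, 1.0), "jump": (3, 1.5)},
    3: {"step": (3, 0.0), "jump": (3, 0.0)},
}


def bellman_backup(value):
    """One sweep of V(s) <- min over actions of [cost + GAMMA * V(next_state)]."""
    return {
        s: min(cost + GAMMA * value[nxt] for nxt, cost in actions.values())
        for s, actions in TRANSITIONS.items()
    }


def value_iteration(tol=1e-12, max_sweeps=1000):
    """Iterate the Bellman backup from V = 0 until successive sweeps agree."""
    value = {s: 0.0 for s in TRANSITIONS}
    for _ in range(max_sweeps):
        new_value = bellman_backup(value)
        gap = max(abs(new_value[s] - value[s]) for s in TRANSITIONS)
        value = new_value
        if gap < tol:
            break
    return value


def greedy_policy(value):
    """Extract the action that attains the minimum in each state."""
    return {
        s: min(actions, key=lambda a: actions[a][1] + GAMMA * value[actions[a][0]])
        for s, actions in TRANSITIONS.items()
    }
```

Unlike policy iteration, no intermediate policy is evaluated to convergence; each sweep is cheap, but more sweeps may be needed. Both routines enumerate every state, which is exactly where the curse of dimensionality bites as the state space grows.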

1.1.3 The Development of ADP

The development of ADP theory consists of three phases. In the first phase, ADP was extensively investigated within the computer science and operations research communities, with PI and VI usually employed as the two basic algorithms. In [46], Sutton introduced the temporal difference method. In 1989, Watkins proposed the well-known Q-learning method in his PhD thesis [56]. Q-learning shares similar features with the action-dependent heuristic dynamic programming (ADHDP) scheme proposed by Werbos in [62]. Other related research under a discrete-time, discrete-state-space Markov decision process framework can be found in [5, 6, 8, 9, 41, 42, 47, 48] and references therein.

In the second phase, stability was brought into the context of ADP, and real-time control problems were studied for dynamic systems. To the best of our knowledge, Lewis and his co-workers were the first to contribute to the integration of stability theory and ADP theory [33]. An essential advantage of ADP theory is that an optimal control policy can be obtained via a recursive numerical algorithm using online information, without solving the Hamilton-Jacobi-Bellman (HJB) equation (for nonlinear systems) or the algebraic Riccati equation (ARE) (for linear systems), even when the system dynamics are not precisely known. Related optimal feedback control designs for linear and nonlinear dynamic systems have been proposed by several researchers over the past few years; see, for example, [7, 10, 39, 40, 50, 52, 66, 69].

While most of the previous work on ADP theory was devoted to discrete-time (DT) systems (see [31] and references therein), there has been relatively little research on the continuous-time (CT) counterpart. This is mainly because ADP is considerably more difficult for CT systems than for DT systems. Indeed, many results developed for DT systems [35] cannot be extended straightforwardly to CT systems. As a result, early attempts were made to apply Q-learning to CT systems via discretization techniques [1, 11]. However, the convergence and stability analysis of these schemes is challenging. In [40], Murray et al. proposed an implementation method that requires measurements of the derivatives of the state variables. As noted above, Lewis and his co-workers proposed the first solution to stability analysis and convergence proofs for ADP-based control systems by means of linear quadratic regulator (LQR) theory [52]. A synchronous policy iteration scheme was also presented in [49]. For CT linear systems, partial knowledge of the system dynamics (i.e., the input matrix) had to be precisely known. This restriction was completely removed in [18]. A nonlinear variant of this method can be found in [22] and [23].
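For CT linear systems, the recursion that such methods build on can be illustrated with the classical model-based policy iteration for the LQR problem, usually attributed to Kleinman. The sketch below is an assumption-laden illustration rather than any of the cited algorithms: it uses full knowledge of the matrices `A` and `B`, whereas the data-driven schemes discussed above replace the model-based evaluation step with online measurements. The double-integrator plant, the weights, the initial gain, and the function names are all hypothetical.

```python
import numpy as np


def solve_lyapunov(a_cl, q_cl):
    """Solve a_cl.T @ P + P @ a_cl + q_cl = 0 for P by vectorization."""
    n = a_cl.shape[0]
    eye = np.eye(n)
    # With row-major vec: vec(a_cl.T P) = kron(a_cl.T, I) vec(P),
    #                     vec(P a_cl)   = kron(I, a_cl.T) vec(P).
    lhs = np.kron(a_cl.T, eye) + np.kron(eye, a_cl.T)
    p = np.linalg.solve(lhs, -q_cl.reshape(-1)).reshape(n, n)
    return 0.5 * (p + p.T)  # symmetrize against round-off


def kleinman_iteration(a, b, q, r, k0, iterations=10):
    """Policy iteration for CT LQR, started from a stabilizing gain k0."""
    k = k0
    for _ in range(iterations):
        a_cl = a - b @ k
        # Policy evaluation: cost matrix of the current feedback u = -k x.
        p = solve_lyapunov(a_cl, q + k.T @ r @ k)
        # Policy improvement: updated feedback gain.
        k = np.linalg.solve(r, b.T @ p)
    return p, k


# Hypothetical example: double integrator with unit weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])  # A - B @ K0 is Hurwitz, so K0 is stabilizing
```

For this plant the ARE can be solved by hand, giving P = [[√3, 1], [1, √3]] and K = [1, √3], and `kleinman_iteration(A, B, Q, R, K0)` approaches those values without ever solving the ARE directly. Each pass only requires a linear (Lyapunov) equation, which is the property that makes the recursion attractive as a basis for learning from data.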

The third phase in the development of ADP theory is related to extensions of previous ADP results to nonlinear uncertain systems. Neural networks and game theory are utilized to address the presence of uncertainty and nonlinearity in control systems. See, for example, [14, 31, 50, 51, 57, 67, 69, 70]. An implicit assumption in these papers is that the system order is known and that the uncertainty is static, not dynamic. The presence of dynamic uncertainty has not been systematically addressed in the ADP literature. By dynamic uncertainty, we refer to the mismatch between the nominal model (also referred to as the reduced-order system) and the real plant when the order of the nominal model is lower than the order of the real system. A closely related topic of research is how to account for the effect of unseen variables [60]. In many engineering applications, full-state information is missing and only the output measurement or partial-state measurements are available. Adapting the existing ADP theory to this practical scenario is important yet nontrivial. Neural networks have been used to address the state estimation problem [12, 28]. However, the stability analysis of the estimator/controller augmented system is by no means easy, because the total system is highly interconnected and often strongly nonlinear. The configuration of a standard ADP-based control system is shown in Figure 1.2.

Figure 1.2 Illustration of the ADP scheme.

Our recent work [17, 19, 20, 21] on the development of robust ADP (for short, RADP) theory is exactly targeted at addressing these challenges.
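To make the notion of dynamic uncertainty concrete, one possible way of writing the mismatch between a nominal model and a higher-order real plant is sketched below. This is an illustrative form under stated assumptions, not necessarily the formulation used in the cited work: the symbols $f$, $g$, $\Delta$, $\Delta_w$, and the unmeasured state $w$ are all introduced here for the example.

```latex
% Nominal (reduced-order) model used for design, with state x in R^n:
\begin{align}
  \dot{x} &= f(x) + g(x)\,u
\end{align}
% Real plant of order n + p, with an unmeasured state w in R^p
% whose dynamics are unknown (the dynamic uncertainty):
\begin{align}
  \dot{w} &= \Delta_w(w, x), \\
  \dot{x} &= f(x) + g(x)\,\bigl[u + \Delta(w, x)\bigr]
\end{align}
```

In this picture a static uncertainty would be an unknown function of $x$ alone, whereas a dynamic uncertainty has its own state $w$, so the real plant has a higher order than the model the controller was designed for.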

1.1.4 What Is RADP?

RADP is developed to address the presence of dynamic uncertainty in linear and nonlinear dynamical systems. See Figure 1.3 for an illustration. There are several reasons for pursuing a new framework for RADP. First and foremost, it is well known that building an exact mathematical model for physical systems is often a hard task. Moreover, even if an exact mathematical model can be obtained for some particular engineering and biological applications, simplified nominal models are often preferable to the original complex system model for system analysis and control synthesis. While we refer to the mismatch between the simplified nominal model and the original system as dynamic uncertainty here, the engineering literature often uses the term unmodeled dynamics instead. Second, observation errors may often be captured by dynamic uncertainty. From the literature of modern nonlinear control [25, 26, 30], it is known that the presence of dynamic uncertainty makes the feedback control problem extremely challenging in the context of nonlinear systems. In order to broaden the application scope of ADP theory in the presence of dynamic uncertainty, our strategy is to integrate tools from nonlinear control theory, such as Lyapunov designs, input-to-state stability theory [45], and nonlinear small-gain techniques [27]. This way RADP becomes applicable to wide classes of uncertain dynamic systems with incomplete state information and unknown system...
