If we know the robot dynamics, we can use them to design model-based controllers (see Figure 1.1). Well-known linear controllers are the proportional-derivative (PD) controller [1], the linear quadratic regulator (LQR), and the proportional-integral-derivative (PID) controller [2]. They rely on linear system theory, so the robot dynamics must be linearized about an operating point. LQR control [3-5] has been used as a basis for the design of reinforcement learning approaches [6].
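As a sketch of how such a model-based linear controller might be obtained, the following Python snippet computes an LQR gain for a robot dynamics linearized about an operating point; the state-space matrices, weights, and single-joint structure are illustrative assumptions, not values from this book.

import numpy as np
from scipy.linalg import solve_continuous_are

# Linearized single-joint model, state x = [position, velocity]; values are placeholders.
A = np.array([[0.0, 1.0],
              [0.0, -0.1]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([10.0, 1.0])   # state weighting
R = np.array([[0.1]])      # control-effort weighting

P = solve_continuous_are(A, B, Q, R)   # algebraic Riccati equation
K = np.linalg.solve(R, B.T @ P)        # LQR gain, control law u = -K @ x
print("LQR gain K =", K)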
The classic controllers use complete or partial knowledge of the robot's dynamics. In these cases (without considering disturbances), it is possible to design controllers that guarantee perfect tracking performance. By using compensation or pre-compensation techniques, the robot dynamics are canceled, leaving a simpler desired dynamics [7-9]. The control schemes with model compensation or pre-compensation in joint space are shown in Figure 1.2, whose signals are the desired reference, the robot's joint position, the joint error, the compensator or pre-compensator of the dynamics, the control output of the controller, and the control torque. A typical model-compensation control is the proportional-derivative (PD) controller with gravity compensation, which helps to decrease the steady-state error caused by the gravity terms of the robot dynamics.
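In the usual notation (assumed here, since the original symbols did not survive extraction), a PD control law with gravity compensation takes the form

\tau = K_p (q^d - q) + K_d (\dot{q}^d - \dot{q}) + g(q),

where q^d is the desired reference, q the joint position, K_p and K_d are positive-definite gain matrices, and g(q) is the gravity vector of the robot dynamics; the added g(q) term cancels the gravitational load and removes the corresponding steady-state error.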
When we do not have exact knowledge of the dynamics, the previous controllers cannot be designed, and we need to use model-free controllers instead. Well-known examples are PID control [10, 11], sliding mode control [2, 12], and neural control [13]. These controllers are tuned for a specific plant under certain conditions (disturbances, friction, parameters). When new conditions arise, the controllers do not display the same behavior and may even become unstable. Model-free controllers perform well for different tasks and are relatively easy to tune; however, they cannot guarantee optimal performance and require re-tuning the control gains when the robot parameters change or a disturbance is applied.
Figure 1.1 Classic robot control
Figure 1.2 Model compensation control
All the above controllers are designed for position control and do not consider interaction with the environment. There is a great diversity of works related to the interaction, such as stiffness control, force control, hybrid control, and impedance control [14]. Force control regulates the interaction force using P (stiffness control), PD, and PID force controllers [15]. The position control can also use force control to perform position and velocity tracking [16, 17] (see Figure 1.3), whose signals are the desired force, the contact force, the force error, the output of the force controller, and the position error in task space. The force/position control uses the force for the compensation [17]. It can also use the full dynamics to linearize the closed-loop system for perfect tracking [18].
Figure 1.3 Position/force control
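A minimal sketch of such a position/force scheme is given below, assuming a PI force controller in the constrained direction whose output corrects the task-space reference of an inner position controller; all gains, signal names, and the helper position_control() are hypothetical.

def force_outer_loop(f_desired, f_contact, f_integral, kf_p=0.002, kf_i=0.001, dt=0.001):
    """PI force controller: returns a position correction along the constrained direction."""
    e_f = f_desired - f_contact
    f_integral += e_f * dt
    return kf_p * e_f + kf_i * f_integral, f_integral

# Usage inside a control loop (x_desired and position_control() are assumed to exist):
# dx, f_integral = force_outer_loop(f_d, f_measured, f_integral)
# x_ref = x_desired + dx                 # corrected task-space reference
# tau = position_control(x_ref, q, dq)   # inner position loop, e.g. a task-space PD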
Impedance control [7] addresses the problem of how to move the robot end-effector when it is in contact with the external environment. It uses a desired dynamic model, also known as a mechanical impedance, to design the control. The simplest impedance control is stiffness control, where the interaction between the robot and the environment is regulated through a proportional (stiffness) term [19].
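In common notation (assumed here), the desired mechanical impedance relates the motion error of the end-effector to the contact force as

M_d (\ddot{x} - \ddot{x}^d) + D_d (\dot{x} - \dot{x}^d) + K_d (x - x^d) = f_e,

where x is the end-effector position, x^d the desired trajectory, f_e the contact force, and M_d, D_d, K_d the desired inertia, damping, and stiffness matrices. Stiffness control corresponds to keeping only the K_d term.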
Traditional impedance control linearizes the system by assuming that the robot model is known exactly [20-22]. These algorithms need the strong assumption that the exact robot dynamics are known [23]. The robustness of the control lies in the compensation of the model.
Most impedance controllers assume that the desired inertia of the impedance model is equal to the robot inertia. Thus, only the stiffness and damping terms remain, which is equivalent to a PD control law [8, 21, 24]. One way to overcome the inaccuracy of dynamic model compensation is through the use of adaptive algorithms, neural networks, or other intelligent methods [9, 25-31].
There are several implementations of impedance control. In [32], the impedance control uses human characteristics to obtain the inertia, damping, and stiffness components of the desired impedance. A PID controller is used for the position control, which allows the model compensation to be omitted. Another way to avoid using the model, or to proceed without full knowledge of it, is to exploit system characteristics: a high gear-ratio velocity reduction makes the nonlinear terms very small, so the system becomes decoupled [33].
In mechanical systems, particularly in the haptic field, the admittance is the dynamic mapping from force to motion: the input force "admits" a certain amount of movement [11]. Position control based on impedance or admittance needs the inverse impedance model to obtain the reference position [34-38]. This type of scheme is more complete because there is a double control loop in which the interaction with the environment can be used more directly.
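The following sketch illustrates the idea for a single task-space direction, assuming a mass-damper-spring admittance model whose output offsets the nominal trajectory before it is sent to an inner position controller; the parameter values are illustrative.

class Admittance1D:
    """Desired mass-damper-spring model for one task-space direction (illustrative values)."""
    def __init__(self, m=1.0, d=20.0, k=100.0, dt=0.001):
        self.m, self.d, self.k, self.dt = m, d, k, dt
        self.x, self.v = 0.0, 0.0   # admitted offset from the nominal trajectory

    def step(self, f_ext):
        # Integrate m*a + d*v + k*x = f_ext to obtain the admitted motion.
        a = (f_ext - self.d * self.v - self.k * self.x) / self.m
        self.v += a * self.dt
        self.x += self.v * self.dt
        return self.x   # added to the nominal reference before the inner position loop

# admittance = Admittance1D()
# x_ref = x_desired + admittance.step(f_measured)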
The applications of impedance/admittance control are quite wide; for example, exoskeletons are operated by a human. To maintain human safety, low mechanical impedance is required, while tracking control requires high impedance to reject disturbances. Different solutions exist, such as frequency molding and the reduction of mechanical impedance by shaping the poles and zeros of the system [39, 40].
Model-based impedance/admittance control is sensitive to modeling error. There exist several modifications to the classical impedance/admittance controllers, such as the position-based impedance control, which improves robustness in the presence of modeling error using an internal position control loop [21].
Figure 1.4 shows the control scheme with reinforcement learning. The main difference from the controllers in Figure 1.1 is that reinforcement learning updates its value function at each step using the tracking error and the control torque.
Reinforcement learning schemes were first designed for discrete-time systems with discrete input spaces [6, 41]. Among the most famous methods are Monte Carlo [42], Q-learning [43], Sarsa [44], and critic algorithms [45].
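As a reminder of how these discrete-time, discrete-input methods operate, here is a tabular Q-learning update in Python; the environment interface env.step() and all hyper-parameters are assumed for illustration only.

import numpy as np

n_states, n_actions = 50, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1   # step size, discount factor, exploration rate

def q_learning_step(s, env):
    # Epsilon-greedy action selection over the discrete action set.
    a = np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(Q[s]))
    s_next, r, done = env.step(a)    # hypothetical environment interface
    # Temporal-difference update toward the greedy bootstrap target.
    target = r + gamma * (0.0 if done else np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])
    return s_next, done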
If the input space is large or continuous, the classical reinforcement learning algorithms cannot be directly implemented due to the computational cost, and in most cases the algorithm would not converge to a solution [41, 46]. This problem is known as the curse of dimensionality in machine learning. For robot control, the curse of dimensionality is worse because there are several degrees of freedom (DOFs), and each DOF needs its own input space [47, 48]. Disturbances make the dimensionality problem even more acute, because new states and controls must be considered.
To address the curse of dimensionality, model-based techniques can be applied to reinforcement learning [49-51]. These learning methods are very popular; some of them are called "policy search" algorithms [52-59]. However, they require model knowledge to decrease the dimension of the input space.
There is a wide variety of model-free algorithms similar to the discrete-time algorithms. Their main idea is to design an adequate reward and adequate approximators, which reduces the computational cost in the presence of a large or continuous input space.
The simplest approximators for reducing the input space are handcrafted methods [60-65]. They speed up learning by searching for regions where the reward is minimized/maximized. The methods of [66, 67] learn from input data, similarly to discrete-time learning algorithms, but the learning time increases. Other techniques are based on previously established actions arranged sequentially; that is, the actions to be taken at each time instant are predefined so that each performs a simple task by itself [68-72]. The main problem with these methods is that they require expert knowledge to obtain the best regions and to set the predefined actions.
Figure 1.4 Reinforcement learning for control
A linear combination of approximators learns from input data without expert intervention. The approximators most widely used in robot control are inspired by human morphology [73, 74], neural networks [75-77], local models [74, 78], and Gaussian process regression [79-82]. The success of these approximators depends on the adequate choice of their parameters and hyper-parameters.
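A minimal example of such a linear combination is a value function built from radial basis function features and trained with a temporal-difference style update; the state dimension, centers, and widths below are illustrative choices, not taken from the cited works.

import numpy as np

centers = np.linspace(-1.0, 1.0, 10)   # RBF centers over a normalized scalar state
width = 0.2
w = np.zeros_like(centers)             # weights of the linear combination

def features(x):
    """Radial basis function features of a scalar state x."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

def value(x):
    return float(features(x) @ w)

def td_update(x, target, alpha=0.05):
    """Gradient step on the squared error between the target and the linear approximation."""
    global w
    w += alpha * (target - value(x)) * features(x)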
A poor reward design can lead to a long learning time, convergence to wrong solutions, or the algorithm never converging to any solution. On the other hand, the proper design of a reward helps...