If we know the robot dynamics, we can use them to design model-based controllers (see Figure 1.1). Well-known linear controllers are the proportional-derivative (PD) controller [1], the linear quadratic regulator (LQR), and the proportional-integral-derivative (PID) controller [2]. They rely on linear system theory, so the robot dynamics must be linearized about an operating point. LQR control [3-5] has been used as a basis for the design of reinforcement learning approaches [6].
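As a sketch of how such a model-based linear controller might be obtained, the following Python snippet computes an LQR gain for a robot dynamics linearized about an operating point; the state-space matrices, weights, and single-joint structure are illustrative assumptions, not values from this book.

import numpy as np
from scipy.linalg import solve_continuous_are

# Linearized single-joint model, state x = [position, velocity]; values are placeholders.
A = np.array([[0.0, 1.0],
              [0.0, -0.1]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([10.0, 1.0])   # state weighting
R = np.array([[0.1]])      # control-effort weighting

P = solve_continuous_are(A, B, Q, R)   # algebraic Riccati equation
K = np.linalg.solve(R, B.T @ P)        # LQR gain, control law u = -K @ x
print("LQR gain K =", K)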
The classic controllers use complete or partial knowledge of the robot's dynamics. In these cases (without considering disturbances), it is possible to design controllers that guarantee perfect tracking performance. By using compensation or pre-compensation techniques, the robot dynamics are canceled, leaving a simpler desired dynamics [7-9]. The control schemes with model compensation or pre-compensation in joint space are shown in Figure 1.2, whose signals are the desired reference, the robot's joint position, the joint error, the compensator or pre-compensator of the dynamics, the control output of the controller, and the control torque. A typical model-compensation control is the proportional-derivative (PD) controller with gravity compensation, which helps to decrease the steady-state error caused by the gravity terms of the robot dynamics.
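In the usual notation (assumed here, since the original symbols did not survive extraction), a PD control law with gravity compensation takes the form

\tau = K_p (q^d - q) + K_d (\dot{q}^d - \dot{q}) + g(q),

where q^d is the desired reference, q the joint position, K_p and K_d are positive-definite gain matrices, and g(q) is the gravity vector of the robot dynamics; the added g(q) term cancels the gravitational load and removes the corresponding steady-state error.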
When we do not have exact knowledge of the dynamics, the previous controllers cannot be designed, and we need to use model-free controllers instead. Well-known examples are PID control [10, 11], sliding mode control [2, 12], and neural control [13]. These controllers are tuned for a specific plant under certain conditions (disturbances, friction, parameters). When new conditions arise, the controllers do not display the same behavior and may even become unstable. Model-free controllers perform well for different tasks and are relatively easy to tune; however, they cannot guarantee optimal performance and require re-tuning the control gains when the robot parameters change or a disturbance is applied.
Figure 1.1 Classic robot control
Figure 1.2 Model compensation control
All the above controllers are designed for position control and do not consider interaction with the environment. There is a great diversity of works related to the interaction, such as stiffness control, force control, hybrid control, and impedance control [14]. Force control regulates the interaction force using P (stiffness control), PD, and PID force controllers [15]. The position control can also use force control to perform position and velocity tracking [16, 17] (see Figure 1.3), whose signals are the desired force, the contact force, the force error, the output of the force controller, and the position error in task space. The force/position control uses the force for the compensation [17]. It can also use the full dynamics to linearize the closed-loop system for perfect tracking [18].
Figure 1.3 Position/force control
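A minimal sketch of such a position/force scheme is given below, assuming a PI force controller in the constrained direction whose output corrects the task-space reference of an inner position controller; all gains, signal names, and the helper position_control() are hypothetical.

def force_outer_loop(f_desired, f_contact, f_integral, kf_p=0.002, kf_i=0.001, dt=0.001):
    """PI force controller: returns a position correction along the constrained direction."""
    e_f = f_desired - f_contact
    f_integral += e_f * dt
    return kf_p * e_f + kf_i * f_integral, f_integral

# Usage inside a control loop (x_desired and position_control() are assumed to exist):
# dx, f_integral = force_outer_loop(f_d, f_measured, f_integral)
# x_ref = x_desired + dx                 # corrected task-space reference
# tau = position_control(x_ref, q, dq)   # inner position loop, e.g. a task-space PD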
Impedance control [7] addresses the problem of how to move the robot end-effector when it is in contact with the external environment. It uses a desired dynamic model, also known as a mechanical impedance, to design the control. The simplest impedance control is stiffness control, where the interaction between the robot and the environment is regulated through a proportional (stiffness) term [19].
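In common notation (assumed here), the desired mechanical impedance relates the motion error of the end-effector to the contact force as

M_d (\ddot{x} - \ddot{x}^d) + D_d (\dot{x} - \dot{x}^d) + K_d (x - x^d) = f_e,

where x is the end-effector position, x^d the desired trajectory, f_e the contact force, and M_d, D_d, K_d the desired inertia, damping, and stiffness matrices. Stiffness control corresponds to keeping only the K_d term.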
Traditional impedance control linearizes the system by assuming that the robot model is known exactly [20-22]. These algorithms need the strong assumption that the exact robot dynamics are known [23]. The robustness of the control lies in the compensation of the model.
Most impedance controllers assume that the desired inertia of the impedance model is equal to the robot inertia. Thus, only the stiffness and damping terms remain, which is equivalent to a PD control law [8, 21, 24]. One way to overcome the inaccuracy of dynamic model compensation is through the use of adaptive algorithms, neural networks, or other intelligent methods [9, 25-31].
There are several implementations of impedance control. In [32], the impedance control uses human characteristics to obtain the inertia, damping, and stiffness components of the desired impedance. A PID controller is used for the position control, which allows the model compensation to be omitted. Another way to avoid using the model, or to proceed without full knowledge of it, is to exploit system characteristics: a high gear-ratio velocity reduction makes the nonlinear terms very small, so the system becomes decoupled [33].
In mechanical systems, particularly in the haptic field, the admittance is the dynamic mapping from force to motion: the input force "admits" a certain amount of movement [11]. Position control based on impedance or admittance needs the inverse impedance model to obtain the reference position [34-38]. This type of scheme is more complete because there is a double control loop in which the interaction with the environment can be used more directly.
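The following sketch illustrates the idea for a single task-space direction, assuming a mass-damper-spring admittance model whose output offsets the nominal trajectory before it is sent to an inner position controller; the parameter values are illustrative.

class Admittance1D:
    """Desired mass-damper-spring model for one task-space direction (illustrative values)."""
    def __init__(self, m=1.0, d=20.0, k=100.0, dt=0.001):
        self.m, self.d, self.k, self.dt = m, d, k, dt
        self.x, self.v = 0.0, 0.0   # admitted offset from the nominal trajectory

    def step(self, f_ext):
        # Integrate m*a + d*v + k*x = f_ext to obtain the admitted motion.
        a = (f_ext - self.d * self.v - self.k * self.x) / self.m
        self.v += a * self.dt
        self.x += self.v * self.dt
        return self.x   # added to the nominal reference before the inner position loop

# admittance = Admittance1D()
# x_ref = x_desired + admittance.step(f_measured)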
The applications of impedance/admittance control are quite wide; for example, exoskeletons are operated by a human. To maintain human safety, low mechanical impedance is required, while tracking control requires high impedance to reject disturbances. Different solutions exist, such as frequency molding and the reduction of mechanical impedance by shaping the poles and zeros of the system [39, 40].
Model-based impedance/admittance control is sensitive to modeling error. There exist several modifications to the classical impedance/admittance controllers, such as the position-based impedance control, which improves robustness in the presence of modeling error using an internal position control loop [21].
Figure 1.4 shows the control scheme with reinforcement learning. The main difference from the controllers in Figure 1.1 is that reinforcement learning updates its value function at each step using the tracking error and the control torque.
Reinforcement learning schemes were first designed for discrete-time systems with discrete input spaces [6, 41]. Among the most famous methods are Monte Carlo [42], Q-learning [43], Sarsa [44], and critic algorithms [45].
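As a reminder of how these discrete-time, discrete-input methods operate, here is a tabular Q-learning update in Python; the environment interface env.step() and all hyper-parameters are assumed for illustration only.

import numpy as np

n_states, n_actions = 50, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1   # step size, discount factor, exploration rate

def q_learning_step(s, env):
    # Epsilon-greedy action selection over the discrete action set.
    a = np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(Q[s]))
    s_next, r, done = env.step(a)    # hypothetical environment interface
    # Temporal-difference update toward the greedy bootstrap target.
    target = r + gamma * (0.0 if done else np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])
    return s_next, done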
If the input space is large or continuous, the classical reinforcement learning algorithms cannot be directly implemented due to the computational cost, and in most cases the algorithm would not converge to a solution [41, 46]. This problem is known as the curse of dimensionality in machine learning. For robot control, the curse of dimensionality is worse because there are several degrees of freedom (DOFs), and each DOF needs its own input space [47, 48]. Disturbances make the dimensionality problem even more acute, because new states and controls must be considered.
To address the curse of dimensionality, model-based techniques can be applied to reinforcement learning [49-51]. These learning methods are very popular; some of them are called "policy search" algorithms [52-59]. However, they require model knowledge to decrease the dimension of the input space.
There is a wide variety of model-free algorithms similar to the discrete-time algorithms. Their main idea is to design an adequate reward and adequate approximators, which reduces the computational cost in the presence of a large or continuous input space.
The simplest approximators for reducing the input space are handcrafted methods [60-65]. They speed up learning by searching for regions where the reward is minimized/maximized. The methods of [66, 67] learn from input data, similarly to discrete-time learning algorithms, but the learning time increases. Other techniques are based on previously established actions arranged sequentially; that is, the actions to be taken at each time instant are predefined so that each performs a simple task by itself [68-72]. The main problem with these methods is that they require expert knowledge to obtain the best regions and to set the predefined actions.
Figure 1.4 Reinforcement learning for control
A linear combination of approximators learns from input data without expert intervention. The approximators most widely used in robot control are inspired by human morphology [73, 74], neural networks [75-77], local models [74, 78], and Gaussian process regression [79-82]. The success of these approximators depends on the adequate choice of their parameters and hyper-parameters.
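A minimal example of such a linear combination is a value function built from radial basis function features and trained with a temporal-difference style update; the state dimension, centers, and widths below are illustrative choices, not taken from the cited works.

import numpy as np

centers = np.linspace(-1.0, 1.0, 10)   # RBF centers over a normalized scalar state
width = 0.2
w = np.zeros_like(centers)             # weights of the linear combination

def features(x):
    """Radial basis function features of a scalar state x."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

def value(x):
    return float(features(x) @ w)

def td_update(x, target, alpha=0.05):
    """Gradient step on the squared error between the target and the linear approximation."""
    global w
    w += alpha * (target - value(x)) * features(x)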
A poor reward design can lead to a long learning time, convergence to wrong solutions, or the algorithm never converging to any solution. On the other hand, the proper design of a reward helps...