Chapter 3: Reinforcement learning
Reinforcement learning, often abbreviated RL, is a subfield of machine learning that studies how intelligent agents should act in an environment in order to maximize a notion of cumulative reward. Reinforcement learning is one of the three basic paradigms of machine learning, alongside supervised learning and unsupervised learning.
The key difference is that reinforcement learning, unlike supervised learning, does not require labelled input/output pairs to be presented, nor does it require sub-optimal actions to be explicitly corrected. Instead, the emphasis is on striking a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
Because many reinforcement learning algorithms designed for this setting use dynamic programming techniques, the environment is typically stated in the form of a Markov decision process (MDP). The primary distinction between classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume prior knowledge of an exact mathematical model of the MDP, and they target large-scale MDPs for which exact methods become infeasible.
Because of its generality, reinforcement learning is studied across a wide variety of fields, including game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, and statistics. In the operations research and control literature, reinforcement learning is referred to as neuro-dynamic programming or approximate dynamic programming. The problems of interest in reinforcement learning have also been studied in the theory of optimal control, which is concerned primarily with the existence and characterization of optimal solutions and with algorithms for their exact computation, and less with learning or approximation, particularly in the absence of a mathematical model of the environment. In economics and game theory, reinforcement learning may be used to explain how equilibrium can emerge under bounded rationality.
Basic reinforcement learning is modeled as a Markov decision process (MDP), which consists of:
- a set of environment and agent states, $\mathcal{S}$;
- a set of actions of the agent, $\mathcal{A}$;
- $P_a(s, s') = \Pr(S_{t+1} = s' \mid S_t = s, A_t = a)$, the probability of transition (at time $t$) from state $s$ to state $s'$ under action $a$;
- $R_a(s, s')$, the immediate reward received after the transition from $s$ to $s'$ under action $a$.
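To make these four components concrete, the following is a minimal sketch of how a small finite MDP could be written down in Python; the two-state, two-action example and all of its numbers are invented purely for illustration.

```python
# Minimal sketch of a finite MDP stored in plain Python dictionaries.
# The states, actions, probabilities, and rewards below are made up
# for illustration; they do not come from the text.

states = ["s0", "s1"]        # S: the set of states
actions = ["a0", "a1"]       # A: the set of actions

# P[a][s][s2] = probability of moving from state s to state s2 under action a
P = {
    "a0": {"s0": {"s0": 0.7, "s1": 0.3}, "s1": {"s0": 0.4, "s1": 0.6}},
    "a1": {"s0": {"s0": 0.1, "s1": 0.9}, "s1": {"s0": 0.8, "s1": 0.2}},
}

# R[a][s][s2] = immediate reward received after the transition s -> s2 under a
R = {
    "a0": {"s0": {"s0": 0.0, "s1": 1.0}, "s1": {"s0": 0.0, "s1": 2.0}},
    "a1": {"s0": {"s0": 0.0, "s1": 5.0}, "s1": {"s0": -1.0, "s1": 0.0}},
}

# Each row of P must sum to one, since it is a probability distribution over s2.
assert all(abs(sum(P[a][s].values()) - 1.0) < 1e-9 for a in actions for s in states)
```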
The goal of reinforcement learning is for the agent to learn an optimal, or nearly optimal, policy that maximizes the cumulative reward built up from the immediate rewards (the "reward function", or another user-provided reinforcement signal). This is analogous to mechanisms that appear to occur in animal psychology. For instance, biological brains are hardwired to interpret signals such as pain and hunger as negative reinforcements, and to interpret pleasure and the consumption of food as positive reinforcements. Under certain conditions, animals can learn to engage in behaviors that maximize these rewards; that is, animals are capable of learning through reinforcement.
A basic reinforcement learning agent interacts with its environment in discrete time steps.
At each time $t$, the agent receives the current state $S_t$ and reward $R_t$. It then chooses an action $A_t$ from the set of available actions, which is subsequently sent to the environment. The environment moves to a new state $S_{t+1}$, and the reward $R_{t+1}$ associated with the transition $(S_t, A_t, S_{t+1})$ is determined. The goal of a reinforcement learning agent is to learn a policy
$$\pi : \mathcal{S} \times \mathcal{A} \to [0, 1], \qquad \pi(s, a) = \Pr(A_t = a \mid S_t = s),$$
which maximizes the expected cumulative reward.
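As an illustration of this interaction loop, here is a minimal Python sketch; the `env` object with `reset()`/`step()` methods and the `policy` function are hypothetical placeholders, not a specific library's API.

```python
# Minimal sketch of the agent-environment loop: at each step the agent
# observes the state, picks an action from its policy, and receives the
# next state and reward from the environment.

def run_episode(env, policy, max_steps=1000):
    """Run one episode and return the list of rewards that were collected."""
    state = env.reset()                      # initial state S_0 (assumed interface)
    rewards = []
    for t in range(max_steps):
        action = policy(state)               # choose A_t according to the policy
        next_state, reward, done = env.step(action)   # environment returns S_{t+1}, R_{t+1}
        rewards.append(reward)
        state = next_state
        if done:                             # episode ended (e.g. a terminal state was reached)
            break
    return rewards
```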
When the problem is stated as an MDP, it is assumed that the agent directly observes the current state of the environment; in this case, the problem is said to have full observability. If the agent only has access to a subset of the states, or if the observed states are corrupted by noise, the agent is said to have partial observability, and the problem must be formalized as a partially observable Markov decision process. In either case, the set of actions available to the agent may be restricted. For instance, the state of an account balance could be restricted to be positive; if the current value of the state is 3 and a state transition attempts to reduce the value by 4, the transition is not permitted.
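A minimal sketch of such an action restriction, using the account-balance example above (the particular action set of deposits and withdrawals is a hypothetical illustration):

```python
# Only actions that keep the balance non-negative are made available.
ALL_ACTIONS = [-4, -1, +1, +4]   # hypothetical changes applied to the balance

def available_actions(balance):
    """Return the subset of actions that keep the balance non-negative."""
    return [a for a in ALL_ACTIONS if balance + a >= 0]

print(available_actions(3))   # [-1, 1, 4]: lowering the balance by 4 is not permitted
```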
The notion of regret arises when the performance of the agent is compared to that of an agent that acts optimally. In order to act near-optimally, the agent must reason about the long-term consequences of its actions (i.e., maximize future rewards), even though the immediate reward associated with this may be negative.
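As a rough formalization (assuming, for simplicity, a fixed horizon of $T$ steps), regret can be written as the gap between the expected reward collected by an optimal agent and the expected reward collected by the learning agent:
$$\mathrm{Regret}(T) = \mathbb{E}\!\left[\sum_{t=1}^{T} r_t^{*}\right] - \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\right],$$
where $r_t^{*}$ and $r_t$ denote the rewards received at step $t$ by the optimal agent and by the learning agent, respectively; the precise definition varies with the setting.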
Reinforcement learning is therefore particularly well suited to problems that involve a trade-off between short-term and long-term rewards. It has been applied successfully to a wide variety of problems, including robot control and Go (AlphaGo).
Two factors make reinforcement learning powerful: the use of samples to optimize performance, and the use of function approximation to handle large environments. Thanks to these two key components, reinforcement learning can be used in large environments in the following situations:
- A model of the environment is known, but an analytic solution is not available;
- Only a simulation model of the environment is given (the subject of simulation-based optimization);
- The only way to collect information about the environment is to interact with it.
The first two of these problems can be regarded as planning problems (since some form of model is available), whereas the last one is a genuine learning problem. Reinforcement learning, however, converts both planning problems into machine learning problems.
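To illustrate both ingredients at once (learning from samples and function approximation), the following is a minimal sketch of TD(0) policy evaluation with a linear value-function approximation; `env`, `policy`, and the feature map `phi` are hypothetical placeholders that a concrete application would have to supply.

```python
import numpy as np

def td0_linear(env, policy, phi, num_features, episodes=100, alpha=0.05, gamma=0.99):
    """Estimate the value function of `policy` from sampled interaction,
    approximating V(s) by the linear form phi(s) . w."""
    w = np.zeros(num_features)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)        # assumed interface
            # Bootstrapped TD target: immediate reward plus discounted estimate of the next state.
            target = reward + (0.0 if done else gamma * (phi(next_state) @ w))
            td_error = target - phi(state) @ w
            w += alpha * td_error * phi(state)                  # gradient-style weight update
            state = next_state
    return w
```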
The trade-off between exploration and exploitation has been studied in great depth through the multi-armed bandit problem and, for finite state space MDPs, in Burnetas and Katehakis (1997).
Reinforcement learning requires clever exploration mechanisms; selecting actions at random, without reference to an estimated probability distribution, leads to poor performance. The case of (small) finite MDPs is relatively well understood. However, owing to the lack of algorithms that scale well with the number of states (or that scale to problems with infinite state spaces), simple exploration methods remain the most practical.
One such method is $\varepsilon$-greedy, where $0 < \varepsilon < 1$ is a parameter controlling the amount of exploration versus exploitation. With probability $1 - \varepsilon$, exploitation is chosen, and the agent selects the action that it believes has the best long-term effect (ties between actions are broken uniformly at random). Alternatively, with probability $\varepsilon$, exploration is chosen, and the action is selected uniformly at random. $\varepsilon$ is usually a fixed parameter, but it can be adjusted either according to a schedule (making the agent explore progressively less) or adaptively based on heuristics.
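A minimal sketch of $\varepsilon$-greedy action selection, assuming the agent keeps an array `q_values` of estimated long-term values for the actions available in the current state:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Exploration: choose uniformly at random among all actions.
        return int(rng.integers(len(q_values)))
    # Exploitation: choose a highest-valued action, breaking ties uniformly at random.
    best = np.flatnonzero(q_values == np.max(q_values))
    return int(rng.choice(best))

rng = np.random.default_rng(0)
# With epsilon = 0.1, the greedy action is chosen roughly 90% of the time.
action = epsilon_greedy(np.array([0.2, 0.5, 0.5]), epsilon=0.1, rng=rng)
```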
Even if the issue of exploration is disregarded, and even if the state is observable (which is assumed from here on), the problem remains of using past experience to determine which actions lead to higher cumulative rewards.
The agent's action selection is modeled as a map called the policy:
$$\pi : \mathcal{S} \times \mathcal{A} \to [0, 1], \qquad \pi(s, a) = \Pr(A_t = a \mid S_t = s).$$
The policy map gives the probability of taking action $a$ when in state $s$. There are also deterministic policies.
The value function $V_\pi(s)$ is defined as the expected return starting from state $s$, i.e. $S_0 = s$, and successively following policy $\pi$. Hence, roughly speaking, the value function estimates "how good" it is to be in a given state:
$$V_\pi(s) = \mathbb{E}[G \mid S_0 = s] = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \,\middle|\, S_0 = s\right],$$
where the random variable $G$ denotes the discounted return, defined as the sum of future discounted rewards:
$$G = \sum_{t=0}^{\infty} \gamma^{t} R_{t+1} = R_1 + \gamma R_2 + \gamma^{2} R_3 + \cdots,$$
where $R_{t+1}$ is the reward at step $t$ and $\gamma \in [0, 1)$ is the discount rate.
Since gamma is less than one, events in the distant future are weighted less than events in the immediate future.
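A minimal sketch of computing the discounted return from a finite list of rewards (the infinite sum is truncated at the end of the episode; the reward sequence is a made-up example):

```python
def discounted_return(rewards, gamma=0.9):
    """Compute G = R_1 + gamma*R_2 + gamma^2*R_3 + ... for a finite episode."""
    g = 0.0
    # Iterate backwards so each reward ends up multiplied by the correct power of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))   # 1 + 0.9*0 + 0.81*2 = 2.62
```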
The algorithm must find a policy with the maximum expected return. From the theory of MDPs it is known that the search may be restricted to the set of policies...