Grokking Deep Reinforcement Learning

Name: Grokking Deep Reinforcement Learning
Brand: Manning
Price: 49.44 EUR
Availability: OnlineOnly

Miguel Morales (Author)

Manning (Publisher)

1st Edition

Published on 15 October 2020

472 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-63835-666-0 (ISBN)

€49.44 incl. 7% VAT

E-Book Single Licence

Available for download

Description

All about e-books: answers to questions about e-books, copy protection, and file formats are available in our Info & Help section.

Summary
We all learn through trial and error. We avoid the things that cause us to experience pain and failure. We embrace and build on the things that give us reward and success. This common pattern is the foundation of deep reinforcement learning: building machine learning systems that explore and learn based on the responses of the environment. Grokking Deep Reinforcement Learning introduces this powerful machine learning approach, using examples, illustrations, exercises, and crystal-clear teaching. You'll love the perfectly paced teaching and the clever, engaging writing style as you dig into this awesome exploration of reinforcement learning fundamentals, effective deep learning techniques, and practical applications in this emerging field.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
We learn by interacting with our environment, and the rewards or punishments we experience guide our future behavior. Deep reinforcement learning brings that same natural process to artificial intelligence, analyzing results to uncover the most efficient ways forward. DRL agents can improve marketing campaigns, predict stock performance, and beat grand masters in Go and chess.

About the book
Grokking Deep Reinforcement Learning uses engaging exercises to teach you how to build deep learning systems. This book combines annotated Python code with intuitive explanations to explore DRL techniques. You'll see how algorithms function and learn to develop your own DRL agents using evaluative feedback.

What's inside
An introduction to reinforcement learning
DRL agents with human-like behaviors
Applying DRL to complex situations

About the reader
For developers with basic deep learning experience.

About the author
Miguel Morales works on reinforcement learning at Lockheed Martin and is an instructor for the Georgia Institute of Technology's Reinforcement Learning and Decision Making course.

Table of Contents

1 Introduction to deep reinforcement learning

2 Mathematical foundations of reinforcement learning

3 Balancing immediate and long-term goals

4 Balancing the gathering and use of information

5 Evaluating agents' behaviors

6 Improving agents' behaviors

7 Achieving goals more effectively and efficiently

8 Introduction to value-based deep reinforcement learning

9 More stable value-based methods

10 Sample-efficient value-based methods

11 Policy-gradient and actor-critic methods

12 Advanced actor-critic methods

13 Toward artificial general intelligence

Content

Intro
Grokking Deep Reinforcement Learning
Copyright
dedication
contents
front matter
foreword
preface
acknowledgments
about this book
Who should read this book
How this book is organized: a roadmap
About the code
liveBook discussion forum
about the author
1 Introduction to deep reinforcement learning
What is deep reinforcement learning?
Deep reinforcement learning is a machine learning approach to artificial intelligence
Deep reinforcement learning is concerned with creating computer programs
Deep reinforcement learning agents can solve problems that require intelligence
Deep reinforcement learning agents improve their behavior through trial-and-error learning
Deep reinforcement learning agents learn from sequential feedback
Deep reinforcement learning agents learn from evaluative feedback
Deep reinforcement learning agents learn from sampled feedback
Deep reinforcement learning agents use powerful non-linear function approximation
The past, present, and future of deep reinforcement learning
Recent history of artificial intelligence and deep reinforcement learning
Artificial intelligence winters
The current state of artificial intelligence
Progress in deep reinforcement learning
Opportunities ahead
The suitability of deep reinforcement learning
What are the pros and cons?
Deep reinforcement learning's strengths
Deep reinforcement learning's weaknesses
Setting clear two-way expectations
What to expect from the book?
How to get the most out of this book
Deep reinforcement learning development environment
Summary
2 Mathematical foundations of reinforcement learning
Components of reinforcement learning
Examples of problems, agents, and environments
The agent: The decision maker
The environment: Everything else
Agent-environment interaction cycle
MDPs: The engine of the environment
States: Specific configurations of the environment
Actions: A mechanism to influence the environment
Transition function: Consequences of agent actions
Reward signal: Carrots and sticks
Horizon: Time changes what's optimal
Discount: The future is uncertain, value it less
Extensions to MDPs
Putting it all together
Summary
3 Balancing immediate and long-term goals
The objective of a decision-making agent
Policies: Per-state action prescriptions
State-value function: What to expect from here?
Action-value function: What should I expect from here if I do this?
Action-advantage function: How much better if I do that?
Optimality
Planning optimal sequences of actions
Policy evaluation: Rating policies
Policy improvement: Using ratings to get better
Policy iteration: Improving upon improved behaviors
Value iteration: Improving behaviors early
Summary
4 Balancing the gathering and use of information
The challenge of interpreting evaluative feedback
Bandits: Single-state decision problems
Regret: The cost of exploration
Approaches to solving MAB environments
Greedy: Always exploit
Random: Always explore
Epsilon-greedy: Almost always greedy and sometimes random
Decaying epsilon-greedy: First maximize exploration, then exploitation
Optimistic initialization: Start off believing it's a wonderful world
Strategic exploration
Softmax: Select actions randomly in proportion to their estimates
UCB: It's not about optimism, it's about realistic optimism
Thompson sampling: Balancing reward and risk
Summary
5 Evaluating agents' behaviors
Learning to estimate the value of policies
First-visit Monte Carlo: Improving estimates after each episode
Every-visit Monte Carlo: A different way of handling state visits
Temporal-difference learning: Improving estimates after each step
Learning to estimate from multiple steps
N-step TD learning: Improving estimates after a couple of steps
Forward-view TD(λ): Improving estimates of all visited states
TD(λ): Improving estimates of all visited states after each step
Summary
6 Improving agents' behaviors
The anatomy of reinforcement learning agents
Most agents gather experience samples
Most agents estimate something
Most agents improve a policy
Generalized policy iteration
Learning to improve policies of behavior
Monte Carlo control: Improving policies after each episode
SARSA: Improving policies after each step
Decoupling behavior from learning
Q-learning: Learning to act optimally, even if we choose not to
Double Q-learning: A max of estimates for an estimate of a max
Summary
7 Achieving goals more effectively and efficiently
Learning to improve policies using robust targets
SARSA(λ): Improving policies after each step based on multi-step estimates
Watkins's Q(λ): Decoupling behavior from learning, again
Agents that interact, learn, and plan
Dyna-Q: Learning sample models
Trajectory sampling: Making plans for the immediate future
Summary
8 Introduction to value-based deep reinforcement learning
The kind of feedback deep reinforcement learning agents use
Deep reinforcement learning agents deal with sequential feedback
But, if it isn't sequential, what is it?
Deep reinforcement learning agents deal with evaluative feedback
But, if it isn't evaluative, what is it?
Deep reinforcement learning agents deal with sampled feedback
But, if it isn't sampled, what is it?
Introduction to function approximation for reinforcement learning
Reinforcement learning problems can have high-dimensional state and action spaces
Reinforcement learning problems can have continuous state and action spaces
There are advantages when using function approximation
NFQ: The first attempt at value-based deep reinforcement learning
First decision point: Selecting a value function to approximate
Second decision point: Selecting a neural network architecture
Third decision point: Selecting what to optimize
Fourth decision point: Selecting the targets for policy evaluation
Fifth decision point: Selecting an exploration strategy
Sixth decision point: Selecting a loss function
Seventh decision point: Selecting an optimization method
Things that could (and do) go wrong
Summary
9 More stable value-based methods
DQN: Making reinforcement learning more like supervised learning
Common problems in value-based deep reinforcement learning
Using target networks
Using larger networks
Using experience replay
Using other exploration strategies
Double DQN: Mitigating the overestimation of action-value functions
The problem of overestimation, take two
Separating action selection from action evaluation
A solution
A more practical solution
A more forgiving loss function
Things we can still improve on
Summary
10 Sample-efficient value-based methods
Dueling DDQN: A reinforcement-learning-aware neural network architecture
Reinforcement learning isn't a supervised learning problem
Nuances of value-based deep reinforcement learning methods
Advantage of using advantages
A reinforcement-learning-aware architecture
Building a dueling network
Reconstructing the action-value function
Continuously updating the target network
What does the dueling network bring to the table?
PER: Prioritizing the replay of meaningful experiences
A smarter way to replay experiences
Then, what's a good measure of "important" experiences?
Greedy prioritization by TD error
Sampling prioritized experiences stochastically
Proportional prioritization
Rank-based prioritization
Prioritization bias
Summary
11 Policy-gradient and actor-critic methods
REINFORCE: Outcome-based policy learning
Introduction to policy-gradient methods
Advantages of policy-gradient methods
Learning policies directly
Reducing the variance of the policy gradient
VPG: Learning a value function
Further reducing the variance of the policy gradient
Learning a value function
Encouraging exploration
A3C: Parallel policy updates
Using actor-workers
Using n-step estimates
Non-blocking model updates
GAE: Robust advantage estimation
Generalized advantage estimation
A2C: Synchronous policy updates
Weight-sharing model
Restoring order in policy updates
Summary
12 Advanced actor-critic methods
DDPG: Approximating a deterministic policy
DDPG uses many tricks from DQN
Learning a deterministic policy
Exploration with deterministic policies
TD3: State-of-the-art improvements over DDPG
Double learning in DDPG
Smoothing the targets used for policy updates
Delaying updates
SAC: Maximizing the expected return and entropy
Adding the entropy to the Bellman equations
Learning the action-value function
Learning the policy
Automatically tuning the entropy coefficient
PPO: Restricting optimization steps
Using the same actor-critic architecture as A2C
Batching experiences
Clipping the policy updates
Clipping the value function updates
Summary
13 Toward artificial general intelligence
What was covered and what notably wasn't?
Markov decision processes
Planning methods
Bandit methods
Tabular reinforcement learning
Value-based deep reinforcement learning
Policy-based and actor-critic deep reinforcement learning
Advanced actor-critic techniques
Model-based deep reinforcement learning
Derivative-free optimization methods
More advanced concepts toward AGI
What is AGI, again?
Advanced exploration strategies
Inverse reinforcement learning
Transfer learning
Multi-task learning
Curriculum learning
Meta learning
Hierarchical reinforcement learning
Multi-agent reinforcement learning
Explainable AI, safety, fairness, and ethical standards
What happens next?
How to use DRL to solve custom problems
Going forward
Get yourself out there! Now!
Summary
index

Schweitzer Fachinformationen
