
Grokking Deep Reinforcement Learning
Description
Summary
We all learn through trial and error. We avoid the things that cause us to experience pain and failure. We embrace and build on the things that give us reward and success. This common pattern is the foundation of deep reinforcement learning: building machine learning systems that explore and learn based on the responses of the environment. Grokking Deep Reinforcement Learning introduces this powerful machine learning approach, using examples, illustrations, exercises, and crystal-clear teaching. You'll love the perfectly paced teaching and the clever, engaging writing style as you dig into this awesome exploration of reinforcement learning fundamentals, effective deep learning techniques, and practical applications in this emerging field.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
We learn by interacting with our environment, and the rewards or punishments we experience guide our future behavior. Deep reinforcement learning brings that same natural process to artificial intelligence, analyzing results to uncover the most efficient ways forward. DRL agents can improve marketing campaigns, predict stock performance, and beat grandmasters in Go and chess.
About the book
Grokking Deep Reinforcement Learning uses engaging exercises to teach you how to build deep learning systems. This book combines annotated Python code with intuitive explanations to explore DRL techniques. You'll see how algorithms function and learn to develop your own DRL agents using evaluative feedback.
What's inside
An introduction to reinforcement learning
DRL agents with human-like behaviors
Applying DRL to complex situations
About the reader
For developers with basic deep learning experience.
About the author
Miguel Morales works on reinforcement learning at Lockheed Martin and is an instructor for the Georgia Institute of Technology's Reinforcement Learning and Decision Making course.
Table of Contents
1 Introduction to deep reinforcement learning
2 Mathematical foundations of reinforcement learning
3 Balancing immediate and long-term goals
4 Balancing the gathering and use of information
5 Evaluating agents' behaviors
6 Improving agents' behaviors
7 Achieving goals more effectively and efficiently
8 Introduction to value-based deep reinforcement learning
9 More stable value-based methods
10 Sample-efficient value-based methods
11 Policy-gradient and actor-critic methods
12 Advanced actor-critic methods
13 Toward artificial general intelligence
Content
- Intro
- Grokking Deep Reinforcement Learning
- Copyright
- dedication
- contents
- front matter
- foreword
- preface
- acknowledgments
- about this book
- Who should read this book
- How this book is organized: a roadmap
- About the code
- liveBook discussion forum
- about the author
- 1 Introduction to deep reinforcement learning
- What is deep reinforcement learning?
- Deep reinforcement learning is a machine learning approach to artificial intelligence
- Deep reinforcement learning is concerned with creating computer programs
- Deep reinforcement learning agents can solve problems that require intelligence
- Deep reinforcement learning agents improve their behavior through trial-and-error learning
- Deep reinforcement learning agents learn from sequential feedback
- Deep reinforcement learning agents learn from evaluative feedback
- Deep reinforcement learning agents learn from sampled feedback
- Deep reinforcement learning agents use powerful non-linear function approximation
- The past, present, and future of deep reinforcement learning
- Recent history of artificial intelligence and deep reinforcement learning
- Artificial intelligence winters
- The current state of artificial intelligence
- Progress in deep reinforcement learning
- Opportunities ahead
- The suitability of deep reinforcement learning
- What are the pros and cons?
- Deep reinforcement learning's strengths
- Deep reinforcement learning's weaknesses
- Setting clear two-way expectations
- What to expect from the book?
- How to get the most out of this book
- Deep reinforcement learning development environment
- Summary
- 2 Mathematical foundations of reinforcement learning
- Components of reinforcement learning
- Examples of problems, agents, and environments
- The agent: The decision maker
- The environment: Everything else
- Agent-environment interaction cycle
- MDPs: The engine of the environment
- States: Specific configurations of the environment
- Actions: A mechanism to influence the environment
- Transition function: Consequences of agent actions
- Reward signal: Carrots and sticks
- Horizon: Time changes what's optimal
- Discount: The future is uncertain, value it less
- Extensions to MDPs
- Putting it all together
- Summary
- 3 Balancing immediate and long-term goals
- The objective of a decision-making agent
- Policies: Per-state action prescriptions
- State-value function: What to expect from here?
- Action-value function: What should I expect from here if I do this?
- Action-advantage function: How much better if I do that?
- Optimality
- Planning optimal sequences of actions
- Policy evaluation: Rating policies
- Policy improvement: Using ratings to get better
- Policy iteration: Improving upon improved behaviors
- Value iteration: Improving behaviors early
- Summary
- 4 Balancing the gathering and use of information
- The challenge of interpreting evaluative feedback
- Bandits: Single-state decision problems
- Regret: The cost of exploration
- Approaches to solving MAB environments
- Greedy: Always exploit
- Random: Always explore
- Epsilon-greedy: Almost always greedy and sometimes random
- Decaying epsilon-greedy: First maximize exploration, then exploitation
- Optimistic initialization: Start off believing it's a wonderful world
- Strategic exploration
- Softmax: Select actions randomly in proportion to their estimates
- UCB: It's not about optimism, it's about realistic optimism
- Thompson sampling: Balancing reward and risk
- Summary
- 5 Evaluating agents' behaviors
- Learning to estimate the value of policies
- First-visit Monte Carlo: Improving estimates after each episode
- Every-visit Monte Carlo: A different way of handling state visits
- Temporal-difference learning: Improving estimates after each step
- Learning to estimate from multiple steps
- N-step TD learning: Improving estimates after a couple of steps
- Forward-view TD(λ): Improving estimates of all visited states
- TD(λ): Improving estimates of all visited states after each step
- Summary
- 6 Improving agents' behaviors
- The anatomy of reinforcement learning agents
- Most agents gather experience samples
- Most agents estimate something
- Most agents improve a policy
- Generalized policy iteration
- Learning to improve policies of behavior
- Monte Carlo control: Improving policies after each episode
- SARSA: Improving policies after each step
- Decoupling behavior from learning
- Q-learning: Learning to act optimally, even if we choose not to
- Double Q-learning: A max of estimates for an estimate of a max
- Summary
- 7 Achieving goals more effectively and efficiently
- Learning to improve policies using robust targets
- SARSA(λ): Improving policies after each step based on multi-step estimates
- Watkins's Q(λ): Decoupling behavior from learning, again
- Agents that interact, learn, and plan
- Dyna-Q: Learning sample models
- Trajectory sampling: Making plans for the immediate future
- Summary
- 8 Introduction to value-based deep reinforcement learning
- The kind of feedback deep reinforcement learning agents use
- Deep reinforcement learning agents deal with sequential feedback
- But, if it isn't sequential, what is it?
- Deep reinforcement learning agents deal with evaluative feedback
- But, if it isn't evaluative, what is it?
- Deep reinforcement learning agents deal with sampled feedback
- But, if it isn't sampled, what is it?
- Introduction to function approximation for reinforcement learning
- Reinforcement learning problems can have high-dimensional state and action spaces
- Reinforcement learning problems can have continuous state and action spaces
- There are advantages when using function approximation
- NFQ: The first attempt at value-based deep reinforcement learning
- First decision point: Selecting a value function to approximate
- Second decision point: Selecting a neural network architecture
- Third decision point: Selecting what to optimize
- Fourth decision point: Selecting the targets for policy evaluation
- Fifth decision point: Selecting an exploration strategy
- Sixth decision point: Selecting a loss function
- Seventh decision point: Selecting an optimization method
- Things that could (and do) go wrong
- Summary
- 9 More stable value-based methods
- DQN: Making reinforcement learning more like supervised learning
- Common problems in value-based deep reinforcement learning
- Using target networks
- Using larger networks
- Using experience replay
- Using other exploration strategies
- Double DQN: Mitigating the overestimation of action-value functions
- The problem of overestimation, take two
- Separating action selection from action evaluation
- A solution
- A more practical solution
- A more forgiving loss function
- Things we can still improve on
- Summary
- 10 Sample-efficient value-based methods
- Dueling DDQN: A reinforcement-learning-aware neural network architecture
- Reinforcement learning isn't a supervised learning problem
- Nuances of value-based deep reinforcement learning methods
- Advantage of using advantages
- A reinforcement-learning-aware architecture
- Building a dueling network
- Reconstructing the action-value function
- Continuously updating the target network
- What does the dueling network bring to the table?
- PER: Prioritizing the replay of meaningful experiences
- A smarter way to replay experiences
- Then, what's a good measure of "important" experiences?
- Greedy prioritization by TD error
- Sampling prioritized experiences stochastically
- Proportional prioritization
- Rank-based prioritization
- Prioritization bias
- Summary
- 11 Policy-gradient and actor-critic methods
- REINFORCE: Outcome-based policy learning
- Introduction to policy-gradient methods
- Advantages of policy-gradient methods
- Learning policies directly
- Reducing the variance of the policy gradient
- VPG: Learning a value function
- Further reducing the variance of the policy gradient
- Learning a value function
- Encouraging exploration
- A3C: Parallel policy updates
- Using actor-workers
- Using n-step estimates
- Non-blocking model updates
- GAE: Robust advantage estimation
- Generalized advantage estimation
- A2C: Synchronous policy updates
- Weight-sharing model
- Restoring order in policy updates
- Summary
- 12 Advanced actor-critic methods
- DDPG: Approximating a deterministic policy
- DDPG uses many tricks from DQN
- Learning a deterministic policy
- Exploration with deterministic policies
- TD3: State-of-the-art improvements over DDPG
- Double learning in DDPG
- Smoothing the targets used for policy updates
- Delaying updates
- SAC: Maximizing the expected return and entropy
- Adding the entropy to the Bellman equations
- Learning the action-value function
- Learning the policy
- Automatically tuning the entropy coefficient
- PPO: Restricting optimization steps
- Using the same actor-critic architecture as A2C
- Batching experiences
- Clipping the policy updates
- Clipping the value function updates
- Summary
- 13 Toward artificial general intelligence
- What was covered and what notably wasn't?
- Markov decision processes
- Planning methods
- Bandit methods
- Tabular reinforcement learning
- Value-based deep reinforcement learning
- Policy-based and actor-critic deep reinforcement learning
- Advanced actor-critic techniques
- Model-based deep reinforcement learning
- Derivative-free optimization methods
- More advanced concepts toward AGI
- What is AGI, again?
- Advanced exploration strategies
- Inverse reinforcement learning
- Transfer learning
- Multi-task learning
- Curriculum learning
- Meta learning
- Hierarchical reinforcement learning
- Multi-agent reinforcement learning
- Explainable AI, safety, fairness, and ethical standards
- What happens next?
- How to use DRL to solve custom problems
- Going forward
- Get yourself out there! Now!
- Summary
- index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.