
Python Reinforcement Learning
Description
All about e-books | Answers to questions about e-books, copy protection, and file formats can be found in our info and help section.

Content
- Cover
- Title Page
- Copyright and Credits
- About Packt
- Contributors
- Table of Contents
- Preface
- Chapter 1: Introduction to Reinforcement Learning
- What is RL?
- RL algorithm
- How RL differs from other ML paradigms
- Elements of RL
- Agent
- Policy function
- Value function
- Model
- Agent environment interface
- Types of RL environment
- Deterministic environment
- Stochastic environment
- Fully observable environment
- Partially observable environment
- Discrete environment
- Continuous environment
- Episodic and non-episodic environment
- Single and multi-agent environment
- RL platforms
- OpenAI Gym and Universe
- DeepMind Lab
- RL-Glue
- Project Malmo
- ViZDoom
- Applications of RL
- Education
- Medicine and healthcare
- Manufacturing
- Inventory management
- Finance
- Natural Language Processing and Computer Vision
- Summary
- Questions
- Further reading
- Chapter 2: Getting Started with OpenAI and TensorFlow
- Setting up your machine
- Installing Anaconda
- Installing Docker
- Installing OpenAI Gym and Universe
- Common error fixes
- OpenAI Gym
- Basic simulations
- Training a robot to walk
- OpenAI Universe
- Building a video game bot
- TensorFlow
- Variables, constants, and placeholders
- Variables
- Constants
- Placeholders
- Computation graph
- Sessions
- TensorBoard
- Adding scope
- Summary
- Questions
- Further reading
- Chapter 3: The Markov Decision Process and Dynamic Programming
- The Markov chain and Markov process
- Markov Decision Process
- Rewards and returns
- Episodic and continuous tasks
- Discount factor
- The policy function
- State value function
- State-action value function (Q function)
- The Bellman equation and optimality
- Deriving the Bellman equation for value and Q functions
- Solving the Bellman equation
- Dynamic programming
- Value iteration
- Policy iteration
- Solving the frozen lake problem
- Value iteration
- Policy iteration
- Summary
- Questions
- Further reading
- Chapter 4: Gaming with Monte Carlo Methods
- Monte Carlo methods
- Estimating the value of pi using Monte Carlo
- Monte Carlo prediction
- First visit Monte Carlo
- Every visit Monte Carlo
- Let's play Blackjack with Monte Carlo
- Monte Carlo control
- Monte Carlo exploration starts
- On-policy Monte Carlo control
- Off-policy Monte Carlo control
- Summary
- Questions
- Further reading
- Chapter 5: Temporal Difference Learning
- TD learning
- TD prediction
- TD control
- Q learning
- Solving the taxi problem using Q learning
- SARSA
- Solving the taxi problem using SARSA
- The difference between Q learning and SARSA
- Summary
- Questions
- Further reading
- Chapter 6: Multi-Armed Bandit Problem
- The MAB problem
- The epsilon-greedy policy
- The softmax exploration algorithm
- The upper confidence bound algorithm
- The Thompson sampling algorithm
- Applications of MAB
- Identifying the right advertisement banner using MAB
- Contextual bandits
- Summary
- Questions
- Further reading
- Chapter 7: Playing Atari Games
- Introduction to Atari games
- Building an Atari emulator
- Getting started
- Implementation of the Atari emulator
- Atari simulator using gym
- Data preparation
- Deep Q-learning
- Basic elements of reinforcement learning
- Demonstrating basic Q-learning algorithm
- Implementation of DQN
- Experiments
- Summary
- Chapter 8: Atari Games with Deep Q Network
- What is a Deep Q Network?
- Architecture of DQN
- Convolutional network
- Experience replay
- Target network
- Clipping rewards
- Understanding the algorithm
- Building an agent to play Atari games
- Double DQN
- Prioritized experience replay
- Dueling network architecture
- Summary
- Questions
- Further reading
- Chapter 9: Playing Doom with a Deep Recurrent Q Network
- DRQN
- Architecture of DRQN
- Training an agent to play Doom
- Basic Doom game
- Doom with DRQN
- DARQN
- Architecture of DARQN
- Summary
- Questions
- Further reading
- Chapter 10: The Asynchronous Advantage Actor Critic Network
- The Asynchronous Advantage Actor Critic
- The three As
- The architecture of A3C
- How A3C works
- Driving up a mountain with A3C
- Visualization in TensorBoard
- Summary
- Questions
- Further reading
- Chapter 11: Policy Gradients and Optimization
- Policy gradient
- Lunar Lander using policy gradients
- Deep deterministic policy gradient
- Swinging a pendulum
- Trust Region Policy Optimization
- Proximal Policy Optimization
- Summary
- Questions
- Further reading
- Chapter 12: Balancing CartPole
- OpenAI Gym
- Gym
- Installation
- Running an environment
- Atari
- Algorithmic tasks
- MuJoCo
- Robotics
- Markov models
- CartPole
- Summary
- Chapter 13: Simulating Control Tasks
- Introduction to control tasks
- Getting started
- The classic control tasks
- Deterministic policy gradient
- The theory behind policy gradient
- DPG algorithm
- Implementation of DDPG
- Experiments
- Trust region policy optimization
- Theory behind TRPO
- TRPO algorithm
- Experiments on MuJoCo tasks
- Summary
- Chapter 14: Building Virtual Worlds in Minecraft
- Introduction to the Minecraft environment
- Data preparation
- Asynchronous advantage actor-critic algorithm
- Implementation of A3C
- Experiments
- Summary
- Chapter 15: Learning to Play Go
- A brief introduction to Go
- Go and other board games
- Go and AI research
- Monte Carlo tree search
- Selection
- Expansion
- Simulation
- Update
- AlphaGo
- Supervised learning policy networks
- Reinforcement learning policy networks
- Value network
- Combining neural networks and MCTS
- AlphaGo Zero
- Training AlphaGo Zero
- Comparison with AlphaGo
- Implementing AlphaGo Zero
- Policy and value networks
- preprocessing.py
- features.py
- network.py
- Monte Carlo tree search
- mcts.py
- Combining PolicyValueNetwork and MCTS
- alphagozero_agent.py
- Putting everything together
- controller.py
- train.py
- Summary
- References
- Chapter 16: Creating a Chatbot
- The background problem
- Dataset
- Step-by-step guide
- Data parser
- Data reader
- Helper methods
- Chatbot model
- Training the data
- Testing and results
- Summary
- Chapter 17: Generating a Deep Learning Image Classifier
- Neural Architecture Search
- Generating and training child networks
- Training the Controller
- Training algorithm
- Implementing NAS
- child_network.py
- cifar10_processor.py
- controller.py
- Method for generating the Controller
- Generating a child network using the Controller
- train_controller method
- Testing ChildCNN
- config.py
- train.py
- Additional exercises
- Advantages of NAS
- Summary
- Chapter 18: Predicting Future Stock Prices
- Background problem
- Data used
- Step-by-step guide
- Actor script
- Critic script
- Agent script
- Helper script
- Training the data
- Final result
- Summary
- Chapter 19: Capstone Project - Car Racing Using DQN
- Environment wrapper functions
- Dueling network
- Replay memory
- Training the network
- Car racing
- Summary
- Questions
- Further reading
- Chapter 20: Looking Ahead
- The shortcomings of reinforcement learning
- Resource efficiency
- Reproducibility
- Explainability/accountability
- Susceptibility to attacks
- Upcoming developments in reinforcement learning
- Addressing the limitations
- Transfer learning
- Multi-agent reinforcement learning
- Summary
- References
- Assessments
- Other Books You May Enjoy
- Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
- Computer (Windows, macOS, Linux): Install the free reader Adobe Digital Editions before downloading (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePub file format works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks adjust automatically to fit the small display.
This eBook uses Adobe-DRM, a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.