
Exploiting Environment Configurability in Reinforcement Learning
Description
This book, Exploiting Environment Configurability in Reinforcement Learning, formalizes and studies several aspects of environment configuration. In a traditional Markov Decision Process (MDP), the agent perceives the state of the environment and performs actions. As a consequence, the environment transitions to a new state and generates a reward signal. The goal of the agent is to learn a policy, i.e., a prescription of actions, that maximizes the long-term reward. In this standard formulation the dynamics of the environment are taken as fixed. Although opportunities to configure the environment arise quite often in real applications, the topic has received little attention in the literature.
The contributions in the book are theoretical, algorithmic, and experimental, and can be broadly subdivided into three parts. The first part introduces the novel formalism of Configurable Markov Decision Processes (Conf-MDPs) to model the configuration opportunities offered by the environment. The second part focuses on the cooperative Conf-MDP setting and investigates the problem of finding an agent policy and an environment configuration that jointly optimize the long-term reward. The third part addresses two specific applications of the Conf-MDP framework: policy space identification and control frequency adaptation.
The book will be of interest to all those using RL as part of their work.
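The cooperative setting mentioned in the description, in which an agent policy and an environment configuration are chosen together to maximize long-term reward, can be illustrated with a toy example. The sketch below is not one of the book's algorithms. It is a brute-force illustration under stated assumptions: a hypothetical two-state, two-action MDP whose only configurable parameter, here called `omega`, is the probability that the "move" action succeeds. All names (`make_transitions`, `best_joint`, `omega`, `START`) are invented for this example.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Solve a finite MDP by value iteration.

    P[a][s][t] is the probability of moving from state s to state t
    under action a; R[s][a] is the immediate reward.
    Returns the optimal values V[s] and a greedy policy pi[s].
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    while True:
        # One Bellman optimality backup for every state-action pair.
        Q = [[R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new, [q.index(max(q)) for q in Q]
        V = V_new


def make_transitions(omega):
    """Hypothetical configurable dynamics (an assumption of this sketch).

    Action 0 ("stay") keeps the current state. Action 1 ("move") takes
    state 0 to the rewarding state 1 with probability omega.
    """
    stay = [[1.0, 0.0], [0.0, 1.0]]
    move = [[1.0 - omega, omega], [0.0, 1.0]]
    return [stay, move]


# Reward 1 is collected in state 1, whatever the action.
R = [[0.0, 0.0], [1.0, 1.0]]
START = 0  # the agent always starts in state 0


def best_joint(configs, gamma=0.9):
    """Try every candidate configuration, solve the resulting MDP, and
    keep the (value, configuration, policy) triple with the highest
    value from the start state."""
    best = None
    for omega in configs:
        V, pi = value_iteration(make_transitions(omega), R, gamma)
        if best is None or V[START] > best[0]:
            best = (V[START], omega, pi)
    return best


if __name__ == "__main__":
    value, omega, pi = best_joint([0.2, 0.5, 0.9])
    print(f"best omega={omega}, start-state value={value:.3f}, policy={pi}")
```

Enumerating configurations and solving each resulting MDP exactly is feasible only for tiny, fully known models like this one. It is meant solely to make the idea of a joint optimum over policy and configuration concrete.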
Content
- Intro
- Title page
- Abstract
- Contents
- List of Figures
- List of Tables
- List of Algorithms
- List of Symbols and Notation
- Acknowledgments
- Introduction
- What is Reinforcement Learning?
- Why Environment Configurability?
- Original Contributions
- Overview
- Foundations of Sequential Decision-Making
- Introduction
- Markov Decision Processes
- Markov Reward Processes
- Markov Chains
- Performance Indexes
- Value Functions
- Optimality Criteria
- Exact Solution Methods
- Reinforcement Learning Algorithms
- Temporal Difference Methods
- Function Approximation
- Policy Search
- Modeling Environment Configurability
- Configurable Markov Decision Processes
- Introduction
- Motivations and Examples
- Definition
- Value Functions
- Bellman Equations and Operators
- Taxonomy
- Related Literature
- Solution Concepts for Conf-MDPs
- Cooperative Setting
- Non-Cooperative Setting
- Learning in Cooperative Configurable Markov Decision Processes
- Learning in Finite Cooperative Conf-MDPs
- Introduction
- Relative Advantage Functions
- Performance Improvement Bound
- Safe Policy Model Iteration
- Theoretical Analysis
- Experimental Evaluation
- Examples of Conf-MDPs
- Learning in Continuous Conf-MDPs
- Introduction
- Solving Parametric Conf-MDPs
- Relative Entropy Model Policy Search
- Theoretical Analysis
- Approximation of the Transition Model
- Experiments
- Applications of Configurable Markov Decision Processes
- Policy Space Identification
- Introduction
- Generalized Likelihood Ratio Test
- Policy Space Identification in a Fixed Env
- Analysis for the Exponential Family
- Policy Space Identification in a Configurable Env
- Connections with Existing Work
- Experimental Results
- Control Frequency Adaptation
- Introduction
- Persisting Actions in MDPs
- Bounding the Performance Loss
- Persistent Fitted Q-Iteration
- Persistence Selection
- Related Works
- Experimental Evaluation
- Open Questions
- Discussion and Conclusions
- Modeling Environment Configurability
- Learning in Conf-MDPs
- Applications of Conf-MDPs
- Appendices
- Additional Results and Proofs
- Additional Results and Proofs of Chapter 6
- Additional Results and Proofs of Chapter 7
- Additional Results and Proofs of Chapter 8
- Additional Results and Proofs of Chapter 9
- Exponential Family Policies
- Gaussian and Boltzmann Linear Policies as Exponential Family Distributions
- Fisher Information Matrix
- Subgaussianity Assumption
- Bibliography
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF format displays every book page identically on any hardware, which makes it suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are cumbersome to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection: no technical restrictions prevent illegal distribution. However, a personalised watermark is embedded in the eBook. In the event of misuse, it can be used to identify the purchaser and to provide evidence for legal purposes.
For more information, see our eBook Help page.