Probabilistic Graphical Models

Name: Probabilistic Graphical Models | Principles and Techniques
Brand: MIT Press
Price: 142.99 EUR
Availability: OnlineOnly

Principles and Techniques

Daphne Koller Nir Friedman(Author)

MIT Press

Published on 31. July 2009

1270 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-0-262-25835-7 (ISBN)

€142.99incl. 7% vat

System requirements

for ePUB with Adobe-DRM

E-Book Single Licence

Available for download

Description

Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.

Alles über E-Books, Kopierschutz & Dateiformate finden Sie in unserem Info- & Hilfebereich.

A general framework for constructing and using probabilistic models of complex systems that would enable a computer to use available information for making decisions.

Most tasks require a person or an automated system to reason—to reach conclusions based on available information. The framework of probabilistic graphical models, presented in this book, provides a general approach for this task. The approach is model-based, allowing interpretable models to be constructed and then manipulated by reasoning algorithms. These models can also be learned automatically from data, allowing the approach to be used in cases where manually constructing a model is difficult or even impossible. Because uncertainty is an inescapable aspect of most real-world applications, the book focuses on probabilistic models, which make the uncertainty explicit and provide models that are more faithful to reality.

Probabilistic Graphical Models discusses a variety of models, spanning Bayesian networks, undirected Markov networks, discrete and continuous models, and extensions to deal with dynamical systems and relational data. For each class of models, the text describes the three fundamental cornerstones: representation, inference, and learning, presenting both basic concepts and advanced techniques. Finally, the book considers the use of the proposed framework for causal reasoning and decision making under uncertainty. The main text in each chapter provides the detailed technical development of the key ideas. Most chapters also include boxes with additional material: skill boxes, which describe techniques; case study boxes, which discuss empirical cases related to the approach described in the text, including applications in computer vision, robotics, natural language understanding, and computational biology; and concept boxes, which present significant concepts drawn from the material in the chapter. Instructors (and readers) can group chapters in various combinations, from core topics to more technically advanced material, to suit their particular needs.

More details

Other editions

Persons

Content

Cover
Copyright
Contents
Acknowledgments
List of Figures
List of Algorithms
List of Boxes
1 Introduction
1.1 Motivation
1.2 Structured Probabilistic Models
1.2.1 Probabilistic Graphical Models
1.2.2 Representation, Inference, Learning
1.3 Overview and Roadmap
1.3.1 Overview of Chapters
1.3.2 Reader's Guide
1.3.3 Connection to Other Disciplines
1.4 Historical Notes
2 Foundations
2.1 Probability Theory
2.1.1 Probability Distributions
2.1.2 Basic Concepts in Probability
2.1.3 Random Variables and Joint Distributions
2.1.4 Independence and Conditional Independence
2.1.5 Querying a Distribution
2.1.6 Continuous Spaces
2.1.7 Expectation and Variance
2.2 Graphs
2.2.1 Nodes and Edges
2.2.2 Subgraphs
2.2.3 Paths and Trails
2.2.4 Cycles and Loops
2.3 Relevant Literature
2.4 Exercises
I Representation
3 The Bayesian Network Representation
3.1 Exploiting Independence Properties
3.1.1 Independent Random Variables
3.1.2 The Conditional Parameterization
3.1.3 The Naive Bayes Model
3.2 Bayesian Networks
3.2.1 The Student Example Revisited
3.2.2 Basic Independencies in Bayesian Networks
3.2.3 Graphs and Distributions
3.3 Independencies in Graphs
3.3.1 D-separation
3.3.2 Soundness and Completeness
3.3.3 An Algorithm for d-Separation
3.3.4 I-Equivalence
3.4 From Distributions to Graphs
3.4.1 Minimal I-Maps
3.4.2 Perfect Maps
3.4.3 Finding Perfect Maps
3.5 Summary
3.6 Relevant Literature
3.7 Exercises
4 Undirected Graphical Models
4.1 The Misconception Example
4.2 Parameterization
4.2.1 Factors
4.2.2 Gibbs Distributions and Markov Networks
4.2.3 Reduced Markov Networks
4.3 Markov Network Independencies
4.3.1 Basic Independencies
4.3.2 Independencies Revisited
4.3.3 From Distributions to Graphs
4.4 Parameterization Revisited
4.4.1 Finer-Grained Parameterization
4.4.2 Overparameterization
4.5 Bayesian Networks and Markov Networks
4.5.1 From Bayesian Networks to Markov Networks
4.5.2 From Markov Networks to Bayesian Networks
4.5.3 Chordal Graphs
4.6 Partially Directed Models
4.6.1 Conditional Random Fields
4.6.2 Chain Graph Models
4.7 Summary and Discussion
4.8 Relevant Literature
4.9 Exercises
5 Local Probabilistic Models
5.1 Tabular CPDs
5.2 Deterministic CPDs
5.2.1 Representation
5.2.2 Independencies
5.3 Context-Specific CPDs
5.3.1 Representation
5.3.2 Independencies
5.4 Independence of Causal Influence
5.4.1 The Noisy-Or Model
5.4.2 Generalized Linear Models
5.4.3 The General Formulation
5.4.4 Independencies
5.5 Continuous Variables
5.5.1 Hybrid Models
5.6 Conditional Bayesian Networks
5.7 Summary
5.8 Relevant Literature
5.9 Exercises
6 Template-Based Representations
6.1 Introduction
6.2 Temporal Models
6.2.1 Basic Assumptions
6.2.2 Dynamic Bayesian Networks
6.2.3 State-Observation Models
6.3 Template Variables and Template Factors
6.4 Directed Probabilistic Models for Object-Relational Domains
6.4.1 Plate Models
6.4.2 Probabilistic Relational Models
6.5 Undirected Representation
6.6 Structural Uncertainty
6.6.1 Relational Uncertainty
6.6.2 Object Uncertainty
6.7 Summary
6.8 Relevant Literature
6.9 Exercises
7 Gaussian Network Models
7.1 Multivariate Gaussians
7.1.1 Basic Parameterization
7.1.2 Operations on Gaussians
7.1.3 Independencies in Gaussians
7.2 Gaussian Bayesian Networks
7.3 Gaussian Markov Random Fields
7.4 Summary
7.5 Relevant Literature
7.6 Exercises
8 The Exponential Family
8.1 Introduction
8.2 Exponential Families
8.2.1 Linear Exponential Families
8.3 Factored Exponential Families
8.3.1 Product Distributions
8.3.2 Bayesian Networks
8.4 Entropy and Relative Entropy
8.4.1 Entropy
8.4.2 Relative Entropy
8.5 Projections
8.5.1 Comparison
8.5.2 M-Projections
8.5.3 I-Projections
8.6 Summary
8.7 Relevant Literature
8.8 Exercises
II Inference
9 Exact Inference: Variable Elimination
9.1 Analysis of Complexity
9.1.1 Analysis of Exact Inference
9.1.2 Analysis of Approximate Inference
9.2 Variable Elimination: The Basic Ideas
9.3 Variable Elimination
9.3.1 Basic Elimination
9.3.2 Dealing with Evidence
9.4 Complexity and Graph Structure: Variable Elimination
9.4.1 Simple Analysis
9.4.2 Graph-Theoretic Analysis
9.4.3 Finding Elimination Orderings
9.5 Conditioning
9.5.1 The Conditioning Algorithm
9.5.2 Conditioning and Variable Elimination
9.5.3 Graph-Theoretic Analysis
9.5.4 Improved Conditioning
9.6 Inference with Structured CPDs
9.6.1 Independence of Causal Influence
9.6.2 Context-Specific Independence
9.6.3 Discussion
9.7 Summary and Discussion
9.8 Relevant Literature
9.9 Exercises
10 Exact Inference: Clique Trees
10.1 Variable Elimination and Clique Trees
10.1.1 Cluster Graphs
10.1.2 Clique Trees
10.2 Message Passing: Sum Product
10.2.1 Variable Elimination in a Clique Tree
10.2.2 Clique Tree Calibration
10.2.3 A Calibrated Clique Tree as a Distribution
10.3 Message Passing: Belief Update
10.3.1 Message Passing with Division
10.3.2 Equivalence of Sum-Product and Belief Update Messages
10.3.3 Answering Queries
10.4 Constructing a Clique Tree
10.4.1 Clique Trees from Variable Elimination
10.4.2 Clique Trees from Chordal Graphs
10.5 Summary
10.6 Relevant Literature
10.7 Exercises
11 Inference as Optimization
11.1 Introduction
11.1.1 Exact Inference Revisited
11.1.2 The Energy Functional
11.1.3 Optimizing the Energy Functional
11.2 Exact Inference as Optimization
11.2.1 Fixed-Point Characterization
11.2.2 Inference as Optimization
11.3 Propagation-Based Approximation
11.3.1 A Simple Example
11.3.2 Cluster-Graph Belief Propagation
11.3.3 Properties of Cluster-Graph Belief Propagation
11.3.4 Analyzing Convergence
11.3.5 Constructing Cluster Graphs
11.3.6 Variational Analysis
11.3.7 Other Entropy Approximations
11.3.8 Discussion
11.4 Propagation with Approximate Messages
11.4.1 Factorized Messages
11.4.2 Approximate Message Computation
11.4.3 Inference with Approximate Messages
11.4.4 Expectation Propagation
11.4.5 Variational Analysis
11.4.6 Discussion
11.5 Structured Variational Approximations
11.5.1 The Mean Field Approximation
11.5.2 Structured Approximations
11.5.3 Local Variational Methods
11.6 Summary and Discussion
11.7 Relevant Literature
11.8 Exercises
12 Particle-Based Approximate Inference
12.1 Forward Sampling
12.1.1 Sampling from a Bayesian Network
12.1.2 Analysis of Error
12.1.3 Conditional Probability Queries
12.2 Likelihood Weighting and Importance Sampling
12.2.1 Likelihood Weighting: Intuition
12.2.2 Importance Sampling
12.2.3 Importance Sampling for Bayesian Networks
12.2.4 Importance Sampling Revisited
12.3 Markov Chain Monte Carlo Methods
12.3.1 Gibbs Sampling Algorithm
12.3.2 Markov Chains
12.3.3 Gibbs Sampling Revisited
12.3.4 A Broader Class of Markov Chains
12.3.5 Using a Markov Chain
12.4 Collapsed Particles
12.4.1 Collapsed Likelihood Weighting
12.4.2 Collapsed MCMC
12.5 Deterministic Search Methods
12.6 Summary
12.7 Relevant Literature
12.8 Exercises
13 MAP Inference
13.1 Overview
13.1.1 Computational Complexity
13.1.2 Overview of Solution Methods
13.2 Variable Elimination for (Marginal) MAP
13.2.1 Max-Product Variable Elimination
13.2.2 Finding the Most Probable Assignment
13.2.3 Variable Elimination for Marginal MAP
13.3 Max-Product in Clique Trees
13.3.1 Computing Max-Marginals
13.3.2 Message Passing as Reparameterization
13.3.3 Decoding Max-Marginals
13.4 Max-Product Belief Propagation in Loopy Cluster Graphs
13.4.1 Standard Max-Product Message Passing
13.4.2 Max-Product BP with Counting Numbers
13.4.3 Discussion
13.5 MAP as a Linear Optimization Problem
13.5.1 The Integer Program Formulation
13.5.2 Linear Programming Relaxation
13.5.3 Low-Temperature Limits
13.6 Using Graph Cuts for MAP
13.6.1 Inference Using Graph Cuts
13.6.2 Nonbinary Variables
13.7 Local Search Algorithms
13.8 Summary
13.9 Relevant Literature
13.10 Exercises
14 Inference in Hybrid Networks
14.1 Introduction
14.1.1 Challenges
14.1.2 Discretization
14.1.3 Overview
14.2 Variable Elimination in Gaussian Networks
14.2.1 Canonical Forms
14.2.2 Sum-Product Algorithms
14.2.3 Gaussian Belief Propagation
14.3 Hybrid Networks
14.3.1 The Difficulties
14.3.2 Factor Operations for Hybrid Gaussian Networks
14.3.3 EP for CLG Networks
14.3.4 An "Exact" CLG Algorithm
14.4 Nonlinear Dependencies
14.4.1 Linearization
14.4.2 Expectation Propagation with Gaussian Approximation
14.5 Particle-Based Approximation Methods
14.5.1 Sampling in Continuous Spaces
14.5.2 Forward Sampling in Bayesian Networks
14.5.3 MCMC Methods
14.5.4 Collapsed Particles
14.5.5 Nonparametric Message Passing
14.6 Summary and Discussion
14.7 Relevant Literature
14.8 Exercises
15 Inference in Temporal Models
15.1 Inference Tasks
15.2 Exact Inference
15.2.1 Filtering in State-Observation Models
15.2.2 Filtering as Clique Tree Propagation
15.2.3 Clique Tree Inference in DBNs
15.2.4 Entanglement
15.3 Approximate Inference
15.3.1 Key Ideas
15.3.2 Factored Belief State Methods
15.3.3 Particle Filtering
15.3.4 Deterministic Search Techniques
15.4 Hybrid DBNs
15.4.1 Continuous Models
15.4.2 Hybrid Models
15.5 Summary
15.6 Relevant Literature
15.7 Exercises
III Learning
16 Learning Graphical Models: Overview
16.1 Motivation
16.2 Goals of Learning
16.2.1 Density Estimation
16.2.2 Specific Prediction Tasks
16.2.3 Knowledge Discovery
16.3 Learning as Optimization
16.3.1 Empirical Risk and Overfitting
16.3.2 Discriminative versus Generative Training
16.4 Learning Tasks
16.4.1 Model Constraints
16.4.2 Data Observability
16.4.3 Taxonomy of Learning Tasks
16.5 Relevant Literature
17 Parameter Estimation
17.1 Maximum Likelihood Estimation
17.1.1 The Thumbtack Example
17.1.2 The Maximum Likelihood Principle
17.2 MLE for Bayesian Networks
17.2.1 A Simple Example
17.2.2 Global Likelihood Decomposition
17.2.3 Table-CPDs
17.2.4 Gaussian Bayesian Networks
17.2.5 Maximum Likelihood Estimation as M-Projection
17.3 Bayesian Parameter Estimation
17.3.1 The Thumbtack Example Revisited
17.3.2 Priors and Posteriors
17.4 Bayesian Parameter Estimation in Bayesian Networks
17.4.1 Parameter Independence and Global Decomposition
17.4.2 Local Decomposition
17.4.3 Priors for Bayesian Network Learning
17.4.4 MAP Estimation
17.5 Learning Models with Shared Parameters
17.5.1 Global Parameter Sharing
17.5.2 Local Parameter Sharing
17.5.3 Bayesian Inference with Shared Parameters
17.5.4 Hierarchical Priors
17.6 Generalization Analysis
17.6.1 Asymptotic Analysis
17.6.2 PAC-Bounds
17.7 Summary
17.8 Relevant Literature
17.9 Exercises
18 Structure Learning in Bayesian Networks
18.1 Introduction
18.1.1 Problem Definition
18.1.2 Overview of Methods
18.2 Constraint-Based Approaches
18.2.1 General Framework
18.2.2 Independence Tests
18.3 Structure Scores
18.3.1 Likelihood Scores
18.3.2 Bayesian Score
18.3.3 Marginal Likelihood for a Single Variable
18.3.4 Bayesian Score for Bayesian Networks
18.3.5 Understanding the Bayesian Score
18.3.6 Priors
18.3.7 Score Equivalence
18.4 Structure Search
18.4.1 Learning Tree-Structured Networks
18.4.2 Known Order
18.4.3 General Graphs
18.4.4 Learning with Equivalence Classes
18.5 Bayesian Model Averaging
18.5.1 Basic Theory
18.5.2 Model Averaging Given an Order
18.5.3 The General Case
18.6 Learning Models with Additional Structure
18.6.1 Learning with Local Structure
18.6.2 Learning Template Models
18.7 Summary and Discussion
18.8 Relevant Literature
18.9 Exercises
19 Partially Observed Data
19.1 Foundations
19.1.1 Likelihood of Data and Observation Models
19.1.2 Decoupling of Observation Mechanism
19.1.3 The Likelihood Function
19.1.4 Identifiability
19.2 Parameter Estimation
19.2.1 Gradient Ascent
19.2.2 Expectation Maximization (EM)
19.2.3 Comparison: Gradient Ascent versus EM
19.2.4 Approximate Inference
19.3 Bayesian Learning with Incomplete Data
19.3.1 Overview
19.3.2 MCMC Sampling
19.3.3 Variational Bayesian Learning
19.4 Structure Learning
19.4.1 Scoring Structures
19.4.2 Structure Search
19.4.3 Structural EM
19.5 Learning Models with Hidden Variables
19.5.1 Information Content of Hidden Variables
19.5.2 Determining the Cardinality
19.5.3 Introducing Hidden Variables
19.6 Summary
19.7 Relevant Literature
19.8 Exercises
20 Learning Undirected Models
20.1 Overview
20.2 The Likelihood Function
20.2.1 An Example
20.2.2 Form of the Likelihood Function
20.2.3 Properties of the Likelihood Function
20.3 Maximum (Conditional) Likelihood Parameter Estimation
20.3.1 Maximum Likelihood Estimation
20.3.2 Conditionally Trained Models
20.3.3 Learning with Missing Data
20.3.4 Maximum Entropy and Maximum Likelihood
20.4 Parameter Priors and Regularization
20.4.1 Local Priors
20.4.2 Global Priors
20.5 Learning with Approximate Inference
20.5.1 Belief Propagation
20.5.2 MAP-Based Learning
20.6 Alternative Objectives
20.6.1 Pseudolikelihood and Its Generalizations
20.6.2 Contrastive Optimization Criteria
20.7 Structure Learning
20.7.1 Structure Learning Using Independence Tests
20.7.2 Score-Based Learning: Hypothesis Spaces
20.7.3 Objective Functions
20.7.4 Optimization Task
20.7.5 Evaluating Changes to the Model
20.8 Summary
20.9 Relevant Literature
20.10 Exercises
IV Actions and Decisions
21 Causality
21.1 Motivation and Overview
21.1.1 Conditioning and Intervention
21.1.2 Correlation and Causation
21.2 Causal Models
21.3 Structural Causal Identifiability
21.3.1 Query Simplification Rules
21.3.2 Iterated Query Simplification
21.4 Mechanisms and Response Variables
21.5 Partial Identifiability in Functional Causal Models
21.6 Counterfactual Queries
21.6.1 Twinned Networks
21.6.2 Bounds on Counterfactual Queries
21.7 Learning Causal Models
21.7.1 Learning Causal Models without Confounding Factors
21.7.2 Learning from Interventional Data
21.7.3 Dealing with Latent Variables
21.7.4 Learning Functional Causal Models
21.8 Summary
21.9 Relevant Literature
21.10 Exercises
22 Utilities and Decisions
22.1 Foundations: Maximizing Expected Utility
22.1.1 Decision Making Under Uncertainty
22.1.2 Theoretical Justification
22.2 Utility Curves
22.2.1 Utility of Money
22.2.2 Attitudes Toward Risk
22.2.3 Rationality
22.3 Utility Elicitation
22.3.1 Utility Elicitation Procedures
22.3.2 Utility of Human Life
22.4 Utilities of Complex Outcomes
22.4.1 Preference and Utility Independence
22.4.2 Additive Independence Properties
22.5 Summary
22.6 Relevant Literature
22.7 Exercises
23 Structured Decision Problems
23.1 Decision Trees
23.1.1 Representation
23.1.2 Backward Induction Algorithm
23.2 Influence Diagrams
23.2.1 Basic Representation
23.2.2 Decision Rules
23.2.3 Time and Recall
23.2.4 Semantics and Optimality Criterion
23.3 Backward Induction in Influence Diagrams
23.3.1 Decision Trees for Influence Diagrams
23.3.2 Sum-Max-Sum Rule
23.4 Computing Expected Utilities
23.4.1 Simple Variable Elimination
23.4.2 Multiple Utility Variables: Simple Approaches
23.4.3 Generalized Variable Elimination
23.5 Optimization in Influence Diagrams
23.5.1 Optimizing a Single Decision Rule
23.5.2 Iterated Optimization Algorithm
23.5.3 Strategic Relevance and Global Optimality
23.6 Ignoring Irrelevant Information
23.7 Value of Information
23.7.1 Single Observations
23.7.2 Multiple Observations
23.8 Summary
23.9 Relevant Literature
23.10 Exercises
24 Epilogue
A Background Material
A.1 Information Theory
A.1.1 Compression and Entropy
A.1.2 Conditional Entropy and Information
A.1.3 Relative Entropy and Distances Between Distributions
A.2 Convergence Bounds
A.2.1 Central Limit Theorem
A.2.2 Convergence Bounds
A.3 Algorithms and Algorithmic Complexity
A.3.1 Basic Graph Algorithms
A.3.2 Analysis of Algorithmic Complexity
A.3.3 Dynamic Programming
A.3.4 Complexity Theory
A.4 Combinatorial Optimization and Search
A.4.1 Optimization Problems
A.4.2 Local Search
A.4.3 Branch and Bound Search
A.5 Continuous Optimization
A.5.1 Characterizing Optima of a Continuous Function
A.5.2 Gradient Ascent Methods
A.5.3 Constrained Optimization
A.5.4 Convex Duality
Bibliography
Notation Index
Subject Index

System requirements

Save as PDF Copy link into clipboard

Schweitzer Fachinformationen

Probabilistic Graphical Models

Description

More details

Other editions

Additional editions

Persons

Content

System requirements