
Probabilistic Graphical Models
Description
Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.
Most tasks require a person or an automated system to reason—to reach conclusions based on available information. The framework of probabilistic graphical models, presented in this book, provides a general approach for this task. The approach is model-based, allowing interpretable models to be constructed and then manipulated by reasoning algorithms. These models can also be learned automatically from data, allowing the approach to be used in cases where manually constructing a model is difficult or even impossible. Because uncertainty is an inescapable aspect of most real-world applications, the book focuses on probabilistic models, which make the uncertainty explicit and provide models that are more faithful to reality.
Probabilistic Graphical Models discusses a variety of models, spanning Bayesian networks, undirected Markov networks, discrete and continuous models, and extensions to deal with dynamical systems and relational data. For each class of models, the text describes the three fundamental cornerstones: representation, inference, and learning, presenting both basic concepts and advanced techniques. Finally, the book considers the use of the proposed framework for causal reasoning and decision making under uncertainty. The main text in each chapter provides the detailed technical development of the key ideas. Most chapters also include boxes with additional material: skill boxes, which describe techniques; case study boxes, which discuss empirical cases related to the approach described in the text, including applications in computer vision, robotics, natural language understanding, and computational biology; and concept boxes, which present significant concepts drawn from the material in the chapter. Instructors (and readers) can group chapters in various combinations, from core topics to more technically advanced material, to suit their particular needs.
More details
Other editions
Additional editions

Persons
Nir Friedman is Professor in the Department of Computer Science and Engineering at Hebrew University.
Content
- Cover
- Copyright
- Contents
- Acknowledgments
- List of Figures
- List of Algorithms
- List of Boxes
- 1 Introduction
- 1.1 Motivation
- 1.2 Structured Probabilistic Models
- 1.2.1 Probabilistic Graphical Models
- 1.2.2 Representation, Inference, Learning
- 1.3 Overview and Roadmap
- 1.3.1 Overview of Chapters
- 1.3.2 Reader's Guide
- 1.3.3 Connection to Other Disciplines
- 1.4 Historical Notes
- 2 Foundations
- 2.1 Probability Theory
- 2.1.1 Probability Distributions
- 2.1.2 Basic Concepts in Probability
- 2.1.3 Random Variables and Joint Distributions
- 2.1.4 Independence and Conditional Independence
- 2.1.5 Querying a Distribution
- 2.1.6 Continuous Spaces
- 2.1.7 Expectation and Variance
- 2.2 Graphs
- 2.2.1 Nodes and Edges
- 2.2.2 Subgraphs
- 2.2.3 Paths and Trails
- 2.2.4 Cycles and Loops
- 2.3 Relevant Literature
- 2.4 Exercises
- I Representation
- 3 The Bayesian Network Representation
- 3.1 Exploiting Independence Properties
- 3.1.1 Independent Random Variables
- 3.1.2 The Conditional Parameterization
- 3.1.3 The Naive Bayes Model
- 3.2 Bayesian Networks
- 3.2.1 The Student Example Revisited
- 3.2.2 Basic Independencies in Bayesian Networks
- 3.2.3 Graphs and Distributions
- 3.3 Independencies in Graphs
- 3.3.1 D-separation
- 3.3.2 Soundness and Completeness
- 3.3.3 An Algorithm for d-Separation
- 3.3.4 I-Equivalence
- 3.4 From Distributions to Graphs
- 3.4.1 Minimal I-Maps
- 3.4.2 Perfect Maps
- 3.4.3 Finding Perfect Maps
- 3.5 Summary
- 3.6 Relevant Literature
- 3.7 Exercises
- 4 Undirected Graphical Models
- 4.1 The Misconception Example
- 4.2 Parameterization
- 4.2.1 Factors
- 4.2.2 Gibbs Distributions and Markov Networks
- 4.2.3 Reduced Markov Networks
- 4.3 Markov Network Independencies
- 4.3.1 Basic Independencies
- 4.3.2 Independencies Revisited
- 4.3.3 From Distributions to Graphs
- 4.4 Parameterization Revisited
- 4.4.1 Finer-Grained Parameterization
- 4.4.2 Overparameterization
- 4.5 Bayesian Networks and Markov Networks
- 4.5.1 From Bayesian Networks to Markov Networks
- 4.5.2 From Markov Networks to Bayesian Networks
- 4.5.3 Chordal Graphs
- 4.6 Partially Directed Models
- 4.6.1 Conditional Random Fields
- 4.6.2 Chain Graph Models
- 4.7 Summary and Discussion
- 4.8 Relevant Literature
- 4.9 Exercises
- 5 Local Probabilistic Models
- 5.1 Tabular CPDs
- 5.2 Deterministic CPDs
- 5.2.1 Representation
- 5.2.2 Independencies
- 5.3 Context-Specific CPDs
- 5.3.1 Representation
- 5.3.2 Independencies
- 5.4 Independence of Causal Influence
- 5.4.1 The Noisy-Or Model
- 5.4.2 Generalized Linear Models
- 5.4.3 The General Formulation
- 5.4.4 Independencies
- 5.5 Continuous Variables
- 5.5.1 Hybrid Models
- 5.6 Conditional Bayesian Networks
- 5.7 Summary
- 5.8 Relevant Literature
- 5.9 Exercises
- 6 Template-Based Representations
- 6.1 Introduction
- 6.2 Temporal Models
- 6.2.1 Basic Assumptions
- 6.2.2 Dynamic Bayesian Networks
- 6.2.3 State-Observation Models
- 6.3 Template Variables and Template Factors
- 6.4 Directed Probabilistic Models for Object-Relational Domains
- 6.4.1 Plate Models
- 6.4.2 Probabilistic Relational Models
- 6.5 Undirected Representation
- 6.6 Structural Uncertainty
- 6.6.1 Relational Uncertainty
- 6.6.2 Object Uncertainty
- 6.7 Summary
- 6.8 Relevant Literature
- 6.9 Exercises
- 7 Gaussian Network Models
- 7.1 Multivariate Gaussians
- 7.1.1 Basic Parameterization
- 7.1.2 Operations on Gaussians
- 7.1.3 Independencies in Gaussians
- 7.2 Gaussian Bayesian Networks
- 7.3 Gaussian Markov Random Fields
- 7.4 Summary
- 7.5 Relevant Literature
- 7.6 Exercises
- 8 The Exponential Family
- 8.1 Introduction
- 8.2 Exponential Families
- 8.2.1 Linear Exponential Families
- 8.3 Factored Exponential Families
- 8.3.1 Product Distributions
- 8.3.2 Bayesian Networks
- 8.4 Entropy and Relative Entropy
- 8.4.1 Entropy
- 8.4.2 Relative Entropy
- 8.5 Projections
- 8.5.1 Comparison
- 8.5.2 M-Projections
- 8.5.3 I-Projections
- 8.6 Summary
- 8.7 Relevant Literature
- 8.8 Exercises
- II Inference
- 9 Exact Inference: Variable Elimination
- 9.1 Analysis of Complexity
- 9.1.1 Analysis of Exact Inference
- 9.1.2 Analysis of Approximate Inference
- 9.2 Variable Elimination: The Basic Ideas
- 9.3 Variable Elimination
- 9.3.1 Basic Elimination
- 9.3.2 Dealing with Evidence
- 9.4 Complexity and Graph Structure: Variable Elimination
- 9.4.1 Simple Analysis
- 9.4.2 Graph-Theoretic Analysis
- 9.4.3 Finding Elimination Orderings
- 9.5 Conditioning
- 9.5.1 The Conditioning Algorithm
- 9.5.2 Conditioning and Variable Elimination
- 9.5.3 Graph-Theoretic Analysis
- 9.5.4 Improved Conditioning
- 9.6 Inference with Structured CPDs
- 9.6.1 Independence of Causal Influence
- 9.6.2 Context-Specific Independence
- 9.6.3 Discussion
- 9.7 Summary and Discussion
- 9.8 Relevant Literature
- 9.9 Exercises
- 10 Exact Inference: Clique Trees
- 10.1 Variable Elimination and Clique Trees
- 10.1.1 Cluster Graphs
- 10.1.2 Clique Trees
- 10.2 Message Passing: Sum Product
- 10.2.1 Variable Elimination in a Clique Tree
- 10.2.2 Clique Tree Calibration
- 10.2.3 A Calibrated Clique Tree as a Distribution
- 10.3 Message Passing: Belief Update
- 10.3.1 Message Passing with Division
- 10.3.2 Equivalence of Sum-Product and Belief Update Messages
- 10.3.3 Answering Queries
- 10.4 Constructing a Clique Tree
- 10.4.1 Clique Trees from Variable Elimination
- 10.4.2 Clique Trees from Chordal Graphs
- 10.5 Summary
- 10.6 Relevant Literature
- 10.7 Exercises
- 11 Inference as Optimization
- 11.1 Introduction
- 11.1.1 Exact Inference Revisited
- 11.1.2 The Energy Functional
- 11.1.3 Optimizing the Energy Functional
- 11.2 Exact Inference as Optimization
- 11.2.1 Fixed-Point Characterization
- 11.2.2 Inference as Optimization
- 11.3 Propagation-Based Approximation
- 11.3.1 A Simple Example
- 11.3.2 Cluster-Graph Belief Propagation
- 11.3.3 Properties of Cluster-Graph Belief Propagation
- 11.3.4 Analyzing Convergence
- 11.3.5 Constructing Cluster Graphs
- 11.3.6 Variational Analysis
- 11.3.7 Other Entropy Approximations
- 11.3.8 Discussion
- 11.4 Propagation with Approximate Messages
- 11.4.1 Factorized Messages
- 11.4.2 Approximate Message Computation
- 11.4.3 Inference with Approximate Messages
- 11.4.4 Expectation Propagation
- 11.4.5 Variational Analysis
- 11.4.6 Discussion
- 11.5 Structured Variational Approximations
- 11.5.1 The Mean Field Approximation
- 11.5.2 Structured Approximations
- 11.5.3 Local Variational Methods
- 11.6 Summary and Discussion
- 11.7 Relevant Literature
- 11.8 Exercises
- 12 Particle-Based Approximate Inference
- 12.1 Forward Sampling
- 12.1.1 Sampling from a Bayesian Network
- 12.1.2 Analysis of Error
- 12.1.3 Conditional Probability Queries
- 12.2 Likelihood Weighting and Importance Sampling
- 12.2.1 Likelihood Weighting: Intuition
- 12.2.2 Importance Sampling
- 12.2.3 Importance Sampling for Bayesian Networks
- 12.2.4 Importance Sampling Revisited
- 12.3 Markov Chain Monte Carlo Methods
- 12.3.1 Gibbs Sampling Algorithm
- 12.3.2 Markov Chains
- 12.3.3 Gibbs Sampling Revisited
- 12.3.4 A Broader Class of Markov Chains
- 12.3.5 Using a Markov Chain
- 12.4 Collapsed Particles
- 12.4.1 Collapsed Likelihood Weighting
- 12.4.2 Collapsed MCMC
- 12.5 Deterministic Search Methods
- 12.6 Summary
- 12.7 Relevant Literature
- 12.8 Exercises
- 13 MAP Inference
- 13.1 Overview
- 13.1.1 Computational Complexity
- 13.1.2 Overview of Solution Methods
- 13.2 Variable Elimination for (Marginal) MAP
- 13.2.1 Max-Product Variable Elimination
- 13.2.2 Finding the Most Probable Assignment
- 13.2.3 Variable Elimination for Marginal MAP
- 13.3 Max-Product in Clique Trees
- 13.3.1 Computing Max-Marginals
- 13.3.2 Message Passing as Reparameterization
- 13.3.3 Decoding Max-Marginals
- 13.4 Max-Product Belief Propagation in Loopy Cluster Graphs
- 13.4.1 Standard Max-Product Message Passing
- 13.4.2 Max-Product BP with Counting Numbers
- 13.4.3 Discussion
- 13.5 MAP as a Linear Optimization Problem
- 13.5.1 The Integer Program Formulation
- 13.5.2 Linear Programming Relaxation
- 13.5.3 Low-Temperature Limits
- 13.6 Using Graph Cuts for MAP
- 13.6.1 Inference Using Graph Cuts
- 13.6.2 Nonbinary Variables
- 13.7 Local Search Algorithms
- 13.8 Summary
- 13.9 Relevant Literature
- 13.10 Exercises
- 14 Inference in Hybrid Networks
- 14.1 Introduction
- 14.1.1 Challenges
- 14.1.2 Discretization
- 14.1.3 Overview
- 14.2 Variable Elimination in Gaussian Networks
- 14.2.1 Canonical Forms
- 14.2.2 Sum-Product Algorithms
- 14.2.3 Gaussian Belief Propagation
- 14.3 Hybrid Networks
- 14.3.1 The Difficulties
- 14.3.2 Factor Operations for Hybrid Gaussian Networks
- 14.3.3 EP for CLG Networks
- 14.3.4 An "Exact" CLG Algorithm
- 14.4 Nonlinear Dependencies
- 14.4.1 Linearization
- 14.4.2 Expectation Propagation with Gaussian Approximation
- 14.5 Particle-Based Approximation Methods
- 14.5.1 Sampling in Continuous Spaces
- 14.5.2 Forward Sampling in Bayesian Networks
- 14.5.3 MCMC Methods
- 14.5.4 Collapsed Particles
- 14.5.5 Nonparametric Message Passing
- 14.6 Summary and Discussion
- 14.7 Relevant Literature
- 14.8 Exercises
- 15 Inference in Temporal Models
- 15.1 Inference Tasks
- 15.2 Exact Inference
- 15.2.1 Filtering in State-Observation Models
- 15.2.2 Filtering as Clique Tree Propagation
- 15.2.3 Clique Tree Inference in DBNs
- 15.2.4 Entanglement
- 15.3 Approximate Inference
- 15.3.1 Key Ideas
- 15.3.2 Factored Belief State Methods
- 15.3.3 Particle Filtering
- 15.3.4 Deterministic Search Techniques
- 15.4 Hybrid DBNs
- 15.4.1 Continuous Models
- 15.4.2 Hybrid Models
- 15.5 Summary
- 15.6 Relevant Literature
- 15.7 Exercises
- III Learning
- 16 Learning Graphical Models: Overview
- 16.1 Motivation
- 16.2 Goals of Learning
- 16.2.1 Density Estimation
- 16.2.2 Specific Prediction Tasks
- 16.2.3 Knowledge Discovery
- 16.3 Learning as Optimization
- 16.3.1 Empirical Risk and Overfitting
- 16.3.2 Discriminative versus Generative Training
- 16.4 Learning Tasks
- 16.4.1 Model Constraints
- 16.4.2 Data Observability
- 16.4.3 Taxonomy of Learning Tasks
- 16.5 Relevant Literature
- 17 Parameter Estimation
- 17.1 Maximum Likelihood Estimation
- 17.1.1 The Thumbtack Example
- 17.1.2 The Maximum Likelihood Principle
- 17.2 MLE for Bayesian Networks
- 17.2.1 A Simple Example
- 17.2.2 Global Likelihood Decomposition
- 17.2.3 Table-CPDs
- 17.2.4 Gaussian Bayesian Networks
- 17.2.5 Maximum Likelihood Estimation as M-Projection
- 17.3 Bayesian Parameter Estimation
- 17.3.1 The Thumbtack Example Revisited
- 17.3.2 Priors and Posteriors
- 17.4 Bayesian Parameter Estimation in Bayesian Networks
- 17.4.1 Parameter Independence and Global Decomposition
- 17.4.2 Local Decomposition
- 17.4.3 Priors for Bayesian Network Learning
- 17.4.4 MAP Estimation
- 17.5 Learning Models with Shared Parameters
- 17.5.1 Global Parameter Sharing
- 17.5.2 Local Parameter Sharing
- 17.5.3 Bayesian Inference with Shared Parameters
- 17.5.4 Hierarchical Priors
- 17.6 Generalization Analysis
- 17.6.1 Asymptotic Analysis
- 17.6.2 PAC-Bounds
- 17.7 Summary
- 17.8 Relevant Literature
- 17.9 Exercises
- 18 Structure Learning in Bayesian Networks
- 18.1 Introduction
- 18.1.1 Problem Definition
- 18.1.2 Overview of Methods
- 18.2 Constraint-Based Approaches
- 18.2.1 General Framework
- 18.2.2 Independence Tests
- 18.3 Structure Scores
- 18.3.1 Likelihood Scores
- 18.3.2 Bayesian Score
- 18.3.3 Marginal Likelihood for a Single Variable
- 18.3.4 Bayesian Score for Bayesian Networks
- 18.3.5 Understanding the Bayesian Score
- 18.3.6 Priors
- 18.3.7 Score Equivalence
- 18.4 Structure Search
- 18.4.1 Learning Tree-Structured Networks
- 18.4.2 Known Order
- 18.4.3 General Graphs
- 18.4.4 Learning with Equivalence Classes
- 18.5 Bayesian Model Averaging
- 18.5.1 Basic Theory
- 18.5.2 Model Averaging Given an Order
- 18.5.3 The General Case
- 18.6 Learning Models with Additional Structure
- 18.6.1 Learning with Local Structure
- 18.6.2 Learning Template Models
- 18.7 Summary and Discussion
- 18.8 Relevant Literature
- 18.9 Exercises
- 19 Partially Observed Data
- 19.1 Foundations
- 19.1.1 Likelihood of Data and Observation Models
- 19.1.2 Decoupling of Observation Mechanism
- 19.1.3 The Likelihood Function
- 19.1.4 Identifiability
- 19.2 Parameter Estimation
- 19.2.1 Gradient Ascent
- 19.2.2 Expectation Maximization (EM)
- 19.2.3 Comparison: Gradient Ascent versus EM
- 19.2.4 Approximate Inference
- 19.3 Bayesian Learning with Incomplete Data
- 19.3.1 Overview
- 19.3.2 MCMC Sampling
- 19.3.3 Variational Bayesian Learning
- 19.4 Structure Learning
- 19.4.1 Scoring Structures
- 19.4.2 Structure Search
- 19.4.3 Structural EM
- 19.5 Learning Models with Hidden Variables
- 19.5.1 Information Content of Hidden Variables
- 19.5.2 Determining the Cardinality
- 19.5.3 Introducing Hidden Variables
- 19.6 Summary
- 19.7 Relevant Literature
- 19.8 Exercises
- 20 Learning Undirected Models
- 20.1 Overview
- 20.2 The Likelihood Function
- 20.2.1 An Example
- 20.2.2 Form of the Likelihood Function
- 20.2.3 Properties of the Likelihood Function
- 20.3 Maximum (Conditional) Likelihood Parameter Estimation
- 20.3.1 Maximum Likelihood Estimation
- 20.3.2 Conditionally Trained Models
- 20.3.3 Learning with Missing Data
- 20.3.4 Maximum Entropy and Maximum Likelihood
- 20.4 Parameter Priors and Regularization
- 20.4.1 Local Priors
- 20.4.2 Global Priors
- 20.5 Learning with Approximate Inference
- 20.5.1 Belief Propagation
- 20.5.2 MAP-Based Learning
- 20.6 Alternative Objectives
- 20.6.1 Pseudolikelihood and Its Generalizations
- 20.6.2 Contrastive Optimization Criteria
- 20.7 Structure Learning
- 20.7.1 Structure Learning Using Independence Tests
- 20.7.2 Score-Based Learning: Hypothesis Spaces
- 20.7.3 Objective Functions
- 20.7.4 Optimization Task
- 20.7.5 Evaluating Changes to the Model
- 20.8 Summary
- 20.9 Relevant Literature
- 20.10 Exercises
- IV Actions and Decisions
- 21 Causality
- 21.1 Motivation and Overview
- 21.1.1 Conditioning and Intervention
- 21.1.2 Correlation and Causation
- 21.2 Causal Models
- 21.3 Structural Causal Identifiability
- 21.3.1 Query Simplification Rules
- 21.3.2 Iterated Query Simplification
- 21.4 Mechanisms and Response Variables
- 21.5 Partial Identifiability in Functional Causal Models
- 21.6 Counterfactual Queries
- 21.6.1 Twinned Networks
- 21.6.2 Bounds on Counterfactual Queries
- 21.7 Learning Causal Models
- 21.7.1 Learning Causal Models without Confounding Factors
- 21.7.2 Learning from Interventional Data
- 21.7.3 Dealing with Latent Variables
- 21.7.4 Learning Functional Causal Models
- 21.8 Summary
- 21.9 Relevant Literature
- 21.10 Exercises
- 22 Utilities and Decisions
- 22.1 Foundations: Maximizing Expected Utility
- 22.1.1 Decision Making Under Uncertainty
- 22.1.2 Theoretical Justification
- 22.2 Utility Curves
- 22.2.1 Utility of Money
- 22.2.2 Attitudes Toward Risk
- 22.2.3 Rationality
- 22.3 Utility Elicitation
- 22.3.1 Utility Elicitation Procedures
- 22.3.2 Utility of Human Life
- 22.4 Utilities of Complex Outcomes
- 22.4.1 Preference and Utility Independence
- 22.4.2 Additive Independence Properties
- 22.5 Summary
- 22.6 Relevant Literature
- 22.7 Exercises
- 23 Structured Decision Problems
- 23.1 Decision Trees
- 23.1.1 Representation
- 23.1.2 Backward Induction Algorithm
- 23.2 Influence Diagrams
- 23.2.1 Basic Representation
- 23.2.2 Decision Rules
- 23.2.3 Time and Recall
- 23.2.4 Semantics and Optimality Criterion
- 23.3 Backward Induction in Influence Diagrams
- 23.3.1 Decision Trees for Influence Diagrams
- 23.3.2 Sum-Max-Sum Rule
- 23.4 Computing Expected Utilities
- 23.4.1 Simple Variable Elimination
- 23.4.2 Multiple Utility Variables: Simple Approaches
- 23.4.3 Generalized Variable Elimination
- 23.5 Optimization in Influence Diagrams
- 23.5.1 Optimizing a Single Decision Rule
- 23.5.2 Iterated Optimization Algorithm
- 23.5.3 Strategic Relevance and Global Optimality
- 23.6 Ignoring Irrelevant Information
- 23.7 Value of Information
- 23.7.1 Single Observations
- 23.7.2 Multiple Observations
- 23.8 Summary
- 23.9 Relevant Literature
- 23.10 Exercises
- 24 Epilogue
- A Background Material
- A.1 Information Theory
- A.1.1 Compression and Entropy
- A.1.2 Conditional Entropy and Information
- A.1.3 Relative Entropy and Distances Between Distributions
- A.2 Convergence Bounds
- A.2.1 Central Limit Theorem
- A.2.2 Convergence Bounds
- A.3 Algorithms and Algorithmic Complexity
- A.3.1 Basic Graph Algorithms
- A.3.2 Analysis of Algorithmic Complexity
- A.3.3 Dynamic Programming
- A.3.4 Complexity Theory
- A.4 Combinatorial Optimization and Search
- A.4.1 Optimization Problems
- A.4.2 Local Search
- A.4.3 Branch and Bound Search
- A.5 Continuous Optimization
- A.5.1 Characterizing Optima of a Continuous Function
- A.5.2 Gradient Ascent Methods
- A.5.3 Constrained Optimization
- A.5.4 Convex Duality
- Bibliography
- Notation Index
- Subject Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., „flowing” text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a „hard” copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.