Chapter 1: Bayesian network
A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are well suited to taking an event that has occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For instance, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
Efficient algorithms can perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (such as speech signals or protein sequences) are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.
Formally, Bayesian networks are directed acyclic graphs (DAGs) whose nodes represent variables in the Bayesian sense: they may be observable quantities, latent variables, unknown parameters, or hypotheses.
Edges represent conditional dependencies; nodes that are not connected (meaning no path connects one node to the other) represent variables that are conditionally independent of each other.
Each node is associated with a probability function that takes, as input, a particular set of values for the node's parent variables and gives, as output, the probability (or probability distribution, if applicable) of the variable represented by the node.
For example, if $m$ parent nodes represent $m$ Boolean variables, then the probability function could be represented by a table of $2^m$ entries, one entry for each of the $2^m$ possible parent combinations.
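As a small illustration, here is a minimal Python sketch (the node, its parents, and the probabilities are invented for the example) that stores such a table with one entry per parent combination:

```python
from itertools import product

# A minimal sketch: a conditional probability table (CPT) for one
# Boolean node with m Boolean parents, stored as a dict with one
# entry per parent combination -- 2**m rows in total.
m = 3
cpt = {}
for combo in product([True, False], repeat=m):
    # Hypothetical filler probabilities; a real model supplies these.
    cpt[combo] = 0.9 if all(combo) else 0.1

assert len(cpt) == 2 ** m  # one entry per possible parent combination
print(len(cpt))            # -> 8
```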
Similar ideas may be applied to undirected, and possibly cyclic, graphs such as Markov networks.
Let's use an example to make concrete what a Bayesian network is and how it works. Suppose we wish to model the dependencies between three variables: the sprinkler (more precisely, its state, i.e., whether it is on or off), the presence or absence of rain, and whether the grass is wet. Note that two events can cause the grass to become wet: an active sprinkler or rain. Rain also has a direct effect on the use of the sprinkler (namely, when it rains, the sprinkler usually is not active). This situation can be modeled with a Bayesian network. Each variable may take one of two values, T (for true) or F (for false).
Using the chain rule of probability, the joint probability function is

$$\Pr(G, S, R) = \Pr(G \mid S, R) \, \Pr(S \mid R) \, \Pr(R)$$

where G = "Grass wet (true/false)", S = "Sprinkler turned on (true/false)", and R = "Raining (true/false)".
The model can answer questions about the presence of a cause given the presence of an effect (so-called "inverse probability"), such as "What is the probability that it is raining, given that the grass is wet?", by using the conditional probability formula and summing over all nuisance variables:

$$\Pr(R = T \mid G = T) = \frac{\Pr(G = T, R = T)}{\Pr(G = T)} = \frac{\sum_{x \in \{T,F\}} \Pr(G = T, S = x, R = T)}{\sum_{x, y \in \{T,F\}} \Pr(G = T, S = x, R = y)}$$
Using the expansion for the joint probability function $\Pr(G, S, R)$ and the conditional probabilities from the conditional probability tables (CPTs) stated in the diagram, one can evaluate each term in the sums in the numerator and denominator.
For example,

$$\Pr(G = T, S = T, R = T) = \Pr(G = T \mid S = T, R = T) \, \Pr(S = T \mid R = T) \, \Pr(R = T)$$

Then the numerical results (subscripted by the associated variable values) are substituted and the ratio is evaluated.
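As a concrete illustration, the following minimal Python sketch carries out this calculation by brute-force enumeration. The CPT numbers are assumed illustrative values, since the diagram containing the actual tables is not reproduced here:

```python
# Posterior Pr(R=T | G=T) in the sprinkler network by enumeration.
# CPT values below are assumed for illustration only.
P_R = {True: 0.2, False: 0.8}                    # Pr(R)
P_S = {True: 0.01, False: 0.4}                   # Pr(S=T | R)
P_G = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.8, (False, False): 0.0}  # Pr(G=T | S, R)

def joint(g, s, r):
    """Pr(G=g, S=s, R=r) = Pr(G | S, R) * Pr(S | R) * Pr(R)."""
    pg = P_G[(s, r)] if g else 1.0 - P_G[(s, r)]
    ps = P_S[r] if s else 1.0 - P_S[r]
    return pg * ps * P_R[r]

numerator = sum(joint(True, s, True) for s in (True, False))
denominator = sum(joint(True, s, r)
                  for s in (True, False) for r in (True, False))
print(numerator / denominator)  # ~0.3577: rain is ~35.8% likely given wet grass
```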
To answer an interventional question, such as "What is the probability that it would rain, given that we wet the grass?", the answer is governed by the post-intervention joint distribution function

$$\Pr(S, R \mid \text{do}(G = T)) = \Pr(S \mid R) \, \Pr(R)$$

obtained by removing the factor $\Pr(G \mid S, R)$ from the pre-intervention distribution.
The do operator forces the value of G to be true; the probability of rain is unaffected by the action:

$$\Pr(R \mid \text{do}(G = T)) = \Pr(R)$$
To predict the impact of turning the sprinkler on:

$$\Pr(G, R \mid \text{do}(S = T)) = \Pr(G \mid S = T, R) \, \Pr(R)$$

with the term $\Pr(S = T \mid R)$ removed, showing that the action affects the grass but not the rain.
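The same truncated-factorization idea can be sketched in Python, reusing the illustrative CPT values assumed earlier; intervening on a variable simply drops that variable's factor from the product:

```python
# do-operator via truncated factorization (same assumed CPTs as above).
P_R = {True: 0.2, False: 0.8}                    # Pr(R)
P_S = {True: 0.01, False: 0.4}                   # Pr(S=T | R)
P_G = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.8, (False, False): 0.0}  # Pr(G=T | S, R)

# do(G=T): drop the factor Pr(G | S, R), leaving Pr(S | R) Pr(R).
p_rain_do_g = sum((P_S[True] if s else 1 - P_S[True]) * P_R[True]
                  for s in (True, False))
print(p_rain_do_g)  # 0.2 -- identical to Pr(R=T): wetting the grass
                    # does not change the rain

# do(S=T): drop the factor Pr(S | R), leaving Pr(G | S=T, R) Pr(R).
p_wet_do_s = sum(P_G[(True, r)] * P_R[r] for r in (True, False))
print(p_wet_do_s)   # 0.918 -- the action does affect the grass
```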
As with most policy evaluation problems, these predictions may not be feasible when there are unobserved variables.
The effect of the action $\text{do}(x)$ can still be predicted, however, whenever the back-door criterion is satisfied. It states that, if a set $Z$ of nodes can be observed that $d$-separates (or blocks) all back-door paths from $X$ to $Y$, then

$$\Pr(Y, Z \mid \text{do}(x)) = \frac{\Pr(Y, Z, X = x)}{\Pr(X = x \mid Z)}$$

A back-door path is one that ends with an arrow into $X$.
Sets that are "adequate" or "admissible" are the terms used to describe those that meet the back-door condition. For example, For the purpose of anticipating the influence of S = T on G, the set Z = R may be considered, because R d-separates the (only) back-door path S R G.
However, if S is not observed, no other set d-separates this path, and the effect of turning the sprinkler on (S = T) on the grass (G) cannot be predicted from passive observations.
In that case, P(G | do(S = T)) is not "identified".
This reflects the fact that, lacking interventional data, the observed dependence between S and G is either due to a causal connection or is spurious (an apparent dependence arising from a common cause, R) (see Simpson's paradox).
To determine whether a causal relation is identified from an arbitrary Bayesian network with unobserved variables, one can use the three rules of "do-calculus".
Using a Bayesian network can save considerable amounts of memory compared to exhaustive probability tables, if the dependencies in the joint distribution are sparse.
For example, a naive way of storing the conditional probabilities of 10 two-valued variables as a table requires storage space for $2^{10} = 1024$ values. If no variable's local distribution depends on more than three parent variables, the Bayesian network representation stores at most $10 \cdot 2^3 = 80$ values.
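The arithmetic can be checked directly:

```python
# Storage comparison from the text: a full joint table over 10 binary
# variables vs. a Bayesian network where each node has at most 3 parents.
full_table = 2 ** 10               # 1024 values
sparse_network = 10 * 2 ** 3       # at most 80 values
print(full_table, sparse_network)  # 1024 80
```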
One advantage of Bayesian networks is that it is intuitively easier for a human to understand (a sparse set of) direct dependencies and local distributions than a complete joint distribution.
Bayesian networks support three main inference tasks:
Because a Bayesian network is a complete model for its variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to update knowledge of the state of a subset of variables when another set of variables, the evidence variables, is observed. This process of computing the posterior distribution of variables given evidence is called probabilistic inference. The posterior gives a universally sufficient statistic for detection applications, when choosing values for the variable subset that minimize some expected loss function (for instance, the probability of decision error). A Bayesian network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems.
The most common exact inference methods are: variable elimination, which eliminates (by integration or summation) the non-observed, non-query variables one by one by distributing the sum over the product; clique tree propagation, which caches the computation so that many variables can be queried at once and new evidence can be propagated quickly; and recursive conditioning and AND/OR search, which allow for a space-time tradeoff and match the efficiency of variable elimination when enough space is used. All of these methods have complexity that is exponential in the network's treewidth. The most common approximate inference methods are importance sampling, stochastic MCMC simulation, mini-bucket elimination, loopy belief propagation, generalized belief propagation, and variational methods. A sketch of variable elimination on the sprinkler example follows.
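This minimal sketch (same assumed CPT values as earlier) computes Pr(R = T | G = T) by summing S out first, rather than enumerating the full joint:

```python
# Variable elimination sketch on the sprinkler network.
P_R = {True: 0.2, False: 0.8}
P_S = {True: 0.01, False: 0.4}                   # Pr(S=T | R)
P_G = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.8, (False, False): 0.0}  # Pr(G=T | S, R)

def sum_out_s(g, r):
    """tau(g, r) = sum_s Pr(G=g | s, r) Pr(S=s | r): S is eliminated."""
    total = 0.0
    for s in (True, False):
        pg = P_G[(s, r)] if g else 1.0 - P_G[(s, r)]
        ps = P_S[r] if s else 1.0 - P_S[r]
        total += pg * ps
    return total

# Multiply the reduced factor by Pr(R), then normalize over R.
unnorm = {r: sum_out_s(True, r) * P_R[r] for r in (True, False)}
print(unnorm[True] / sum(unnorm.values()))  # ~0.3577, matching enumeration
```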
In order to fully specify the Bayesian network, and thus fully represent the joint probability distribution, it is necessary to specify for each node X the probability distribution for X conditional upon X's parents. The distribution of X conditional upon its parents may take any form. It is common to work with discrete or Gaussian distributions, since that simplifies calculations. Sometimes only constraints on the distribution are known; one can then use the principle of maximum entropy to determine a single distribution, the one with the greatest entropy given the constraints.
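As a hedged illustration of these two common choices, the following Python sketch shows a discrete CPT and a (linear) Gaussian conditional; all parameter values are invented for the example:

```python
import random

# 1) Discrete CPT: Pr(X=T | parent) for a single Boolean parent.
cpt = {True: 0.7, False: 0.1}  # illustrative numbers only

# 2) Linear-Gaussian conditional: X | p1, p2 ~ Normal(b0 + b1*p1 + b2*p2,
#    sigma^2), a standard Gaussian choice that keeps computation simple.
def sample_linear_gaussian(p1, p2, b0=0.5, b1=1.0, b2=-0.3, sigma=0.1):
    # Coefficients are hypothetical; a real model would estimate them.
    return random.gauss(b0 + b1 * p1 + b2 * p2, sigma)

print(cpt[True], sample_linear_gaussian(1.0, 2.0))
```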