Artificial Intelligence. A Modern Approach [Global Edition]

Stuart Russell, Peter Norvig

Chapter 14 Probabilistic Reasoning

Chapter Questions

Problem 1

We have a bag of three biased coins $a, b$, and $c$ with probabilities of coming up heads of $30 \%, 60 \%$, and $75 \%$, respectively. One coin is drawn randomly from the bag (with equal likelihood of drawing each of the three coins), and then the coin is flipped three times to generate the outcomes $X_1, X_2$, and $X_3$.
a. Draw the Bayesian network corresponding to this setup and define the necessary CPTs.
b. Calculate which coin was most likely to have been drawn from the bag if the observed flips come out heads twice and tails once.
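
One way to check a hand calculation for part (b) is to apply Bayes' rule directly. The sketch below is only an illustration under the stated setup: the names `P_HEADS`, `PRIOR`, and `coin_posterior` are hypothetical, and exact fractions are used so that round-off plays no role.

```python
from fractions import Fraction

# Heads probabilities for coins a, b, c, taken from the problem statement.
P_HEADS = {"a": Fraction(30, 100), "b": Fraction(60, 100), "c": Fraction(75, 100)}
PRIOR = Fraction(1, 3)  # each coin is equally likely to be drawn


def coin_posterior(heads, tails):
    """Return P(coin | flips) for the given counts of heads and tails."""
    # The flips are conditionally independent given the coin, so every
    # sequence with these counts has likelihood p^heads * (1 - p)^tails.
    unnormalized = {
        coin: PRIOR * p**heads * (1 - p) ** tails for coin, p in P_HEADS.items()
    }
    total = sum(unnormalized.values())
    return {coin: weight / total for coin, weight in unnormalized.items()}


posterior = coin_posterior(heads=2, tails=1)
most_likely = max(posterior, key=posterior.get)
print(most_likely, {coin: float(p) for coin, p in posterior.items()})
```

Because the prior is uniform, the ranking of the coins depends only on the likelihood term; the normalization matters only if the actual posterior values are wanted.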

Problem 2

Equation (14.1) on page 513 defines the joint distribution represented by a Bayesian network in terms of the parameters $\theta\left(X_i \mid \operatorname{Parents}\left(X_i\right)\right)$. This exercise asks you to derive the equivalence between the parameters and the conditional probabilities $\mathbf{P}\left(X_i \mid \operatorname{Parents}\left(X_i\right)\right)$ from this definition.
a. Consider a simple network $X \rightarrow Y \rightarrow Z$ with three Boolean variables. Use Equations (13.3) and (13.6) (pages 485 and 492) to express the conditional probability $P(z \mid y)$ as the ratio of two sums, each over entries in the joint distribution $\mathbf{P}(X, Y, Z)$.
b. Now use Equation (14.1) to write this expression in terms of the network parameters $\theta(X), \theta(Y \mid X)$, and $\theta(Z \mid Y)$.
c. Next, expand out the summations in your expression from part (b), writing out explicitly the terms for the true and false values of each summed variable. Assuming that all network parameters satisfy the constraint $\sum_{x_i} \theta\left(x_i \mid \operatorname{parents}\left(X_i\right)\right)=1$, show that the resulting expression reduces to $\theta(z \mid y)$.
d. Generalize this derivation to show that $\theta\left(X_i \mid \operatorname{Parents}\left(X_i\right)\right)=\mathbf{P}\left(X_i \mid \operatorname{Parents}\left(X_i\right)\right)$ for any Bayesian network.

Problem 3

The operation of arc reversal in a Bayesian network allows us to change the direction of an arc $X \rightarrow Y$ while preserving the joint probability distribution that the network represents (Shachter, 1986). Arc reversal may require introducing new arcs: all the parents of $X$ also become parents of $Y$, and all parents of $Y$ also become parents of $X$.
a. Assume that $X$ and $Y$ start with $m$ and $n$ parents, respectively, and that all variables have $k$ values. By calculating the change in size for the CPTs of $X$ and $Y$, show that the total number of parameters in the network cannot decrease during arc reversal. (Hint: the parents of $X$ and $Y$ need not be disjoint.)
b. Under what circumstances can the total number remain constant?
c. Let the parents of $X$ be $\mathbf{U} \cup \mathbf{V}$ and the parents of $Y$ be $\mathbf{V} \cup \mathbf{W}$, where $\mathbf{U}$ and $\mathbf{W}$ are disjoint. The formulas for the new CPTs after arc reversal are as follows:
$$
\begin{aligned}
\mathbf{P}(Y \mid \mathbf{U}, \mathbf{V}, \mathbf{W}) & =\sum_x \mathbf{P}(Y \mid \mathbf{V}, \mathbf{W}, x) \mathbf{P}(x \mid \mathbf{U}, \mathbf{V}) \\
\mathbf{P}(X \mid \mathbf{U}, \mathbf{V}, \mathbf{W}, Y) & =\mathbf{P}(Y \mid X, \mathbf{V}, \mathbf{W}) \mathbf{P}(X \mid \mathbf{U}, \mathbf{V}) / \mathbf{P}(Y \mid \mathbf{U}, \mathbf{V}, \mathbf{W}) .
\end{aligned}
$$

Prove that the new network expresses the same joint distribution over all variables as the original network.
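
Before attempting the proof, it can help to confirm the two formulas numerically on a toy case. The sketch below is a sanity check, not a proof, and makes simplifying assumptions that are not part of the exercise: all variables are Boolean, $\mathbf{U}$ and $\mathbf{W}$ are single variables, $\mathbf{V}$ is empty, and the CPT entries are random. All names are hypothetical.

```python
import itertools
import random

random.seed(0)
VALS = (0, 1)


def rand_dist():
    """A random distribution over a Boolean variable, as {value: prob}."""
    p = random.uniform(0.1, 0.9)
    return {1: p, 0: 1.0 - p}


# Original network: U -> X, X -> Y, W -> Y (V is empty in this toy case).
p_u, p_w = rand_dist(), rand_dist()
p_x = {u: rand_dist() for u in VALS}                     # P(X | u)
p_y = {(x, w): rand_dist() for x in VALS for w in VALS}  # P(Y | x, w)

# New CPTs after reversing X -> Y, following the two formulas above.
new_y = {
    (u, w): {y: sum(p_y[(x, w)][y] * p_x[u][x] for x in VALS) for y in VALS}
    for u in VALS for w in VALS
}
new_x = {
    (u, w, y): {x: p_y[(x, w)][y] * p_x[u][x] / new_y[(u, w)][y] for x in VALS}
    for u in VALS for w in VALS for y in VALS
}


def joint_original(u, w, x, y):
    return p_u[u] * p_w[w] * p_x[u][x] * p_y[(x, w)][y]


def joint_reversed(u, w, x, y):
    return p_u[u] * p_w[w] * new_y[(u, w)][y] * new_x[(u, w, y)][x]


# Largest disagreement between the two joints over all 16 assignments.
max_gap = max(
    abs(joint_original(*a) - joint_reversed(*a))
    for a in itertools.product(VALS, repeat=4)
)
print(max_gap)
```

The gap should be at the level of floating-point round-off. The algebraic reason is visible in `joint_reversed`: the factor `new_y` cancels against the denominator of `new_x`, which is the step the proof needs to make explicit for general $\mathbf{U}$, $\mathbf{V}$, and $\mathbf{W}$.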

Problem 4

Consider the Bayesian network in Figure 14.2.
a. If no evidence is observed, are Burglary and Earthquake independent? Prove this from the numerical semantics and from the topological semantics.
b. If we observe Alarm = true, are Burglary and Earthquake independent? Justify your answer by calculating whether the probabilities involved satisfy the definition of conditional independence.

Problem 5

Let $H_x$ be a random variable denoting the handedness of an individual $x$, with possible values $l$ or $r$. A common hypothesis is that left- or right-handedness is inherited by a simple mechanism; that is, perhaps there is a gene $G_x$, also with values $l$ or $r$, and perhaps actual handedness turns out mostly the same (with some probability $s$) as the gene an individual possesses. Furthermore, perhaps the gene itself is equally likely to be inherited from either of an individual's parents, with a small nonzero probability $m$ of a random mutation flipping the handedness.
a. Which of the three networks in Figure 14.20 claim that $\mathbf{P}\left(G_{\text {father }}, G_{\text {mother }}, G_{\text {child }}\right)=$ $\mathbf{P}\left(G_{\text {father }}\right) \mathbf{P}\left(G_{\text {mother }}\right) \mathbf{P}\left(G_{\text {child }}\right)$ ?
b. Which of the three networks make independence claims that are consistent with the hypothesis about the inheritance of handedness?
c. Which of the three networks is the best description of the hypothesis?
d. Write down the CPT for the $G_{\text {child }}$ node in network (a), in terms of $s$ and $m$.
e. Suppose that $P\left(G_{\text {father }}=l\right)=P\left(G_{\text {mother }}=l\right)=q$. In network (a), derive an expression for $P\left(G_{\text {child }}=l\right)$ in terms of $m$ and $q$ only, by conditioning on its parent nodes.
f. Under conditions of genetic equilibrium, we expect the distribution of genes to be the same across generations. Use this to calculate the value of $q$, and, given what you know about handedness in humans, explain why the hypothesis described at the beginning of this question must be wrong.

Problem 6

The Markov blanket of a variable is defined on page 517. Prove that a variable is independent of all other variables in the network, given its Markov blanket, and derive Equation (14.12) (page 538).

Problem 7

Consider the network for car diagnosis shown in Figure 14.21.
a. Extend the network with the Boolean variables IcyWeather and StarterMotor.
b. Give reasonable conditional probability tables for all the nodes.
c. How many independent values are contained in the joint probability distribution for eight Boolean nodes, assuming that no conditional independence relations are known to hold among them?
d. How many independent probability values do your network tables contain?
e. The conditional distribution for Starts could be described as a noisy-AND distribution. Define this family in general and relate it to the noisy-OR distribution.

Problem 8

Consider a simple Bayesian network with root variables Cold, Flu, and Malaria and child variable Fever, with a noisy-OR conditional distribution for Fever as described in Section 14.3. By adding appropriate auxiliary variables for inhibition events and fever-inducing events, construct an equivalent Bayesian network whose CPTs (except for root variables) are deterministic. Define the CPTs and prove equivalence.
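
The equivalence claimed here can be checked numerically before it is proved. The sketch below compares the closed-form noisy-OR probability with the value obtained by summing over independent root "inhibition" variables that feed a deterministic OR. The inhibition probabilities in `Q` are illustrative values chosen for this sketch, and the function names are hypothetical; it also collapses the fever-inducing events into a single deterministic test rather than modeling them as separate nodes.

```python
import itertools
import math

# Inhibition probabilities q_j (illustrative values, an assumption here).
Q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}
CAUSES = list(Q)


def noisy_or(active):
    """P(Fever = true | exactly the causes in `active` are true)."""
    return 1.0 - math.prod(Q[c] for c in active)


def via_auxiliary(active):
    """Same quantity via root inhibition variables and a deterministic OR."""
    total = 0.0
    for inhibited in itertools.product((True, False), repeat=len(CAUSES)):
        # Probability of this joint setting of the independent root
        # variables I_c, where P(I_c = true) = q_c.
        weight = math.prod(
            Q[c] if i else 1.0 - Q[c] for c, i in zip(CAUSES, inhibited)
        )
        # Deterministic CPT: fever iff some active cause is not inhibited.
        fever = any(c in active and not i for c, i in zip(CAUSES, inhibited))
        total += weight * fever
    return total


for r in range(len(CAUSES) + 1):
    for active in itertools.combinations(CAUSES, r):
        print(active, noisy_or(active), via_auxiliary(active))
```

Agreement on all eight parent configurations is evidence for the construction, but the exercise still asks for a proof that holds for arbitrary inhibition probabilities.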

Problem 9

Consider the family of linear Gaussian networks, as defined on page 520.
a. In a two-variable network, let $X_1$ be the parent of $X_2$, let $X_1$ have a Gaussian prior, and let $\mathbf{P}\left(X_2 \mid X_1\right)$ be a linear Gaussian distribution. Show that the joint distribution $P\left(X_1, X_2\right)$ is a multivariate Gaussian, and calculate its covariance matrix.
b. Prove by induction that the joint distribution for a general linear Gaussian network on $X_1, \ldots, X_n$ is also a multivariate Gaussian.

Problem 10

The probit distribution defined on page 522 describes the probability distribution for a Boolean child, given a single continuous parent.
a. How might the definition be extended to cover multiple continuous parents?
b. How might it be extended to handle a multivalued child variable? Consider both cases where the child's values are ordered (as in selecting a gear while driving, depending on speed, slope, desired acceleration, etc.) and cases where they are unordered (as in selecting bus, train, or car to get to work). (Hint: Consider ways to divide the possible values into two sets, to mimic a Boolean variable.)

Problem 11

In your local nuclear power station, there is an alarm that senses when a temperature gauge exceeds a given threshold. The gauge measures the temperature of the core. Consider the Boolean variables $A$ (alarm sounds), $F_A$ (alarm is faulty), and $F_G$ (gauge is faulty) and the multivalued nodes $G$ (gauge reading) and $T$ (actual core temperature).
a. Draw a Bayesian network for this domain, given that the gauge is more likely to fail when the core temperature gets too high.
b. Is your network a polytree? Why or why not?
c. Suppose there are just two possible actual and measured temperatures, normal and high; the probability that the gauge gives the correct temperature is $x$ when it is working, but $y$ when it is faulty. Give the conditional probability table associated with $G$.
d. Suppose the alarm works correctly unless it is faulty, in which case it never sounds. Give the conditional probability table associated with $A$.
e. Suppose the alarm and gauge are working and the alarm sounds. Calculate an expression for the probability that the temperature of the core is too high, in terms of the various conditional probabilities in the network.

Problem 12

Two astronomers in different parts of the world make measurements $M_1$ and $M_2$ of the number of stars $N$ in some small region of the sky, using their telescopes. Normally, there is a small possibility $e$ of error by up to one star in each direction. Each telescope can also (with a much smaller probability $f$) be badly out of focus (events $F_1$ and $F_2$), in which case the scientist will undercount by three or more stars (or if $N$ is less than 3, fail to detect any stars at all). Consider the three networks shown in Figure 14.22.
a. Which of these Bayesian networks are correct (but not necessarily efficient) representations of the preceding information?
b. Which is the best network? Explain.
c. Write out a conditional distribution for $\mathbf{P}\left(M_1 \mid N\right)$, for the case where $N \in\{1,2,3\}$ and $M_1 \in\{0,1,2,3,4\}$. Each entry in the conditional distribution should be expressed as a function of the parameters $e$ and/or $f$.
d. Suppose $M_1=1$ and $M_2=3$. What are the possible numbers of stars if you assume no prior constraint on the values of $N$?
e. What is the most likely number of stars, given these observations? Explain how to compute this, or if it is not possible to compute, explain what additional information is needed and how it would affect the result.

Problem 13

Consider the Bayes net shown in Figure 14.23.
a. Which, if any, of the following are asserted by the network structure (ignoring the CPTs for now)?
(i) $\mathbf{P}(B, I, M)=\mathbf{P}(B) \mathbf{P}(I) \mathbf{P}(M)$.
(ii) $\mathbf{P}(J \mid G)=\mathbf{P}(J \mid G, I)$.
(iii) $\mathbf{P}(M \mid G, B, I)=\mathbf{P}(M \mid G, B, I, J)$.
b. Calculate the value of $P(b, i, m, \neg g, j)$.
c. Calculate the probability that someone goes to jail given that they broke the law, have been indicted, and face a politically motivated prosecutor.
d. A context-specific independence (see page 542) allows a variable to be independent of some of its parents given certain values of others. In addition to the usual conditional independences given by the graph structure, what context-specific independences exist in the Bayes net in Figure 14.23?
e. Suppose we want to add the variable $P=$ PresidentialPardon to the network; draw the new network and briefly explain any links you add.

Problem 14

Consider the variable elimination algorithm in Figure 14.11 (page 528).
a. Section 14.4 applies variable elimination to the query
$$
\mathbf{P}(\mathit{Burglary} \mid \mathit{JohnCalls}=\mathit{true}, \mathit{MaryCalls}=\mathit{true}) .
$$

Perform the calculations indicated and check that the answer is correct.
b. Count the number of arithmetic operations performed, and compare it with the number performed by the enumeration algorithm.
c. Suppose a network has the form of a chain: a sequence of Boolean variables $X_1, \ldots, X_n$ where $\operatorname{Parents}\left(X_i\right)=\left\{X_{i-1}\right\}$ for $i=2, \ldots, n$. What is the complexity of computing $\mathbf{P}\left(X_1 \mid X_n=\mathit{true}\right)$ using enumeration? Using variable elimination?
d. Prove that the complexity of running variable elimination on a polytree network is linear in the size of the tree for any variable ordering consistent with the network structure.
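
For the chain in part (c), the contrast between the two algorithms can be made concrete. The sketch below is an illustration under an extra assumption that the exercise does not make: every link of the chain shares one transition table `trans`. The function names are hypothetical. `chain_query_ve` sums out one variable per loop iteration, while `chain_query_enum` visits every assignment to the hidden variables.

```python
import itertools


def chain_query_ve(prior, trans, n):
    """P(X_1 | X_n = true) by eliminating X_{n-1}, ..., X_2 in turn (n >= 2).

    prior[a] = P(X_1 = a); trans[a][b] = P(X_{i+1} = b | X_i = a).
    """
    # msg[a] = P(X_n = true | X_i = a), starting with i = n - 1.
    msg = [trans[a][1] for a in (0, 1)]
    for _ in range(n - 2):
        # Sum out one more variable: constant work per step, so O(n) overall.
        msg = [sum(trans[a][b] * msg[b] for b in (0, 1)) for a in (0, 1)]
    unnorm = [prior[a] * msg[a] for a in (0, 1)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def chain_query_enum(prior, trans, n):
    """Same query by summing over all 2^(n-2) hidden assignments."""
    unnorm = [0.0, 0.0]
    for middle in itertools.product((0, 1), repeat=n - 2):
        for x1 in (0, 1):
            states = (x1, *middle, 1)  # evidence: X_n = true
            p = prior[x1]
            for a, b in zip(states, states[1:]):
                p *= trans[a][b]
            unnorm[x1] += p
    z = sum(unnorm)
    return [u / z for u in unnorm]


prior = [0.3, 0.7]
trans = [[0.9, 0.1], [0.4, 0.6]]
print(chain_query_ve(prior, trans, 6))
print(chain_query_enum(prior, trans, 6))
```

Both functions should return the same distribution; only the amount of work differs, which is the point of the comparison the exercise asks for.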

Problem 15

Investigate the complexity of exact inference in general Bayesian networks:
a. Prove that any 3-SAT problem can be reduced to exact inference in a Bayesian network constructed to represent the particular problem and hence that exact inference is NP-hard.
b. The problem of counting the number of satisfying assignments for a 3-SAT problem is #P-complete. Show that exact inference is at least as hard as this.

Problem 16

Consider the problem of generating a random sample from a specified distribution on a single variable. Assume you have a random number generator that returns a random number uniformly distributed between 0 and 1.
a. Let $X$ be a discrete variable with $P\left(X=x_i\right)=p_i$ for $i \in\{1, \ldots, k\}$. The cumulative distribution of $X$ gives the probability that $X \in\left\{x_1, \ldots, x_j\right\}$ for each possible $j$. (See also Appendix A.) Explain how to calculate the cumulative distribution in $O(k)$ time and how to generate a single sample of $X$ from it. Can the latter be done in less than $O(k)$ time?
b. Now suppose we want to generate $N$ samples of $X$, where $N \gg k$. Explain how to do this with an expected run time per sample that is constant (i.e., independent of $k$).
c. Now consider a continuous-valued variable with a parameterized distribution (e.g., Gaussian). How can samples be generated from such a distribution?
d. Suppose you want to query a continuous-valued variable and you are using a sampling algorithm such as LIKELIHOOD-WEIGHTING to do the inference. How would you have to modify the query-answering process?
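
The mechanics behind part (a) can be sketched in a few lines. This is one possible approach, assuming the probabilities are given as a list `probs`; the names `build_cdf` and `sample_index` are hypothetical. It builds the cumulative distribution in a single pass and then locates a uniform draw by binary search. It does not address the constant-time requirement of part (b).

```python
import bisect
import itertools
import random


def build_cdf(probs):
    """Running sums of probs, computed in one O(k) pass."""
    return list(itertools.accumulate(probs))


def sample_index(cdf, rng=random):
    """Draw index i with probability probs[i] using binary search, O(log k)."""
    # Scale by the last entry in case round-off left it slightly off 1.
    u = rng.random() * cdf[-1]
    # First position whose cumulative value exceeds u; the min() guards
    # against u rounding up to exactly cdf[-1].
    return min(bisect.bisect_right(cdf, u), len(cdf) - 1)


cdf = build_cdf([0.2, 0.5, 0.3])
draws = [sample_index(cdf) for _ in range(10)]
print(draws)
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, which is convenient when comparing against a hand-computed cumulative table.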

Problem 17

Consider the query $\mathbf{P}(\mathit{Rain} \mid \mathit{Sprinkler}=\mathit{true}, \mathit{WetGrass}=\mathit{true})$ in Figure 14.12(a) (page 529) and how Gibbs sampling can answer it.
a. How many states does the Markov chain have?
b. Calculate the transition matrix $\mathbf{Q}$ containing $q\left(\mathbf{y} \rightarrow \mathbf{y}^{\prime}\right)$ for all $\mathbf{y}, \mathbf{y}^{\prime}$.
c. What does $\mathbf{Q}^2$, the square of the transition matrix, represent?
d. What about $\mathbf{Q}^n$ as $n \rightarrow \infty$ ?
e. Explain how to do probabilistic inference in Bayesian networks, assuming that $\mathbf{Q}^n$ is available. Is this a practical way to do inference?

Problem 18

This exercise explores the stationary distribution for Gibbs sampling methods.
a. The convex composition $\left[\alpha, q_1 ; 1-\alpha, q_2\right]$ of $q_1$ and $q_2$ is a transition probability distribution that first chooses one of $q_1$ and $q_2$ with probabilities $\alpha$ and $1-\alpha$, respectively, and then applies whichever is chosen. Prove that if $q_1$ and $q_2$ are in detailed balance with $\pi$, then their convex composition is also in detailed balance with $\pi$. (Note, this result justifies a variant of GIBBS-ASK in which variables are chosen at random rather than sampled in a fixed sequence.)
b. Prove that if each of $q_1$ and $q_2$ has $\pi$ as its stationary distribution, then the sequential composition $q=q_1 \circ q_2$ also has $\pi$ as its stationary distribution.

Problem 19

The Metropolis-Hastings algorithm is a member of the MCMC family; as such, it is designed to generate samples $\mathbf{x}$ (eventually) according to target probabilities $\pi(\mathbf{x})$. (Typically we are interested in sampling from $\pi(\mathbf{x})=P(\mathbf{x} \mid \mathbf{e})$.) Like simulated annealing, Metropolis-Hastings operates in two stages. First, it samples a new state $\mathbf{x}^{\prime}$ from a proposal distribution $q\left(\mathbf{x}^{\prime} \mid \mathbf{x}\right)$, given the current state $\mathbf{x}$. Then, it probabilistically accepts or rejects $\mathbf{x}^{\prime}$ according to the acceptance probability
$$
\alpha\left(\mathbf{x}^{\prime} \mid \mathbf{x}\right)=\min \left(1, \frac{\pi\left(\mathbf{x}^{\prime}\right) q\left(\mathbf{x} \mid \mathbf{x}^{\prime}\right)}{\pi(\mathbf{x}) q\left(\mathbf{x}^{\prime} \mid \mathbf{x}\right)}\right) .
$$

If the proposal is rejected, the state remains at $\mathbf{x}$.
a. Consider an ordinary Gibbs sampling step for a specific variable $X_i$. Show that this step, considered as a proposal, is guaranteed to be accepted by Metropolis-Hastings. (Hence, Gibbs sampling is a special case of Metropolis-Hastings.)
b. Show that the two-step process above, viewed as a transition probability distribution, is in detailed balance with $\pi$.
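
The detailed-balance property in part (b) can be checked numerically on a small discrete state space before it is proved in general. The sketch below assumes states are indexed $0, \ldots, n-1$, that every $\pi(\mathbf{x})$ is positive, and that `q[x][y]` stands for $q(\mathbf{y} \mid \mathbf{x})$; the target `pi`, the proposal `q`, and the function name are hypothetical example values, not part of the exercise.

```python
def mh_transition_matrix(pi, q):
    """Transition matrix induced by the two-step Metropolis-Hastings process.

    pi[x] is the target probability of state x (assumed positive);
    q[x][y] is the probability of proposing y from x (rows sum to 1).
    """
    n = len(pi)
    t = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and q[x][y] > 0:
                # Acceptance probability from the formula above.
                accept = min(1.0, (pi[y] * q[y][x]) / (pi[x] * q[x][y]))
                t[x][y] = q[x][y] * accept
        # Rejected proposals and self-proposals leave the state at x.
        t[x][x] = 1.0 - sum(t[x])
    return t


pi = [0.2, 0.5, 0.3]
q = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.4, 0.4, 0.2]]
t = mh_transition_matrix(pi, q)

# Detailed balance: pi(x) t(x -> y) should equal pi(y) t(y -> x).
worst = max(
    abs(pi[x] * t[x][y] - pi[y] * t[y][x]) for x in range(3) for y in range(3)
)
print(worst)
```

The check succeeds for any valid `pi` and `q` because each off-diagonal flow equals $\min\left(\pi(\mathbf{x}) q(\mathbf{x}^{\prime} \mid \mathbf{x}), \pi(\mathbf{x}^{\prime}) q(\mathbf{x} \mid \mathbf{x}^{\prime})\right)$, which is symmetric in its two arguments; that observation is the core of the proof requested in part (b).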

Problem 20

Three soccer teams $A$, $B$, and $C$ play each other once. Each match is between two teams, and can be won, drawn, or lost. Each team has a fixed, unknown degree of quality (an integer ranging from 0 to 3), and the outcome of a match depends probabilistically on the difference in quality between the two teams.
a. Construct a relational probability model to describe this domain, and suggest numerical values for all the necessary probability distributions.
b. Construct the equivalent Bayesian network for the three matches.
c. Suppose that in the first two matches $A$ beats $B$ and draws with $C$. Using an exact inference algorithm of your choice, compute the posterior distribution for the outcome of the third match.
d. Suppose there are $n$ teams in the league and we have the results for all but the last match. How does the complexity of predicting the last game vary with $n$?
e. Investigate the application of MCMC to this problem. How quickly does it converge in practice and how well does it scale?
