Artificial Intelligence. A Modern Approach [Global Edition]

Stuart Russell, Peter Norvig

Chapter 20

Learning Probabilistic Models

Chapter Questions

Problem 1

The data used for Figure 20.1 on page 804 can be viewed as being generated by $h_5$. For each of the other four hypotheses, generate a data set of length 100 and plot the corresponding graphs for $P(h_i \mid d_1, \ldots, d_N)$ and $P(D_{N+1}=\text{lime} \mid d_1, \ldots, d_N)$. Comment on your results.

Vishal Parmar
Numerade Educator
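
As a starting point for Problem 1, the posterior and prediction curves can be computed with a short Bayesian update loop. This is a minimal sketch, not a full solution: it assumes the five candy-bag hypotheses of Section 20.1 (lime fractions 0, 0.25, 0.5, 0.75, 1 with priors 0.1, 0.2, 0.4, 0.2, 0.1), and the names `generate_data` and `posterior_curves` are hypothetical helpers introduced here. Plotting is left out.

```python
import random

# Assumed from the candy example in Section 20.1:
# P(lime | h_i) for h_1..h_5 and the prior P(h_i).
LIME_PROB = [0.0, 0.25, 0.5, 0.75, 1.0]
PRIOR = [0.1, 0.2, 0.4, 0.2, 0.1]


def generate_data(true_h, n, rng):
    """Draw n candies from hypothesis index true_h (0-based); True means lime."""
    return [rng.random() < LIME_PROB[true_h] for _ in range(n)]


def posterior_curves(data):
    """Return (posteriors, predictions) after 0, 1, ..., N observations."""
    post = PRIOR[:]
    posteriors = [post[:]]
    predictions = [sum(p * q for p, q in zip(post, LIME_PROB))]
    for is_lime in data:
        # Bayes update: weight each hypothesis by the likelihood of this candy,
        # then renormalize so the posterior sums to 1.
        unnorm = [p * (q if is_lime else 1.0 - q)
                  for p, q in zip(post, LIME_PROB)]
        total = sum(unnorm)
        post = [u / total for u in unnorm]
        posteriors.append(post[:])
        # Bayesian prediction: average P(lime | h_i) under the posterior.
        predictions.append(sum(p * q for p, q in zip(post, LIME_PROB)))
    return posteriors, predictions


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    data = generate_data(true_h=2, n=100, rng=rng)  # data drawn from h_3
    posteriors, predictions = posterior_curves(data)
    print(posteriors[-1], predictions[-1])
```

Each entry of `posteriors` holds the five values $P(h_i \mid d_1, \ldots, d_N)$ for one $N$, so plotting column $i$ against $N$ gives the first graph and plotting `predictions` against $N$ gives the second.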

Problem 2

Repeat Exercise 20.1, this time plotting the values of $P(D_{N+1}=\text{lime} \mid h_{\mathrm{MAP}})$ and $P(D_{N+1}=\text{lime} \mid h_{\mathrm{ML}})$.


Problem 3

Suppose that Ann's utilities for cherry and lime candies are $c_A$ and $\ell_A$, whereas Bob's utilities are $c_B$ and $\ell_B$. (But once Ann has unwrapped a piece of candy, Bob won't buy it.) Presumably, if Bob likes lime candies much more than Ann, it would be wise for Ann to sell her bag of candies once she is sufficiently sure of its lime content. On the other hand, if Ann unwraps too many candies in the process, the bag will be worth less. Discuss the problem of determining the optimal point at which to sell the bag. Determine the expected utility of the optimal procedure, given the prior distribution from Section 20.1.

Md.Daniyal Arshad
Numerade Educator

Problem 4

Two statisticians go to the doctor and are both given the same prognosis: A $40\%$ chance that the problem is the deadly disease $A$, and a $60\%$ chance of the fatal disease $B$. Fortunately, there are anti-$A$ and anti-$B$ drugs that are inexpensive, $100\%$ effective, and free of side-effects. The statisticians have the choice of taking one drug, both, or neither. What will the first statistician (an avid Bayesian) do? How about the second statistician, who always uses the maximum likelihood hypothesis?

The doctor does some research and discovers that disease $B$ actually comes in two versions, dextro-$B$ and levo-$B$, which are equally likely and equally treatable by the anti-$B$ drug. Now that there are three hypotheses, what will the two statisticians do?

Bon Zapata
Numerade Educator
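
One way to explore Problem 4 is to encode the two decision rules and run them on both prognoses. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the Bayesian is modeled as taking every drug that lowers the probability of dying (reasonable here only because the drugs are cheap, fully effective, and harmless), and the second statistician is modeled as committing to the single highest-scoring hypothesis and treating only that.

```python
def bayesian_choice(probs, cures):
    """Take every drug that treats a disease with nonzero probability.

    Assumption: drugs cost almost nothing and have no side effects, so any
    reduction in the risk of dying is worth taking.
    """
    return sorted({cures[d] for d, p in probs.items() if p > 0})


def single_hypothesis_choice(probs, cures):
    """Commit to the highest-scoring hypothesis and treat only that disease."""
    best = max(probs, key=probs.get)
    return [cures[best]]


if __name__ == "__main__":
    # Original prognosis with two hypotheses.
    two = {"A": 0.4, "B": 0.6}
    cures_two = {"A": "anti-A", "B": "anti-B"}
    # Refined prognosis: B splits evenly into dextro-B and levo-B.
    three = {"A": 0.4, "dextro-B": 0.3, "levo-B": 0.3}
    cures_three = {"A": "anti-A", "dextro-B": "anti-B", "levo-B": "anti-B"}

    for probs, cures in [(two, cures_two), (three, cures_three)]:
        print(bayesian_choice(probs, cures),
              single_hypothesis_choice(probs, cures))
```

Comparing the two printed rows shows how splitting one hypothesis into finer ones can change a decision that depends only on the single best hypothesis, while a decision that averages over all hypotheses is unaffected.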

Problem 5

Explain how to apply the boosting method of Chapter 18 to naive Bayes learning. Test the performance of the resulting algorithm on the restaurant learning problem.

Problem 6

Consider $N$ data points $(x_j, y_j)$, where the $y_j$'s are generated from the $x_j$'s according to the linear Gaussian model in Equation (20.5). Find the values of $\theta_1$, $\theta_2$, and $\sigma$ that maximize the conditional log likelihood of the data.
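
A derived answer to Problem 6 can be checked numerically. The sketch below assumes Equation (20.5) has the form $P(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(y-(\theta_1 x+\theta_2))^2 / (2\sigma^2)}$, in which case maximizing the conditional log likelihood over $\theta_1, \theta_2$ reduces to least squares and $\sigma^2$ comes out as the mean squared residual. The function name `fit_linear_gaussian` is hypothetical.

```python
import math


def fit_linear_gaussian(xs, ys):
    """Closed-form maximum-likelihood fit of y = theta1 * x + theta2 + noise.

    Assumes Gaussian noise with constant standard deviation sigma and at
    least two distinct x values (otherwise the slope is undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx                  # least-squares slope
    theta2 = mean_y - theta1 * mean_x   # least-squares intercept
    # ML estimate of sigma divides by N, not N - 1.
    sq_resid = sum((y - (theta1 * x + theta2)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(sq_resid / n)
    return theta1, theta2, sigma


if __name__ == "__main__":
    print(fit_linear_gaussian([0, 1, 2, 3], [0, 2, 0, 2]))
```

Note the division by $N$ in the estimate of $\sigma$: the maximum-likelihood estimate is the biased one, not the $N-1$ version used for unbiased variance estimation.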

Problem 7

Consider the noisy-OR model for fever described in Section 14.3. Explain how to apply maximum-likelihood learning to fit the parameters of such a model to a set of complete data. (Hint: use the chain rule for partial derivatives.)

Problem 8

This exercise investigates properties of the Beta distribution defined in Equation (20.6).
a. By integrating over the range $[0,1]$, show that the normalization constant for the distribution $\text{beta}[a, b]$ is given by $\alpha=\Gamma(a+b) / (\Gamma(a)\, \Gamma(b))$, where $\Gamma(x)$ is the Gamma function, defined by $\Gamma(x+1)=x \cdot \Gamma(x)$ and $\Gamma(1)=1$. (For integer $x$, $\Gamma(x+1)=x!$.)
b. Show that the mean is $a /(a+b)$.
c. Find the mode(s) (the most likely value(s) of $\theta$).
d. Describe the distribution $\text{beta}[\epsilon, \epsilon]$ for very small $\epsilon$. What happens as such a distribution is updated?
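
Parts (a) and (b) ask for proofs, but the claimed results can be sanity-checked numerically before attempting them. The sketch below assumes Equation (20.6) defines $\text{beta}[a,b](\theta) = \alpha\, \theta^{a-1}(1-\theta)^{b-1}$ and uses a simple midpoint rule; `beta_pdf`, `integrate`, and `check_beta` are hypothetical helper names, and a numerical check is no substitute for the derivation.

```python
import math


def beta_pdf(theta, a, b):
    """Density of beta[a, b] using the normalization constant from part (a)."""
    alpha = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return alpha * theta ** (a - 1) * (1 - theta) ** (b - 1)


def integrate(f, n=20_000):
    """Midpoint rule on [0, 1]; never evaluates f at the endpoints."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


def check_beta(a, b):
    """Return (total probability, mean) of beta[a, b] by numerical integration."""
    total = integrate(lambda t: beta_pdf(t, a, b))
    mean = integrate(lambda t: t * beta_pdf(t, a, b))
    return total, mean


if __name__ == "__main__":
    a, b = 3.0, 5.0
    total, mean = check_beta(a, b)
    print(total, mean, a / (a + b))
```

The midpoint rule is adequate for $a, b \geq 1$, where the density is bounded. For the small-$\epsilon$ case in part (d) the density blows up near 0 and 1, so this integrator would converge poorly there.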

Problem 10

Consider an arbitrary Bayesian network, a complete data set for that network, and the likelihood for the data set according to the network. Give a simple proof that the likelihood of the data cannot decrease if we add a new link to the network and recompute the maximum-likelihood parameter values.
