Question

2. Markov Decision Process - Value Iteration
Consider an infinite-horizon Markov Decision Process. There are three states $S_1, S_2, S_3$ and two
actions $a_1, a_2$. Assume the reward functions are $R(S_1, a_1) = R(S_1, a_2) = 1$,
$R(S_2, a_1) = R(S_2, a_2) = -1$ and $R(S_3, a_1) = R(S_3, a_2) = 3$. The state transitions under actions
$a_1$ and $a_2$ are described by the matrices below, respectively.
$a_1:$
$\begin{array}{|c|c|c|c|}
\hline
 & S_1 & S_2 & S_3 \\
\hline
S_1 & 0.8 & 0.1 & 0.1 \\
S_2 & 0.9 & 0.0 & 0.1 \\
S_3 & 0.2 & 0.7 & 0.1 \\
\hline
\end{array}$ 
$a_2:$
$\begin{array}{|c|c|c|c|}
\hline
 & S_1 & S_2 & S_3 \\
\hline
S_1 & 0.1 & 0.8 & 0.1 \\
S_2 & 0.9 & 0.1 & 0.0 \\
S_3 & 0.7 & 0.2 & 0.1 \\
\hline
\end{array}$
(1) Starting with a value function that is 0 for all states, perform value iteration for 10 iterations for
$\delta = 0.9$. Do it again for $\delta = 0.1$. (You can either work it out using a table or by computer).
(2) Compare the values of states after 10 iterations with those in Problem 1. Are the values of states
always no lower? Why or why not?
(3) Can you see it converging faster in one case? If so, why?
(4) What is the optimal policy?
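
For anyone who prefers the "by computer" route in part (1), here is a minimal sketch of the iteration in plain Python. It rests on two assumptions that the problem statement does not spell out: that $\delta$ is the discount factor, and that the backup is $V_{k+1}(s) = \max_a \left[ R(s, a) + \delta \sum_{s'} P(s' \mid s, a) V_k(s') \right]$ with $V_0 = 0$. The names `REWARDS`, `TRANSITIONS` and `value_iteration` are made up for this illustration.

```python
# Rewards depend only on the state in this problem: R(s, a1) = R(s, a2).
REWARDS = [1.0, -1.0, 3.0]  # S1, S2, S3

# TRANSITIONS[a][s][t] = P(t | s, a), copied row by row from the two tables.
TRANSITIONS = [
    [[0.8, 0.1, 0.1], [0.9, 0.0, 0.1], [0.2, 0.7, 0.1]],  # a1
    [[0.1, 0.8, 0.1], [0.9, 0.1, 0.0], [0.7, 0.2, 0.1]],  # a2
]


def value_iteration(delta, iterations=10):
    """Run synchronous Bellman backups starting from V_0 = 0.

    Assumes delta is the discount factor. Returns (values, policy), where
    policy[s] is the index of the greedy action in the last backup
    (0 for a1, 1 for a2).
    """
    n_states = len(REWARDS)
    n_actions = len(TRANSITIONS)
    values = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(iterations):
        new_values = []
        new_policy = []
        for s in range(n_states):
            # Q(s, a) = R(s) + delta * sum_t P(t | s, a) * V_k(t)
            q = [
                REWARDS[s]
                + delta * sum(p * v for p, v in zip(TRANSITIONS[a][s], values))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])  # ties go to a1
            new_values.append(q[best])
            new_policy.append(best)
        # Swap only after every state is updated, so each backup uses V_k.
        values, policy = new_values, new_policy
    return values, policy


if __name__ == "__main__":
    for delta in (0.9, 0.1):
        values, policy = value_iteration(delta)
        actions = ["a%d" % (a + 1) for a in policy]
        print("delta =", delta, "values =", [round(v, 4) for v in values],
              "greedy actions =", actions)
```

Running the script prints the 10-iteration values and the greedy action per state for both settings of $\delta$, which is the raw material for parts (2) to (4). Passing a smaller `iterations` argument (for example `value_iteration(0.9, 2)`) makes it easy to check the first few rows of a hand-built table.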

Added by Ann R.
