Exercise 2 (Discounted Gain). Consider the discrete-time Markov chain with the graph below. Let $\pi_i(t)$ denote the probability of being in state $i$ at time $t$, and let $\pi_i$ denote the long-run proportion of time spent in state $i$ over an infinite horizon. A reward $r_{ij}$ is collected on each transition from state $i$ to state $j$: $r_{12} = +2$ and $r_{32} = -1$ (all other rewards are zero). Let $\gamma \in [0, 1]$ be the discount factor, $v(i)$ the expected total discounted reward starting from state $i$, and $g(i)$ the average reward (per period) starting from state $i$.
1. Determine $\pi_1, \pi_2, \pi_3$.
2. Deduce $g(1)$, $g(2)$, and $g(3)$.
3. Write a linear system to compute $v(1)$, $v(2)$, and $v(3)$.
4. When $\gamma = 0$, what are $v(1)$, $v(2)$, and $v(3)$?
5. When $\gamma = 0.5$, what are $v(1)$, $v(2)$, and $v(3)$?
6. When $\gamma = 1$, what are $v(1)$, $v(2)$, and $v(3)$?
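
Answers to questions 1 to 5 can be checked numerically with a short sketch like the one below. The transition matrix `P` is a hypothetical placeholder, since the graph is not reproduced in this text; replace it with the probabilities read off the actual graph. The function names are illustrative. The sketch also assumes the convention that the reward $r_{ij}$ is collected undiscounted on the transition out of state $i$, so that $v(i) = \sum_j p_{ij}\,(r_{ij} + \gamma\, v(j))$, and that the chain is irreducible, so the stationary distribution is unique.

```python
import numpy as np

# HYPOTHETICAL transition matrix (rows sum to 1). These values are NOT taken
# from the exercise; substitute the probabilities from the actual graph.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.5, 0.0],
])

# Transition rewards from the statement: r_12 = +2, r_32 = -1, others zero.
# States 1, 2, 3 map to indices 0, 1, 2.
R = np.zeros((3, 3))
R[0, 1] = 2.0
R[2, 1] = -1.0


def expected_reward(P, R):
    """Expected one-step reward from each state: sum_j p_ij * r_ij."""
    return (P * R).sum(axis=1)


def stationary_distribution(P):
    """Solve pi P = pi together with sum(pi) = 1 (assumes an irreducible chain)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def average_reward(P, R):
    """Gain per period; identical for every starting state of an irreducible chain."""
    return float(stationary_distribution(P) @ expected_reward(P, R))


def discounted_values(P, R, gamma):
    """Solve (I - gamma P) v = r_bar, which is only well posed for gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("I - gamma P is singular at gamma = 1")
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, expected_reward(P, R))


if __name__ == "__main__":
    print("pi      =", stationary_distribution(P))
    print("g       =", average_reward(P, R))
    print("v(0.0)  =", discounted_values(P, R, 0.0))
    print("v(0.5)  =", discounted_values(P, R, 0.5))
```

The solver deliberately rejects $\gamma = 1$: the matrix $I - P$ is singular for any stochastic matrix $P$, so question 6 has to be argued from the sign of the average reward rather than computed from this linear system.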