Given $\gamma = 0.7$, answer the following questions:
1. $P(s_3^c, r_3^c \mid s_2^b, a_2^3) = ?$
2. $P(s_3^g, r_3^g \mid s_2^d, a_2^7) = ?$
3. $P_{s_2^f s_3^k}^{a_2^{11}} = ?$
4. $Q_\pi(s_2^e, a_2^8) = ?$
5. $Q_\pi(s_1, a_1^1) = ?$
6. $Q_\pi(s_1, a_1^2) = ?$
7. $Q_\pi(s_1, a_1^3) = ?$
8. $V_\pi(s_1) = ?$
9. $A_\pi(s_1, a_1^1) = ?$
10. $A_\pi(s_1, a_1^2) = ?$
11. $A_\pi(s_1, a_1^3) = ?$
12. Based on the A-function values you calculated, briefly explain your strategy to update the policy in order to achieve higher expected reward.
13. Using plain language (no equations), explain the relations among the V-function, Q-function, and A-function, and why we need to consider the A-function to update the policy.
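The quantities asked for follow the standard definitions: Q is a one-step backup over the possible (next state, reward) outcomes of an action, V is the policy-weighted average of Q, and A is Q minus V. The sketch below illustrates only those relations. The action names, transition outcomes, and policy probabilities are hypothetical toy numbers and are not taken from the problem's diagram; only the discount factor 0.7 comes from the problem statement.

```python
GAMMA = 0.7  # discount factor given in the problem statement


def q_value(outcomes, gamma=GAMMA):
    """One-step backup for a single action.

    outcomes: list of (probability, reward, value_of_next_state) tuples.
    """
    return sum(p * (r + gamma * v_next) for p, r, v_next in outcomes)


def v_value(policy_probs, q_values):
    """State value: average of the action values, weighted by the policy."""
    return sum(policy_probs[a] * q_values[a] for a in q_values)


def advantages(policy_probs, q_values):
    """Advantage of each action: how much better it is than the state's average."""
    v = v_value(policy_probs, q_values)
    return {a: q - v for a, q in q_values.items()}


# Hypothetical toy data (NOT the values from the problem's diagram).
q = {
    "a1": q_value([(1.0, 10.0, 0.0)]),
    "a2": q_value([(0.5, 0.0, 20.0), (0.5, 4.0, 0.0)]),
    "a3": q_value([(1.0, 2.0, 0.0)]),
}
pi = {"a1": 0.5, "a2": 0.25, "a3": 0.25}  # hypothetical policy at the state

v = v_value(pi, q)
adv = advantages(pi, q)
best_action = max(adv, key=adv.get)
```

Under these toy numbers, actions with a positive advantage are better than the current policy's average at that state, so shifting probability toward them (and away from negative-advantage actions) raises the expected return. The policy-weighted sum of the advantages is zero by construction, which is why the sign of A, rather than the raw size of Q, indicates the direction of the update.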
[Figure: MDP transition diagram for the questions above. The extracted text preserves only loose edge labels: transition probabilities (20%, 30%, 50%, 60%, 80%, 100%), rewards r = 3 and r = 5, and further values 10, 20, 30. Which states and actions these labels attach to is not recoverable.]