Recall that $Q_n \triangleq \frac{R_1 + \dots + R_{n-1}}{n-1}$ is an estimate of the true expected reward $q_*$ of an arbitrary arm $a$. We say that an estimate is biased if its expected value does not match the true value, i.e., $\mathbb{E}[Q_n] \neq q_*$ (otherwise, it is unbiased).

(a) Consider the sample-average estimate in Equation 2.1. Is it biased or unbiased? Explain briefly.

For the remainder of the question, consider the exponential recency-weighted average estimate in Equation 2.5. Assume that $0 < \alpha < 1$ (i.e., $\alpha$ is strictly less than 1).

(b) If $Q_1 = 0$, is $Q_n$ for $n > 1$ biased? Explain briefly.
(c) Derive conditions for when $Q_n$ will be unbiased ($Q_1$ can be non-zero).
(d) Show that $Q_n$ is asymptotically unbiased, i.e., it is an unbiased estimator as $n \to \infty$.
(e) Why should we expect that the exponential recency-weighted average will be biased in general?
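
As an empirical sanity check on the two estimators in this question, the following minimal Python sketch averages each estimate over many independent runs so that the result can be compared with the true value. It is only an illustration under stated assumptions and not part of the question: rewards are assumed to be i.i.d. Gaussian with mean `q_star` and unit variance (a stationary arm), the recency-weighted update is taken to be the incremental rule Q_{k+1} = Q_k + α(R_k − Q_k), and the function names and default parameters are hypothetical.

```python
import random


def sample_average(rewards):
    """Q_n as the plain mean of the first n-1 rewards (sample-average estimate)."""
    return sum(rewards) / len(rewards)


def recency_weighted(rewards, alpha, q1):
    """Q_n from the incremental update Q_{k+1} = Q_k + alpha * (R_k - Q_k), with Q_1 = q1."""
    q = q1
    for r in rewards:
        q += alpha * (r - q)
    return q


def mean_estimates(n, alpha, q1, q_star, runs=5000, seed=0):
    """Average both estimates of Q_n over `runs` independent simulations.

    Assumption (not from the question): rewards are i.i.d. Gaussian with
    mean q_star and unit variance, i.e. the arm is stationary.
    Requires n >= 2 so that at least one reward has been observed.
    """
    rng = random.Random(seed)
    total_avg = 0.0
    total_exp = 0.0
    for _ in range(runs):
        rewards = [rng.gauss(q_star, 1.0) for _ in range(n - 1)]
        total_avg += sample_average(rewards)
        total_exp += recency_weighted(rewards, alpha, q1)
    return total_avg / runs, total_exp / runs


if __name__ == "__main__":
    # Compare the empirical mean of each estimate with q_star for growing n.
    for n in (2, 5, 20, 100):
        avg, exp = mean_estimates(n=n, alpha=0.1, q1=0.0, q_star=1.0)
        print(f"n={n:3d}  sample-average ~ {avg:.3f}  recency-weighted ~ {exp:.3f}")
```

Varying `n`, `alpha`, and `q1` shows how the empirical mean of each estimate compares with `q_star`; the simulation can suggest an answer to each part, but it does not replace the derivations the question asks for.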