The covariance of random variables X and Y is defined as the expected product of the deviation of X from its mean E[X] and the deviation of Y from its mean E[Y]:
Cov(X, Y) := σXY := E[(X − E[X])(Y − E[Y])].
The covariance measures the linear statistical association between X and Y. When σXY is positive, deviations of X above its mean tend to occur together with deviations of Y above its mean. When σXY is negative, deviations of X and Y about their means tend to go in opposite directions. Covariance captures only "linear" association: its magnitude is largest, relative to the variances, exactly when Y is a linear transformation of X. By the Cauchy-Schwarz inequality,
|Cov(X, Y)|^2 ≤ Var(X) · Var(Y),
with equality if and only if Y = aX + b with probability one for some constants a, b (assuming Var(X) > 0).
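As a numerical sanity check of the definition and of the Cauchy-Schwarz bound, the sketch below evaluates both on two small joint distributions. The tables `pmf_generic` and `pmf_linear` and all helper names are illustrative assumptions, not part of the exercise; exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction as F

# Hypothetical joint pmfs: keys are outcomes (x, y), values are probabilities.
pmf_generic = {(0, 1): F(1, 4), (1, 0): F(1, 4), (1, 2): F(1, 4), (2, 3): F(1, 4)}
# Here Y = 2X + 1 exactly, with X uniform on {0, 1, 2}.
pmf_linear = {(x, 2 * x + 1): F(1, 3) for x in (0, 1, 2)}


def expect(g, pmf):
    """E[g(X, Y)] for a finite joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


def cov_def(pmf):
    """Cov(X, Y) = E[(X - E[X])(Y - E[Y])], straight from the definition."""
    mx = expect(lambda x, y: x, pmf)
    my = expect(lambda x, y: y, pmf)
    return expect(lambda x, y: (x - mx) * (y - my), pmf)


def variances(pmf):
    """Return (Var(X), Var(Y)) as mean squared deviations."""
    mx = expect(lambda x, y: x, pmf)
    my = expect(lambda x, y: y, pmf)
    return (expect(lambda x, y: (x - mx) ** 2, pmf),
            expect(lambda x, y: (y - my) ** 2, pmf))


for name, pmf in [("generic", pmf_generic), ("linear", pmf_linear)]:
    c = cov_def(pmf)
    vx, vy = variances(pmf)
    # Cauchy-Schwarz: c**2 <= vx * vy; equality is expected only in the linear case.
    print(name, c ** 2, "<=", vx * vy, "equality:", c ** 2 == vx * vy)
```

For the generic table the bound is strict (1/4 against 5/8), while for the table built from Y = 2X + 1 both sides equal 16/9.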
(a) Verify the following very useful formula for computing the covariance, both when X, Y are discrete random variables and when X, Y are jointly continuous random variables:
Cov(X, Y) = E[XY] − E[X]E[Y].
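Before writing the proof, it can help to see the two expressions agree on a concrete case. The sketch below compares the definition with the shortcut formula on a hypothetical four-point joint pmf (the table and names are assumptions chosen for illustration); it is a check on one example, not a substitute for the general argument.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y); any finite table summing to 1 would do.
pmf = {(0, 1): F(1, 4), (1, 0): F(1, 4), (1, 2): F(1, 4), (2, 3): F(1, 4)}


def expect(g):
    """E[g(X, Y)] under the joint pmf above."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)

# Definition: expected product of deviations from the means.
cov_definition = expect(lambda x, y: (x - mean_x) * (y - mean_y))
# Shortcut: E[XY] - E[X]E[Y].
cov_shortcut = expect(lambda x, y: x * y) - mean_x * mean_y

print(cov_definition, cov_shortcut)
```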
(b) Prove the following useful formulas (assume either discrete or continuous case):
Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
Cov(aX, bY + cX) = ab Cov(X, Y) + ac Var(X).
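The two identities in part (b) can be spot-checked the same way. The sketch below uses a hypothetical joint pmf and arbitrarily chosen constants a, b, c (all assumptions made for illustration) and compares each side exactly.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y) and arbitrary illustrative constants.
pmf = {(0, 1): F(1, 4), (1, 0): F(1, 4), (1, 2): F(1, 4), (2, 3): F(1, 4)}
a, b, c = F(2), F(-3), F(5)


def expect(g):
    """E[g(X, Y)] under the joint pmf above."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


def cov(g, h):
    """Cov(g(X, Y), h(X, Y)) computed as E[gh] - E[g]E[h]."""
    return expect(lambda x, y: g(x, y) * h(x, y)) - expect(g) * expect(h)


def var(g):
    """Var(g(X, Y)) as the covariance of g with itself."""
    return cov(g, g)


def coord_x(x, y):
    return x


def coord_y(x, y):
    return y


# Var(aX + bY) against a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y).
lhs_var = var(lambda x, y: a * x + b * y)
rhs_var = a ** 2 * var(coord_x) + b ** 2 * var(coord_y) + 2 * a * b * cov(coord_x, coord_y)

# Cov(aX, bY + cX) against ab Cov(X, Y) + ac Var(X).
lhs_cov = cov(lambda x, y: a * x, lambda x, y: b * y + c * x)
rhs_cov = a * b * cov(coord_x, coord_y) + a * c * var(coord_x)

print(lhs_var, rhs_var)
print(lhs_cov, rhs_cov)
```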
(c) Define the correlation coefficient
ρ(X, Y) := (E[XY] − E[X]E[Y]) / (√(E[X^2] − (E[X])^2) · √(E[Y^2] − (E[Y])^2))
and show that |ρ(X, Y)| = 1 if and only if Y is a linear transformation of X, that is, Y = aX + b for some constants a ≠ 0 and b.
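To get a feel for part (c), the sketch below evaluates the moment formula for ρ on three hypothetical joint pmfs: one with Y = 2X + 1, one with Y = −2X + 1, and one with no exact linear relation. The tables and the function name `rho` are assumptions made for illustration, and floating-point square roots mean the results are only approximate.

```python
import math
from fractions import Fraction as F


def rho(pmf):
    """Correlation coefficient of (X, Y) from the moment formula in part (c)."""
    def expect(g):
        return sum(p * g(x, y) for (x, y), p in pmf.items())

    ex = expect(lambda x, y: x)
    ey = expect(lambda x, y: y)
    cov = expect(lambda x, y: x * y) - ex * ey
    var_x = expect(lambda x, y: x * x) - ex ** 2
    var_y = expect(lambda x, y: y * y) - ey ** 2
    return float(cov) / (math.sqrt(var_x) * math.sqrt(var_y))


# Hypothetical joint pmfs: keys are outcomes (x, y), values are probabilities.
increasing = {(x, 2 * x + 1): F(1, 3) for x in (0, 1, 2)}    # Y = 2X + 1
decreasing = {(x, -2 * x + 1): F(1, 3) for x in (0, 1, 2)}   # Y = -2X + 1
scattered = {(0, 1): F(1, 4), (1, 0): F(1, 4), (1, 2): F(1, 4), (2, 3): F(1, 4)}

for name, pmf in [("increasing", increasing),
                  ("decreasing", decreasing),
                  ("scattered", scattered)]:
    print(name, round(rho(pmf), 6))
```

The two exactly linear tables give ρ of approximately +1 and −1, with the sign matching the sign of the slope, while the scattered table gives roughly 0.632.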
(d) For a general pair of random variables X, Y with Var(X) > 0, the following decomposition of Y
Y = α + βX + U where Cov(U, X) = 0, E[U] = 0
into the sum of a linear transformation of X and a residual U that is uncorrelated with X is known as the linear regression of Y on X. Show that
β = Cov(X, Y) / Var(X)
α = E[Y] − (Cov(X, Y) / Var(X)) · E[X].
The regression function α + βX is the best approximation of the random variable Y by a linear function of X in the mean-square sense: among all constants a, b, the choice a = α, b = β minimizes E[(Y − a − bX)^2].
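The formulas for α and β can be tried out on a small example. The sketch below computes them for a hypothetical joint pmf, checks that the residual U = Y − α − βX has mean zero and zero covariance with X, and compares the mean squared error at (α, β) with a few nearby coefficient pairs. Reading "best" as least mean squared error is one common interpretation and is treated here as an assumption; the table, the helper names, and the perturbation size 1/10 are likewise illustrative.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y): keys are outcomes (x, y), values are probabilities.
pmf = {(0, 1): F(1, 4), (1, 0): F(1, 4), (1, 2): F(1, 4), (2, 3): F(1, 4)}


def expect(g):
    """E[g(X, Y)] under the joint pmf above."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2
cov_xy = expect(lambda x, y: x * y) - ex * ey

# Regression coefficients from part (d).
beta = cov_xy / var_x
alpha = ey - beta * ex


def residual(x, y):
    """U = Y - alpha - beta * X evaluated at one outcome."""
    return y - alpha - beta * x


mean_u = expect(residual)
cov_ux = expect(lambda x, y: residual(x, y) * x) - mean_u * ex


def mse(a, b):
    """Mean squared error E[(Y - a - bX)^2] of the linear predictor a + bX."""
    return expect(lambda x, y: (y - a - b * x) ** 2)


best = mse(alpha, beta)
steps = (F(-1, 10), 0, F(1, 10))
# MSE at every perturbed coefficient pair except the unperturbed one.
nearby = [mse(alpha + da, beta + db)
          for da in steps for db in steps if (da, db) != (0, 0)]

print("alpha, beta:", alpha, beta)
print("E[U], Cov(U, X):", mean_u, cov_ux)
print("MSE at (alpha, beta):", best, "smallest nearby MSE:", min(nearby))
```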