Please explain the red-highlighted part in detail, especially how to apply the first-order condition here when matrices are involved.
Example: Ordinary Least Squares Estimation
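In a linear regression model, the dependent variable ($y$) is a linear function of the
independent variables ($x_1, x_2, \dots, x_k$) and an error term ($\varepsilon$). For observation $i$,
$$y_i = x_{i,1}\beta_1 + x_{i,2}\beta_2 + \dots + x_{i,k}\beta_k + \varepsilon_i.$$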
Assuming i = 1,...,n (that is, n equations), we can use matrix notation to express
the above as
$$y = X\beta + \varepsilon,$$
where $y$ is an $n \times 1$ column vector, $X$ is an $n \times k$ matrix, $\beta$ is a $k \times 1$ column
vector, and $\varepsilon$ is an $n \times 1$ column vector.
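To make the shapes concrete, here is a small NumPy sketch. The numbers are made up purely for illustration (they are not from the textbook), and the variable names are my own. It checks that row $i$ of the matrix form reproduces the scalar equation for observation $i$.

```python
import numpy as np

# Illustrative values only (assumed, not from the source text).
n, k = 4, 2
X = np.arange(1.0, 9.0).reshape(n, k)           # n x k matrix of regressors
beta = np.array([[0.5], [2.0]])                 # k x 1 coefficient vector
eps = np.array([[0.1], [-0.2], [0.0], [0.3]])   # n x 1 error vector

# Matrix form: y = X beta + eps, an n x 1 column vector.
y = X @ beta + eps

# Scalar form for one observation i: y_i = sum_j x_ij * beta_j + eps_i.
i = 2
y_i_scalar = sum(X[i, j] * beta[j, 0] for j in range(k)) + eps[i, 0]

print(y.shape)                           # shape of y
print(np.isclose(y[i, 0], y_i_scalar))   # row i matches the scalar equation
```

Stacking the $n$ scalar equations on top of each other is all the matrix notation does.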
The goal of Ordinary Least Squares estimation is to find the values of $\beta$ such that
the squared distance between the "fitted line" and the actual data points, $\sum_{i=1}^{n} \varepsilon_i^2$,
is minimized.
In matrix notation, this means minimizing $\varepsilon'\varepsilon$:
$$\min_{\beta}\; (y - X\beta)'(y - X\beta)$$
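Since the question is how a first-order condition works when matrices are involved, a sketch of the intermediate steps may help. It is not part of the excerpt. It assumes two standard matrix-calculus rules: $\partial(a'\beta)/\partial\beta = a$ for a constant vector $a$, and $\partial(\beta' A \beta)/\partial\beta = 2A\beta$ for a symmetric matrix $A$. The name $S(\beta)$ for the objective is introduced here only for convenience.

```latex
\begin{align*}
S(\beta) &= (y - X\beta)'(y - X\beta) \\
         &= y'y - y'X\beta - \beta'X'y + \beta'X'X\beta \\
         &= y'y - 2\beta'X'y + \beta'X'X\beta
            && \text{($y'X\beta$ is a scalar, so $y'X\beta = \beta'X'y$)} \\[4pt]
\frac{\partial S}{\partial \beta}
         &= -2X'y + 2X'X\beta
            && \text{(rules above with $a = X'y$ and $A = X'X$)} \\
         &= -2X'(y - X\beta)
\end{align*}
```

The gradient is a $k \times 1$ vector whose $j$-th entry is $\partial S / \partial \beta_j$. Setting the whole vector equal to the zero vector is the same as setting all $k$ partial derivatives to zero at once, which is exactly the usual first-order condition applied to each coefficient.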
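The first-order condition implies that, at the minimizer $\hat{\beta}$,
$$-2X'(y - X\hat{\beta}) = 0 \;\Longrightarrow\; X'X\hat{\beta} = X'y,$$
where $X'X$ is a $k \times k$ square matrix. If $X'X$ is nonsingular, then the above yields
$$\hat{\beta} = (X'X)^{-1}X'y.$$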
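As a numerical sanity check, the sketch below solves the normal equations on simulated data and confirms that the gradient is (numerically) zero at the solution. The data, the seed, and all variable names are assumptions made for illustration. `np.linalg.solve` is used in place of an explicit inverse, which is generally the more stable way to evaluate $(X'X)^{-1}X'y$.

```python
import numpy as np

# Simulated data (assumed for illustration; not from the source text).
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))            # n x k regressor matrix
beta_true = np.array([1.0, -2.0, 0.5])
eps = 0.1 * rng.normal(size=n)         # error term
y = X @ beta_true + eps

# Solve the normal equations X'X b = X'y for b.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient of (y - Xb)'(y - Xb) evaluated at beta_hat: -2 X'(y - X b).
grad = -2 * X.T @ (y - X @ beta_hat)

def ssr(b):
    """Sum of squared residuals e'e for a candidate coefficient vector b."""
    e = y - X @ b
    return e @ e

# Cross-check against NumPy's built-in least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(grad, 0.0, atol=1e-8))        # first-order condition holds
print(np.allclose(beta_hat, beta_lstsq))        # same answer as lstsq
print(ssr(beta_hat) <= ssr(beta_hat + 0.01))    # perturbing b does not lower e'e
```

The last check reflects that $S(\beta)$ is convex (because $X'X$ is positive semidefinite), so a point satisfying the first-order condition is a minimum rather than a maximum or saddle point.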