Logistic Regression. In logistic regression, we have a binary dependent variable Y (taking the values 0 and 1) and a quantitative independent variable X. Let π(x) = P(Y = 1|X = x). We consider the model Y = E(Y|X = x) + ε.
(a) Show that E(Y|X = x) = π(x) and that, conditional on X = x, E(ε) = 0 and Var(ε) = π(x)(1 - π(x)). Also show that if log(π(x)/(1 - π(x))) = α + βx, then
    π(x) = exp(α + βx)/(1 + exp(α + βx)).    (1)
(b) Suppose the data (x1, y1), (x2, y2), ..., (xn, yn) are observed. Using equation (1) and the likelihood function L((x1, y1), ..., (xn, yn)) = π(x1)^y1 (1 - π(x1))^(1-y1) ⋅ ... ⋅ π(xn)^yn (1 - π(xn))^(1-yn), show that the maximum likelihood estimators of α and β can be found by solving the equations
    ∑(yi - π(xi)) = 0 and ∑xi(yi - π(xi)) = 0,    (2)
where both sums run over i = 1, ..., n.
(c) Given the data (1.0, 0), (2.0, 1), (2.0, 0), (3.0, 1), (3.0, 1), (3.0, 0), (4.0, 1), (4.0, 1), (5.0, 1), verify that the MLEs are α̂ = -4.15183 and β̂ = 1.7831, and predict the probability that Y = 1 when x = 6.0.
Hint: Solving the equations in (2) requires use of Mathematica or Wolfram Alpha.
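For readers without Mathematica or Wolfram Alpha, the numerical check in (c) can also be sketched in plain Python. The block below is only an illustration, not the method the exercise intends: it applies Newton-Raphson to the two likelihood equations from (b), and the names `pi`, `score`, `fit`, `alpha_hat`, and `beta_hat`, along with the starting point (0, 0) and the tolerance, are assumptions introduced here.

```python
import math

# Data from part (c): (x_i, y_i) pairs.
data = [(1.0, 0), (2.0, 1), (2.0, 0), (3.0, 1), (3.0, 1),
        (3.0, 0), (4.0, 1), (4.0, 1), (5.0, 1)]


def pi(x, alpha, beta):
    """Logistic form exp(a + b x) / (1 + exp(a + b x)), written stably."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def score(alpha, beta):
    """Left-hand sides of the two likelihood equations from (b)."""
    s0 = sum(y - pi(x, alpha, beta) for x, y in data)
    s1 = sum(x * (y - pi(x, alpha, beta)) for x, y in data)
    return s0, s1


def fit(alpha=0.0, beta=0.0, tol=1e-10, max_iter=100):
    """Newton-Raphson on the score equations (hypothetical helper)."""
    for _ in range(max_iter):
        s0, s1 = score(alpha, beta)
        # Weights w_i = pi(x_i) (1 - pi(x_i)) build the 2x2 information matrix
        # [[a, b], [b, c]] = [[sum w, sum x w], [sum x w, sum x^2 w]].
        w = [pi(x, alpha, beta) * (1.0 - pi(x, alpha, beta)) for x, _ in data]
        a = sum(w)
        b = sum(x * wi for (x, _), wi in zip(data, w))
        c = sum(x * x * wi for (x, _), wi in zip(data, w))
        det = a * c - b * b
        # Newton step: solve [[a, b], [b, c]] (da, db) = (s0, s1) by Cramer's rule.
        da = (c * s0 - b * s1) / det
        db = (a * s1 - b * s0) / det
        alpha += da
        beta += db
        if abs(da) + abs(db) < tol:
            break
    return alpha, beta


alpha_hat, beta_hat = fit()
print("alpha_hat =", alpha_hat, "beta_hat =", beta_hat)
print("score at the fit:", score(alpha_hat, beta_hat))
print("predicted P(Y = 1 | x = 6.0) =", pi(6.0, alpha_hat, beta_hat))
```

If the iteration converges, the printed estimates should land close to the values quoted in (c), and both score components should be numerically zero. The last line evaluates the fitted logistic form at x = 6.0, which is the prediction the exercise asks for.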