The following questions relate to generalized linear models (GLMs).
a. What are the 3 components of a generalized linear model?
b. For a logistic regression of lung cancer incidence (1=developed lung cancer, 0=did not develop lung cancer) on smoking status (1=current or past smoker, 0=never smoker), specify the 3 components that you identified in part a.
A study was conducted to investigate the effects of AZT treatment on the development of AIDS symptoms among HIV-infected veterans. Among 170 individuals who took AZT, 25 developed the symptoms of AIDS by the end of the study. Among the other 168 individuals who did not take AZT (placebo group), 44 developed the symptoms of AIDS.
a. Construct a two-by-two table. Use AIDS symptoms as columns and AZT treatment as rows.
b. Based on the table constructed in a., calculate the odds of AIDS in the treatment group, the odds of AIDS in the placebo group, and the odds ratio of AIDS comparing the treatment group with the placebo group.
c. If we fit a logistic regression model of the development of AIDS symptoms on AZT based on the data provided, write down the model. (There is no need to specify the three components of the GLM; write one formula connecting the probability of developing AIDS symptoms and AZT. Hint: you need to create a dummy variable for AZT.)
d. Without fitting the regression model in c. via Stata, give estimates of the coefficients (the betas).
e. Suppose race (1=white, 0=nonwhite) is a potential confounder and we fit a logistic regression model for the development of AIDS symptoms on AZT (1=user, 0=nonuser), race (1=white, 0=nonwhite), and the interaction between AZT and race. Using the output shown below, estimate the probability of developing AIDS symptoms during the study follow-up for a nonwhite AZT user.
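The hand calculations for the AZT study (the two-by-two table, the two odds, the odds ratio, and the coefficients of the single-predictor logistic model) can be checked with a short script. This is a minimal sketch in Python rather than Stata, using only the counts given in the study description. It assumes the dummy variable is coded 1 for AZT and 0 for placebo, so the intercept is the log odds in the placebo group and the slope is the log odds ratio. The variable names are illustrative.

```python
import math

# Counts taken from the study description.
azt_aids, azt_total = 25, 170
placebo_aids, placebo_total = 44, 168

azt_no_aids = azt_total - azt_aids              # 145
placebo_no_aids = placebo_total - placebo_aids  # 124

# Two-by-two table: rows = AZT treatment, columns = AIDS symptoms.
table = {
    "AZT": {"symptoms": azt_aids, "no symptoms": azt_no_aids},
    "placebo": {"symptoms": placebo_aids, "no symptoms": placebo_no_aids},
}

# Odds = (number with symptoms) / (number without symptoms) within each row.
odds_azt = azt_aids / azt_no_aids
odds_placebo = placebo_aids / placebo_no_aids
odds_ratio = odds_azt / odds_placebo

# Assumed model: logit(p) = beta0 + beta1 * AZT, with AZT = 1 (treated), 0 (placebo).
# With one binary predictor the fitted model reproduces the observed proportions,
# so the estimates follow directly from the table.
beta0 = math.log(odds_placebo)  # log odds in the placebo group
beta1 = math.log(odds_ratio)    # log odds ratio, AZT vs. placebo


def inv_logit(eta):
    """Map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


for group, row in table.items():
    print(group, row)
print(f"odds (AZT)     = {odds_azt:.4f}")
print(f"odds (placebo) = {odds_placebo:.4f}")
print(f"odds ratio     = {odds_ratio:.4f}")
print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}")
print(f"P(symptoms | AZT)     = {inv_logit(beta0 + beta1):.4f}")
print(f"P(symptoms | placebo) = {inv_logit(beta0):.4f}")
```

As a consistency check, the two back-transformed probabilities should equal the observed proportions 25/170 and 44/168.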
[Stata] Property Valuation: Scientific mass appraisal uses linear regression methods to assess property values. Twenty-four observations were obtained from a property listing for Champaign, IL. The problem is to find the best-fitting regression model to predict the sales price (y) using the following independent variables: taxes in thousands of dollars (x1), number of bedrooms (x2), and age of the home in years (x3). The data set is stored in the midterm.dta file.
Answer the following questions, justifying each answer with appropriate analyses. Do not just answer yes or no; you must justify your response and provide data to back it up.
a. In a fitted regression model that relates the sale price to taxes and building characteristics, would you include all the variables? (Hint: Are all the variables significant?)
b. A veteran real estate agent has suggested that a model with taxes and the number of bedrooms should adequately describe the sales price. Do you agree?
c. Present what you consider to be the most adequate model or models for predicting the sale price of homes in Champaign, IL.
ID x1 x2 x3 y
1 4.918 4 42 25.9
2 5.021 4 62 29.5
3 4.543 3 40 27.9
4 4.557 3 54 25.9
5 5.06 3 42 29.9
6 3.891 3 56 29.9
7 5.898 3 51 30.9
8 5.604 3 32 28.9
9 5.828 3 32 35.9
10 5.3 3 30 31.5
11 6.271 2 30 31
12 5.959 3 32 30.9
13 5.05 2 46 30
14 8.246 4 50 36.9
15 6.697 3 22 41.9
16 7.784 3 17 40.5
17 9.038 3 23 43.9
18 5.989 3 40 37.9
19 7.542 3 22 37.9
20 8.795 4 50 44.5
21 6.083 3 44 37.9
22 8.361 4 48 38.9
23 8.14 3 3 36.9
24 9.142 4 31 45.8
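The model comparison asked for above is meant to be done in Stata with midterm.dta. As a rough cross-check outside Stata, the sketch below fits ordinary least squares in plain Python to the 24 rows listed above, once with all three predictors and once with taxes and bedrooms only, and reports the coefficients and R-squared for each. The helper names `solve` and `fit_ols` are illustrative. The sketch does not compute standard errors or p-values, so it supports but does not replace the significance tests the questions call for.

```python
# Rows transcribed from the listing above: (x1 taxes, x2 bedrooms, x3 age, y price).
DATA = [
    (4.918, 4, 42, 25.9), (5.021, 4, 62, 29.5), (4.543, 3, 40, 27.9),
    (4.557, 3, 54, 25.9), (5.06, 3, 42, 29.9), (3.891, 3, 56, 29.9),
    (5.898, 3, 51, 30.9), (5.604, 3, 32, 28.9), (5.828, 3, 32, 35.9),
    (5.3, 3, 30, 31.5), (6.271, 2, 30, 31.0), (5.959, 3, 32, 30.9),
    (5.05, 2, 46, 30.0), (8.246, 4, 50, 36.9), (6.697, 3, 22, 41.9),
    (7.784, 3, 17, 40.5), (9.038, 3, 23, 43.9), (5.989, 3, 40, 37.9),
    (7.542, 3, 22, 37.9), (8.795, 4, 50, 44.5), (6.083, 3, 44, 37.9),
    (8.361, 4, 48, 38.9), (8.14, 3, 3, 36.9), (9.142, 4, 31, 45.8),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(data, predictors):
    """Fit y on an intercept plus the given predictor columns (0=x1, 1=x2, 2=x3).

    Returns (coefficients, R-squared, residuals); coefficients[0] is the intercept.
    """
    X = [[1.0] + [float(row[j]) for j in predictors] for row in data]
    y = [row[3] for row in data]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ybar = sum(y) / len(y)
    sse = sum(e * e for e in resid)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return beta, 1.0 - sse / sst, resid


full_beta, full_r2, full_resid = fit_ols(DATA, [0, 1, 2])
reduced_beta, reduced_r2, reduced_resid = fit_ols(DATA, [0, 1])

print("full model (x1, x2, x3):", [round(b, 4) for b in full_beta], "R2 =", round(full_r2, 4))
print("reduced model (x1, x2): ", [round(b, 4) for b in reduced_beta], "R2 =", round(reduced_r2, 4))
```

Because the reduced model is nested in the full one, its R-squared can never exceed the full model's; whether the difference matters is a question for the t and F tests in the Stata output.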