We are going to model the probability of an aftershock with
1
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x)}},
(1)
where x will be a variable that we use to make the prediction. We are going to find the best
parameter values \beta_0, \beta_1 to model the data of a given main event, by finding the values of
\beta_0, \beta_1 that minimise
f(\beta_0, \beta_1) = \sum_k -y_k \log(p(x_k; \beta_0, \beta_1)) - (1 - y_k)\log(1 - p(x_k, \beta_0, \beta_1)).
(2)
This expresion corresponds to the negative log-likelihood of a model. Here y_k \in \{0,1\} is the
observed outcome (no aftershock or aftershock present), and x_k is our predictor variable, that
we will define based on information about the earthquake.
(a) Implement a function fit(X, Y, gamma) that receives the vectors with values x_k and
y_k, and a step gamma for the gradient descent method, and returns \beta_0, \beta_1 obtained the
gradient descent method with starting point (0,0). Test it by computing the values for
2001BHUJINO1YAGI using the Euclidean norm of the stresses as X and the value of the
column aftershock as Y.
(b) Implement a function fit_file(fi,fu, gamma) that finds the optimal values of \beta_0, \beta_1
using gradient descent as before, using the data in the aftershock file fi, and the function
fu on the stresses (defined as in Question 1b). Test it by computing the values for
2001BHUJINO1YAGI using the Euclidean norm of the stresses as X and the value of the
column aftershock as Y.
(c) Implement a function factory fit_file_factory(fu,gamma) to fix the values of fu and
gamma in fit_file. Compute the values of \beta_0, \beta_1 for all events in selectedEvents,
using f(s_1,...,s_6) = \log(\sum_i |s_i|) and \gamma = 10^{-3}. Plot the results with \beta_0 in the x-axis
and \beta_1 in the y-axis, one point for each event.