1. Which of the following metrics and tools are most commonly used to evaluate a classification model? A. Confusion matrix B. Cost-sensitive accuracy C. Area under the ROC curve D. None of the above E. All of the above
2. Describe three predictive algorithms and give an example of a problem where each can be used.
3. Briefly describe three metrics that can be used to assess the accuracy of a predictive model.
4. Suppose we clustered a set of N data points using two different clustering algorithms: k-means and Gaussian mixtures. In both cases, we obtained 5 clusters, and in both cases the centers of the clusters are exactly the same. Can 3 points that are assigned to different clusters in the k-means solution be assigned to the same cluster in the Gaussian mixture solution? If not, explain why. If so, sketch an example or explain.
5. Assume we have a set of data from patients who visited UPMC hospital during the year 2011. A set of features (e.g., temperature, height) has also been extracted for each patient. Our goal is to decide whether a new visiting patient has any of diabetes, heart disease, or Alzheimer's disease (a patient can have one or more of these diseases).
   A. We have decided to use a neural network to solve this problem. We have two choices: either train a separate neural network for each of the diseases, or train a single neural network with one output neuron for each disease and a shared hidden layer. Which method do you prefer? Justify your answer.
   B. Some patient features are expensive to collect (e.g., brain scans) whereas others are not (e.g., temperature). Therefore, we have decided to first ask our classification algorithm to predict whether a patient has a disease, and if the classifier is 80% confident that the patient has a disease, we will do additional examinations to collect additional patient features. In this case, which classification method do you recommend: neural networks, decision trees, or naive Bayes? Justify your answer in one or two sentences.
10. State and briefly explain the properties of an algorithm.
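For reference when working on question 1, here is a minimal sketch of how two of the listed tools, the confusion matrix and the area under the ROC curve, can be computed for a binary classifier. The function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions and are not part of the questions; the AUC is computed here through its rank interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative).

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and classifier scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]

# Assumed decision threshold of 0.5 to turn scores into hard predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_matrix(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The confusion matrix depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason the two are often reported together.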