When deploying a knowledge discovery system to ten or more users, tenfold validation should be the deployment method. (Question 97: True or False)
Added by Renee W.
Step 1
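Tenfold (or 10-fold) cross-validation is a statistical method used to assess the performance of a model by dividing the dataset into ten subsets. The model is trained on nine of these subsets and tested on the remaining one, and this process is repeated ten times so that each subset serves as the test set exactly once. The ten test scores are then averaged to estimate how well the model generalizes. Because it is an evaluation technique used while building a model, not a way of deploying a system to users, the statement is False.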
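To make the procedure in Step 1 concrete, here is a minimal pure-Python sketch of k-fold cross-validation (k = 10 by default). The helper names `k_fold_indices`, `cross_val_accuracy`, and `fit_majority` are hypothetical, and the majority-class baseline is only a stand-in for whatever model is being evaluated. The output of the procedure is a performance estimate, which is why it belongs to model evaluation rather than deployment.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: Optional[int] = None) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs that together cover 0..n-1."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    if seed is not None:
        # Shuffle once so folds are not determined by the original row order.
        random.Random(seed).shuffle(order)
    # Fold sizes differ by at most one when n is not divisible by k.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(order[start:start + size])
        start += size
    # Each fold is the test set exactly once; the other k-1 folds form the training set.
    return [
        ([idx for j, fold in enumerate(folds) if j != i for idx in fold], test)
        for i, test in enumerate(folds)
    ]


def cross_val_accuracy(X: Sequence, y: Sequence, fit: Callable, k: int = 10,
                       seed: Optional[int] = None) -> float:
    """Mean test-fold accuracy of the predictor returned by fit(X_train, y_train)."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(X_train: Sequence, y_train: Sequence) -> Callable:
    """Hypothetical baseline model: always predicts the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


# Usage: 20 toy records, 10 folds, shuffled with a fixed seed.
X = list(range(20))
y = [0] * 14 + [1] * 6
print(cross_val_accuracy(X, y, fit_majority, k=10, seed=0))
```

Libraries such as scikit-learn offer an equivalent splitter (`sklearn.model_selection.KFold`, with a `shuffle` option); shuffling before splitting matters when the records are ordered by class.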
Recommended Videos
1. Unsupervised data mining techniques specify a target variable. (T/F)
2. The output of a classification task is continuous. (T/F)
3. In each iteration of k-fold cross-validation, one of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set. (T/F)
4. Bagging iteratively combines multiple weak learners, usually weighted according to the weak learners' accuracy, to create one strong learner. (T/F)
5. A decision tree building process is greedy because at each step of the tree-building process, the best split is made at that step rather than looking ahead and picking a split that will lead to a better tree in some future step. (T/F)
6. Bonferroni's Principle states that if you look for events of a given type, you can expect to find such events even if the data are completely random. (T/F)
7. R-squared is a measure of how much variability of the target variable can be explained by the predictor variables in a multiple linear regression model. (T/F)
8. Binary logistic regression can be used to predict the probability of a categorical dependent variable. (T/F)
Adi S.
IQ scores predict school performance. (T/F)
Haricharan G.
Answer true or false to the following statements (40 points in total):
1. If the weight vector of a linear SVM model is w = (1, 0), the margin of this model is 2.
2. A larger sigma value (kernel scale) in the RBF kernel leads to a simpler SVM model.
3. A larger box constraint C will likely lead to an SVM model with a wider margin.
4. A smaller box constraint C will likely lead to a simpler SVM model.
5. Under an RBF SVM model, all the support vectors have Lagrange multipliers $\alpha_i = 0$.
6. If the box constraint of an SVM RBF model is set to 1, the maximum alpha value among all the support vectors of the model will be capped at 1.
7. If we train an SVM RBF model on a dataset of K predictors, we will obtain K weights (i.e., theta values) from calling a dual-form function of the model.
8. Before training a decision tree model, the categorical predictors need to be converted to dummy variables (i.e., one-hot encoding).
9. Decision tree models are biased toward selecting categorical predictors that have more categories.
10. Each tree in a random forest is built using a subset of the data.
11. One data sample (i.e., one data record) may occur multiple times in one replica under the data bagging method.
12. If a binary-split decision tree needs to be able to predict a dataset of M classes, the number of splits in the tree must be equal to M.
13. When building an SVM RBF model for one-class classification, the model is likely to face the unbalanced class problem.
14. The kNN (k-nearest neighbor) method is a lazy classifier.
15. If predictor A has lower entropy than predictor B, then predictor A has higher certainty than B in predicting the target classes.
16. It is possible for a clustering tool to produce the Silhouette coefficients {0, -0.01, -1, -0.95, 1, 0} for a set of 6 data points.
17. The k-means method can generate clusters of any shape.
18. Under the Naïve Bayes method, the importance of predictors will fluctuate with respect to each record to be predicted.
19. A good clustering result should have high inter-cluster similarity and low intra-cluster similarity.
20. Under the Naïve Bayes method, adjusting the prior probability is one way to alleviate the unbalanced dataset problem.