1. Data mining is a tool for allowing users to ______.
   A. find the hidden relationships in data
   B. find the relationships in data
   C. find the visible relationships in data
   D. find the theoretical relationships in data

2. Which is correct about overfitting?
   A. There is a strict threshold value to check whether the model is an overfitted one.
   B. Overfitting should be avoided.
   C. Overfitting means the model fits well on the test data, but poorly on the training data.
   D. Cross-validation is able to eliminate overfitting in any circumstances.

3. Suppose one wants to predict the number of customers from historical information. Which supervised learning method is suitable?
   A. structural equation modelling
   B. clustering
   C. regression
   D. classification

4. Two products, A and B, are sold in regions $R_1$ and $R_2$. The overall statistics say product A has the higher return rate, but when the return rates in regions $R_1$ and $R_2$ are reviewed separately, product B has the higher return rate in each region. This phenomenon is known as:

   | product | A (overall) | B (overall) | A (region $R_1$) | B (region $R_1$) | A (region $R_2$) | B (region $R_2$) |
   | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
   | total sales | 2100 | 2100 | 100 | 1900 | 2000 | 200 |
   | returned items | 1005 | 300 | 5 | 120 | 1000 | 180 |
   | return rate | 47.9% | 14.3% | 5.0% | 6.3% | 50.0% | 90.0% |

   A. Bonferroni correction
   B. Simpson's paradox
   C. CRISP-DM methodology
   D. Bayes' theorem

5. Data mining helps in ______.
   A. marketing strategies
   B. inventory management
   C. sales promotion strategies
   D. all of the above

6. A capability of data mining is to build ______ models.
   A. predictive
   B. imperative
   C. interrogative
   D. retrospective

7. Removing duplicate records is a process called ______.
   A. recovery
   B. data pruning
Added by Nicole M.
Step 1
The options are:
A. find the hidden relationships in data
B. find the relationships in data
C. find the visible relationships in data
D. find the theoretical relationships in data

Data mining primarily focuses on discovering patterns and relationships in data that …
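For question 4 above, the reversal between the pooled and per-region return rates can be checked directly from the table. The following is a minimal sketch, not part of the original answer; the dictionary layout and the function names are illustrative assumptions, and only the counts come from the question.

```python
# Sales and returns copied from the table in question 4,
# stored as (total sold, items returned) per product and region.
sales = {
    "A": {"R1": (100, 5), "R2": (2000, 1000)},
    "B": {"R1": (1900, 120), "R2": (200, 180)},
}


def return_rate(sold, returned):
    """Fraction of sold items that were returned."""
    return returned / sold


def regional_rates(product):
    """Return rate of one product in each region."""
    return {region: return_rate(*counts) for region, counts in sales[product].items()}


def overall_rate(product):
    """Return rate of one product pooled over all regions."""
    sold = sum(s for s, _ in sales[product].values())
    returned = sum(r for _, r in sales[product].values())
    return return_rate(sold, returned)


for product in sales:
    per_region = ", ".join(
        f"{region}: {rate:.1%}" for region, rate in regional_rates(product).items()
    )
    print(f"product {product}: overall {overall_rate(product):.1%} ({per_region})")
```

Product A sells mostly in the high-return region $R_2$ while product B sells mostly in the low-return region $R_1$, so the pooled rates are weighted very differently from the per-region rates. That unequal weighting is what lets the pooled comparison point the opposite way from both regional comparisons.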
Recommended Videos
1. Which is not a name often given to an independent variable that takes on just two values (0 or 1) according to whether or not a given characteristic is absent or present? Select one:
   a. Absent variable
   b. Binary variable
   c. Dummy variable

2. A fitted multiple regression equation is $Y = 12 + 3X_1 - 5X_2 + 7X_3 + 2X_4$. When $X_1$ increases 2 units and $X_2$ increases 2 units as well, while $X_3$ and $X_4$ remain unchanged, what change would you expect in your estimate of $Y$? Select one:
   a. Decrease by 2
   b. Decrease by 4
   c. Increase by 2
   d. No change in Y

3. A log transformation might be appropriate to alleviate which problem(s)? Select one:
   a. Heteroscedastic residuals
   b. Multicollinearity
   c. Autocorrelated residuals

4. A high leverage observation will have: Select one:
   a. an unusual value of the observed Y.
   b. unusual values of one or more X's.
   c. a large standardized residual.

5. Which of the following would be most useful in checking the normality assumption of the errors in a regression model? Select one:
   a. The t statistics for the coefficients
   b. The F-statistic from the ANOVA table
   c. The histogram of residuals
   d. The VIF statistics for the predictors

6. Simple tests for nonlinearity in a regression model can be performed by: Select one:
   a. squaring the standard error.
   b. including squared predictors.
   c. deleting predictors one at a time.

7. Heteroscedasticity of residuals in regression suggests that there is: Select one:
   a. nonconstant variation in the errors.
   b. multicollinearity among the predictors.
   c. non-normality in the errors.
   d. lack of independence in successive errors.
Sri K.
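The arithmetic behind question 2 in the list above can be sketched as follows. The coefficients are taken from the fitted equation in the question; the baseline predictor values and the helper names are arbitrary assumptions chosen only for illustration.

```python
# Coefficients from the fitted equation in question 2:
# Y = 12 + 3*X1 - 5*X2 + 7*X3 + 2*X4
INTERCEPT = 12
COEFFS = {"X1": 3, "X2": -5, "X3": 7, "X4": 2}


def predict(x):
    """Predicted Y for a dict of predictor values."""
    return INTERCEPT + sum(COEFFS[name] * x[name] for name in COEFFS)


def predicted_change(deltas):
    """Change in predicted Y when the listed predictors change by the given amounts.

    For a linear model this does not depend on the starting values.
    """
    return sum(COEFFS[name] * delta for name, delta in deltas.items())


# Arbitrary illustrative starting point (not from the question).
baseline = {"X1": 1, "X2": 1, "X3": 1, "X4": 1}
shifted = dict(baseline, X1=baseline["X1"] + 2, X2=baseline["X2"] + 2)

print(predict(shifted) - predict(baseline))   # difference of two predictions
print(predicted_change({"X1": 2, "X2": 2}))   # same value from the coefficients alone
```

Both lines print the same number, $3 \cdot 2 + (-5) \cdot 2 = -4$, because the intercept and the unchanged predictors cancel when two predictions are subtracted.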
1. Unsupervised data mining techniques specify a target variable. (T/F)
2. The output of a classification data task is continuous. (T/F)
3. In each iteration of k-fold cross-validation, one of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set. (T/F)
4. Bagging iteratively combines multiple weak learners, usually weighted according to the weak learners' accuracy, to create one strong learner. (T/F)
5. A decision tree building process is greedy because at each step of the tree-building process, the best split is made at that step rather than looking ahead and picking a split that will lead to a better tree in some future step. (T/F)
6. Bonferroni's Principle states that if you look for events of a given type, you can expect to find such events even if the data is completely random. (T/F)
7. R-squared is a measure of how much variability of the target variable can be explained by the predictor variables in a multiple linear regression model. (T/F)
8. Binary logistic regression can be used to predict the probability of a categorical dependent variable. (T/F)
Adi S.
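The k-fold procedure described in statement 3 of the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function name `k_fold_splits` is hypothetical, the data are represented only by their indices, and no shuffling is done, whereas practical implementations usually shuffle first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices 0..n_samples-1 are cut into k contiguous folds whose sizes differ
    by at most one. Each fold serves as the test set exactly once, and the
    remaining k-1 folds together form the training set.
    """
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra element.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_splits(6, 3):
    print("train:", train, "test:", test)
```

With 6 samples and 3 folds, each iteration holds out 2 indices for testing and trains on the other 4, and every index appears in a test set exactly once across the 3 iterations.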
Please help answer these questions:

1. Which of the following data-collection methods is most likely to minimize pressure on participants to provide socially acceptable responses to questions?
   a. Projective techniques
   b. Self-report questionnaires
   c. Qualitative interviews
   d. Focus groups

2. To determine how frequently nurses record a diagnosis, a sample of 1000 charts is randomly selected from all the patients' charts during the previous year. What kind of sampling method is this?
   a. Simple random
   b. Stratified random
   c. Cluster
   d. Systematic random
   e. Convenience
   f. Quota
   g. Purposive
   h. Snowball
   k. Theoretic

3. When collecting data in a qualitative interview, it is preferable to avoid agreeing with a participant statement to encourage their participation.
   a. True
   b. False

4. Internal consistency (typically measured by Cronbach's alpha) is used to assess:
   a. Reliability
   b. Validity
   c. Probability samples
   d. Non-probability samples

5. A particularly notable challenge for quantitative observational studies is:
   a. Identifying an appropriate sample
   b. Confirming findings after the fact
   c. Operationalizing an observable variable
   d. None of the above

6. A qualitative research study exploring attitudes towards IV drug users among nurses asks each participant to write short essays in response to a series of pre-written questions. What term best describes this approach?
   a. Structured interview
   b. Semi-structured interview
   c. Unstructured interview
   d. Focus group interview

7. In qualitative research, data analysis is often begun before the study has been completed.
   a. True
   b. False

8. Qualitative research can integrate historical information and media relevant to the topic being explored.
   a. True
   b. False

9. Participants are often recruited into qualitative studies using:
   a. Purposive samples
   b. Snowball samples
   c. Convenience samples
   d. All of the above
Krishna G.