Predictive Analytics and Data Mining Practice Exam, Exams of Technology

This practice exam for a predictive analytics and data mining certification features 29 multiple-choice questions. It covers key concepts such as CRISP-DM, KDD, analytics types (descriptive, diagnostic, predictive, prescriptive), data types (nominal, ordinal, interval, ratio), unstructured data, ETL, imputation, outlier identification, data transformation (Box-Cox), encoding (one-hot encoding), scaling, PCA, feature selection, the normal distribution, hypothesis testing, Bayesian concepts, data visualization, regression, and classification trees. Each question includes an explanation, which makes the exam useful for exam preparation and for reviewing the fundamentals of data mining and predictive analytics.

Typology: Exams

2025/2026

Available from 12/20/2025

shilpi-jain-1 🇮🇳

Predictive Analytics and Data Mining Certificate

Practice Exam

**Question 1.** Which phase of the CRISP‑DM methodology focuses on understanding business objectives and converting them into a data‑driven problem?

A) Data Preparation

B) Business Understanding

C) Modeling

D) Deployment

Answer: B

Explanation: The Business Understanding phase translates business goals into a data mining problem, defines success criteria, and creates a project plan.

**Question 2.** In the KDD process, which step directly follows data cleaning and integration?

A) Data Selection

B) Data Transformation

C) Data Mining

D) Evaluation

Answer: B

Explanation: After cleaning and integrating data, the next step is transforming it into suitable formats (e.g., normalization, aggregation) for mining.

**Question 3.** Which analytics type is primarily concerned with answering “What will happen next?”

A) Descriptive

B) Diagnostic

C) Predictive

D) Prescriptive

Answer: C

Explanation: Predictive analytics uses historical data and statistical models to forecast future events.

**Question 4.** A bank wants to predict customer churn. The variable indicating whether a customer left is the:

A) Feature

B) Target variable

C) Predictor variable

D) Independent variable

Answer: B

Explanation: The target (or dependent) variable is the outcome the model is trained to predict, in this case churn.

**Question 5.** Which error type is most costly for a fraud detection system that blocks legitimate transactions?

A) Type I error (false positive)

B) Type II error (false negative)

C) Both are equally costly

D) Neither is costly

Answer: A

Explanation: A false positive blocks a legitimate transaction, causing customer dissatisfaction and potential revenue loss.

**Question 6.** Which data type best describes “Customer satisfaction rating on a scale of 1 to 5”?

A) Nominal

B) Ordinal

C) K‑Nearest Neighbors imputation

D) Listwise deletion

Answer: C

Explanation: K‑NN imputation estimates missing values from similar records, so it uses more of the structure in the data.

**Question 10.** An outlier is identified using the IQR method. Which rule defines an extreme outlier?

A) Value < Q1 – 1.5·IQR or > Q3 + 1.5·IQR

B) Value < Q1 – 3·IQR or > Q3 + 3·IQR

C) Z‑score > 2 or < –2

D) Value > 99th percentile

Answer: B

Explanation: Extreme outliers are those beyond 3·IQR from the quartiles; the 1.5·IQR rule defines regular outliers.

**Question 11.** Applying a Box‑Cox transformation is most appropriate when:

A) Data contain categorical variables

B) Data are already normally distributed

C) Data are positively skewed

D) Data have missing values

Answer: C

Explanation: Box‑Cox, which requires strictly positive values, can reduce positive skewness and bring the data closer to normality.

**Question 12.** Which encoding technique can cause the “dummy variable trap” if not handled properly?

A) One‑hot encoding

B) Label encoding

C) Target encoding

D) Frequency encoding

Answer: A

Explanation: A full set of one‑hot columns always sums to 1, so it is perfectly collinear with a model intercept; dropping one dummy avoids the trap.

**Question 13.** Which scaling method is robust to outliers?

A) Min‑Max scaling

B) Z‑score standardization

C) Robust scaling (median and IQR)

D) Decimal scaling

Answer: C

Explanation: Robust scaling uses median and IQR, reducing outlier influence.

**Question 14.** Principal Component Analysis (PCA) primarily aims to:

A) Increase the number of features

B) Reduce dimensionality while preserving variance

C) Select the most correlated features

D) Encode categorical variables

Answer: B

Explanation: PCA creates orthogonal components that capture the maximum variance with fewer dimensions.
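The contrast behind Question 13 can be sketched in a few lines. This is an illustrative example only: the helper names `min_max_scale` and `robust_scale` and the sample data are assumptions made for the sketch, not part of the exam, and the quartiles use the "inclusive" method of Python's `statistics.quantiles` (other quartile conventions give slightly different values).

```python
from statistics import median, quantiles

def min_max_scale(values):
    """Rescale to [0, 1] using the minimum and maximum (outlier-sensitive)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    iqr = q3 - q1
    return [(v - med) / iqr for v in values]

# Hypothetical data: five ordinary values and one extreme outlier.
data = [10, 12, 14, 16, 18, 1000]

mm = min_max_scale(data)
rb = robust_scale(data)

# Min-max squeezes the five ordinary values into a tiny interval near 0,
# while robust scaling keeps them spread out around 0.
print([round(v, 4) for v in mm])
print([round(v, 4) for v in rb])
```

With min-max scaling the single outlier determines the range, so the ordinary values become nearly indistinguishable; the median and IQR are barely affected by it.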

**Question 18.** A two‑tailed t‑test with a p‑value of 0.03 indicates:

A) Fail to reject the null hypothesis at α = 0.

B) Reject the null hypothesis at α = 0.

C) Reject the null hypothesis at α = 0.

D) No conclusion can be drawn

Answer: C

Explanation: p = 0.03 < 0.05, so the null hypothesis is rejected at the 5% significance level.

**Question 19.** Which test is appropriate for comparing the means of three independent groups?

A) Paired t‑test

B) One‑way ANOVA

C) Chi‑square test

D) Wilcoxon rank‑sum test

Answer: B

Explanation: One‑way ANOVA assesses whether at least one group mean differs from the others.

**Question 20.** In a chi‑square test for independence, a large χ² statistic relative to the critical value suggests:

A) Variables are independent

B) Variables are dependent

C) Sample size is too small

D) Data are normally distributed

Answer: B

Explanation: A large χ² indicates observed frequencies deviate significantly from expected frequencies, implying dependence.

**Question 21.** Which Bayesian concept updates prior beliefs with new evidence?

A) Likelihood function

B) Posterior probability

C) P‑value

D) Confidence interval

Answer: B

Explanation: The posterior combines the prior distribution and the likelihood of observed data.

**Question 22.** A histogram is most suitable for visualizing:

A) Categorical frequencies

B) Distribution of a continuous variable

C) Correlation between two variables

D) Time‑series trends

Answer: B

Explanation: Histograms display the frequency of continuous data across bins.

**Question 23.** In a scatter plot, a strong linear pattern indicates:

A) High variance

B) Strong correlation

C) Non‑linear relationship

D) Presence of outliers

Answer: B
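As a rough illustration of Question 23, the sketch below computes the Pearson correlation coefficient by hand for two made-up datasets. The function name `pearson_r` and the data are assumptions for this example; the point is only that a perfectly linear pattern gives a coefficient of 1, while a clear but symmetric non-linear pattern can give 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data for illustration.
x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]   # points lie exactly on a line
y_curved = [v * v for v in x]       # symmetric parabola

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)

print(r_linear, r_curved)
```

The second dataset is fully determined by `x`, yet its linear correlation is zero, which is why a scatter plot is worth checking alongside the coefficient.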

Explanation: Lasso (L1 penalty) can shrink coefficients to zero, performing variable selection.

**Question 27.** In logistic regression, the odds ratio for a predictor of 1.5 means:

A) The odds are multiplied by 1.5 per unit increase in the predictor

B) The probability increases by 1.

C) The log‑odds increase by 1.

D) The predictor has no effect

Answer: A

Explanation: An odds ratio of 1.5 means the odds are multiplied by 1.5, a 50% increase, for each unit increase in the predictor.

**Question 28.** Which impurity measure is used by the CART algorithm for classification trees?

A) Entropy

B) Gini impurity

C) Information gain ratio

D) Chi‑square

Answer: B

Explanation: CART uses Gini impurity (or variance for regression) to decide splits.

**Question 29.** Pruning a decision tree primarily helps to:

A) Increase depth

B) Reduce overfitting

C) Add more features

D) Convert it to a linear model

Answer: B

Explanation: Pruning removes branches that do not improve validation performance, mitigating overfitting.

**Question 30.** Random Forests achieve variance reduction by:

A) Boosting weak learners sequentially

B) Using a single deep tree

C) Bagging multiple decorrelated trees

D) Applying gradient descent

Answer: C

Explanation: Random Forests train many trees on bootstrap samples with random feature subsets, reducing variance.

**Question 31.** In Gradient Boosting, each new tree is trained to predict:

A) The original target variable

B) The residual errors of the previous ensemble

C) Random noise

D) Feature importance scores

Answer: B

Explanation: Boosting fits each successive learner to the residuals (errors) of the current model to improve performance.

**Question 32.** Which kernel function maps data into an infinite‑dimensional space?

A) Linear kernel

B) Polynomial kernel

C) Radial Basis Function (RBF) kernel

D) Sigmoid kernel

D) The clustering algorithm speed

Answer: B

Explanation: Plotting within‑cluster sum of squares versus k shows an “elbow” point indicating diminishing returns.

**Question 36.** Silhouette score values close to 1 indicate:

A) Overlapping clusters

B) Well‑separated, cohesive clusters

C) Poor clustering structure

D) Random assignment of points

Answer: B

Explanation: High silhouette values mean each point is close to its own cluster and far from others.

**Question 37.** In hierarchical agglomerative clustering, the “complete linkage” criterion defines cluster distance as:

A) Minimum distance between any two points in the clusters

B) Maximum distance between any two points in the clusters

C) Average distance between all point pairs

D) Distance between cluster centroids

Answer: B

Explanation: Complete linkage uses the farthest pairwise distance, producing compact clusters.

**Question 38.** DBSCAN can discover clusters of arbitrary shape because it:

A) Uses k‑means centroids

B) Relies on density reachability and connectivity

C) Requires pre‑specifying the number of clusters

D) Optimizes a global variance criterion

Answer: B

Explanation: DBSCAN groups points based on dense neighborhoods, allowing non‑convex shapes.

**Question 39.** An internal clustering evaluation metric that does NOT require ground‑truth labels is:

A) Adjusted Rand Index

B) Purity

C) Silhouette coefficient

D) F‑measure

Answer: C

Explanation: Silhouette coefficient assesses cohesion and separation using only the data and cluster assignments.

**Question 40.** In market basket analysis, the “support” of an itemset represents:

A) The probability that the rule is correct

B) The proportion of transactions containing the itemset

C) The lift value relative to independence

D) The confidence of the rule

Answer: B

Explanation: Support is the fraction of transactions in the dataset that contain the itemset.

**Question 41.** Which of the following is a limitation of the Apriori algorithm?

A) It cannot handle continuous variables

**Question 44.** Which cross‑validation technique provides the most unbiased estimate of model performance for small datasets?

A) Hold‑out validation

B) 5‑fold cross‑validation

C) Leave‑One‑Out Cross‑Validation (LOOCV)

D) Bootstrap validation

Answer: C

Explanation: LOOCV uses every observation once as a test set, minimizing bias for limited data.

**Question 45.** When dealing with imbalanced classes, which metric is more informative than overall accuracy?

A) Mean Squared Error

B) F1‑score

C) R²

D) Adjusted R²

Answer: B

Explanation: F1‑score balances precision and recall, highlighting performance on minority classes.

**Question 46.** In time‑series forecasting, the “seasonality” component refers to:

A) Random noise

B) Long‑term trend

C) Repeating patterns at fixed intervals

D) Exogenous variables

Answer: C

Explanation: Seasonality captures systematic, periodic fluctuations (e.g., monthly sales peaks).
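The point of Question 45 can be seen with a small hand-rolled example. The sketch below is not a library API: `accuracy` and `f1_score` are defined here for illustration, the labels are invented, and F1 is taken to be 0 when there are no true positives (a common convention, assumed here).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # assumed convention when no positives are recovered
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95

# Model A ignores the minority class entirely.
pred_a = [0] * 100
# Model B finds 3 of the 5 positives and raises 2 false alarms.
pred_b = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93

acc_a, f1_a = accuracy(y_true, pred_a), f1_score(y_true, pred_a)
acc_b, f1_b = accuracy(y_true, pred_b), f1_score(y_true, pred_b)

print(acc_a, f1_a)
print(acc_b, f1_b)
```

Both models have nearly identical accuracy, but only F1 reveals that the first one never identifies a positive case.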

**Question 47.** Which method decomposes a time series into trend, seasonal, and residual components using locally weighted smoothing?

A) ARIMA

B) Exponential Smoothing

C) STL (Seasonal‑Trend decomposition using Loess)

D) Prophet

Answer: C

Explanation: STL separates the three components via locally weighted regression (LOESS).

**Question 48.** In an ARIMA(p,d,q) model, the parameter d stands for:

A) Number of autoregressive terms

B) Degree of differencing to achieve stationarity

C) Number of moving‑average terms

D) Seasonal period

Answer: B

Explanation: d indicates how many times the series is differenced to remove trends.

**Question 49.** Which regularization technique is appropriate when you suspect many irrelevant features but also want to keep correlated predictors?

A) Lasso (L1)

B) Ridge (L2)

C) Elastic Net (L1 + L2)

D) No regularization

Answer: C
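The role of d in Question 48 can be shown with plain differencing. This is a minimal sketch with made-up series; the helper `difference` is an assumption for the example and is not taken from any forecasting library.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series

# Hypothetical noise-free series for illustration.
linear = [3 + 2 * t for t in range(6)]   # linear trend
quadratic = [t * t for t in range(6)]    # quadratic trend

linear_d1 = difference(linear, d=1)        # constant after one difference
quadratic_d1 = difference(quadratic, d=1)  # still trending
quadratic_d2 = difference(quadratic, d=2)  # constant after two differences

print(linear_d1)
print(quadratic_d1)
print(quadratic_d2)
```

One difference is enough to flatten a linear trend, while a quadratic trend needs two, which corresponds to d = 1 and d = 2 respectively.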

Answer: B

Explanation: Concept drift occurs when the underlying relationship between features and target evolves, degrading model performance.

**Question 53.** Which of the following best describes “over‑sampling” in handling class imbalance?

A) Removing majority class observations

B) Replicating minority class observations

C) Adding noise to all features

D) Reducing dimensionality

Answer: B

Explanation: Over‑sampling adds minority class samples, either by replicating existing ones or by synthesizing new ones (e.g., SMOTE), to balance the dataset.

**Question 54.** The “bias‑variance trade‑off” implies that:

A) Reducing bias always reduces variance

B) A model with low bias will always have low variance

C) Decreasing bias typically increases variance, and vice versa

D) Bias and variance are unrelated

Answer: C

Explanation: Improving model fit (lower bias) often makes it more sensitive to training data (higher variance).

**Question 55.** Which performance metric is appropriate for evaluating a regression model when large errors are particularly undesirable?

A) Mean Absolute Error (MAE)

B) Mean Squared Error (MSE)

C) R²

D) Accuracy

Answer: B

Explanation: MSE squares errors, penalizing larger deviations more heavily than MAE.

**Question 56.** In a business context, a “lift chart” is used to:

A) Compare model accuracy across different algorithms

B) Visualize the improvement of a predictive model over random selection

C) Display residuals distribution

D) Show ROC curves for multiple thresholds

Answer: B

Explanation: Lift charts illustrate how much better a model is at identifying positive cases compared to random guessing.

**Question 57.** Which of the following is NOT a typical step in feature selection using a wrapper method?

A) Training a model on a subset of features

B) Evaluating performance on validation data

C) Ranking features by correlation with the target

D) Iteratively adding/removing features based on model score

Answer: C

Explanation: Correlation ranking is a filter technique; wrappers involve model training and evaluation.

**Question 58.** In a data warehouse, “star schema” is characterized by:

A) Multiple many‑to‑many relationships
