




























































































A practice exam for predictive analytics and machine learning, focusing on business applications. It includes multiple-choice questions covering key concepts, techniques, and best practices in predictive modeling. Each question is followed by an explanation of the correct answer, making it a useful resource for students and professionals preparing for certification or seeking to improve their understanding of predictive analytics. The exam covers topics such as the CRISP-DM framework, regression, classification, data preprocessing, model evaluation, and various machine learning algorithms.
Typology: Exams





























































































Question 1. Which of the following best describes the primary goal of predictive analytics in a business context?
A) Summarize past performance
B) Suggest optimal future actions without forecasting
C) Estimate future outcomes based on historical data
D) Visualize data trends for reporting
Answer: C
Explanation: Predictive analytics uses historical data to build models that estimate future outcomes, enabling proactive decision‑making.

Question 2. In the CRISP‑DM framework, which phase directly follows “Data Understanding”?
A) Business Understanding
B) Data Preparation
C) Modeling
D) Deployment
Answer: B
Explanation: After understanding the data, the next step is to clean, transform, and prepare it for modeling.
Question 3. Which machine‑learning task is most appropriate for predicting a customer’s expected monetary value over the next 12 months?
A) Classification
B) Clustering
C) Regression
D) Association rule mining
Answer: C
Explanation: Customer Lifetime Value is a continuous numeric quantity, making regression the suitable technique.

Question 4. Which of the following is a key difference between descriptive and predictive analytics?
A) Descriptive analytics forecasts future trends.
B) Predictive analytics explains why past events occurred.
C) Descriptive analytics summarizes historical data; predictive analytics estimates future events.
D) Predictive analytics only uses unsupervised learning.
Answer: C
Explanation: Descriptive analytics focuses on summarizing past data, whereas predictive analytics builds models to anticipate future outcomes.

Question 5. In a credit‑risk scoring model, which type of outcome variable is used?
A) Continuous numeric
B) Ordinal categorical
Answer: B
Explanation: The median is robust to skewed distributions and better represents the central tendency than the mean.

Question 8. Which technique is most appropriate for detecting outliers in a dataset with a non‑normal distribution?
A) Z‑score (standard deviation) method
B) IQR (interquartile range) method
C) Linear regression residuals
D) Pearson correlation
Answer: B
Explanation: The IQR method does not assume normality and works well with skewed data.

Question 9. For a tree‑based model such as Random Forest, which preprocessing step is usually unnecessary?
A) One‑hot encoding of categorical variables
B) Standardization (Z‑score scaling)
C) Handling missing values
D) Removing duplicate rows
Answer: B
Explanation: Tree‑based algorithms split on feature thresholds, so they are invariant to monotonic transformations of individual features and scaling is not required.
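As a rough illustration of the IQR rule from Question 8, the sketch below flags values that fall more than 1.5 × IQR outside the quartiles and compares the result with a z‑score cutoff of 3. The sample data, the function names, and both thresholds are illustrative assumptions, not part of the exam.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical skewed sample with one extreme value.
sample = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print("IQR rule:", iqr_outliers(sample))
print("z-score rule:", zscore_outliers(sample))
```

On this sample the extreme value inflates the mean and standard deviation enough that its own z‑score stays just under 3, so only the IQR rule flags it.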
Question 10. Which encoding technique can cause a high‑dimensional sparse matrix when applied to a categorical variable with many levels?
A) Label Encoding
B) Target Encoding
C) One‑Hot Encoding
D) Binary Encoding
Answer: C
Explanation: One‑Hot Encoding creates a separate binary column for each category, leading to high dimensionality.

Question 11. When creating lag features for a time‑series forecasting model, what does a “lag‑3” feature represent?
A) The value three periods ahead
B) The average of the last three observations
C) The observation three periods prior
D) The difference between current and three‑period‑old values
Answer: C
Explanation: A lag‑3 feature captures the value that occurred three time steps before the current observation.

Question 12. Which regularization technique adds an L1 penalty to the loss function?
A) Ridge Regression
B) Lasso Regression
Explanation: Manhattan distance works well after scaling, ensuring each feature contributes proportionally.

Question 15. Naïve Bayes assumes which of the following about predictor variables?
A) Linear relationship with the target
B) Independence given the class label
C) Equal variance across classes
D) Monotonic transformation
Answer: B
Explanation: The core assumption of Naïve Bayes is conditional independence of features given the class.

Question 16. In a CART decision tree, which impurity measure is used for binary classification by default in many libraries?
A) Mean Squared Error
B) Gini Index
C) Entropy
D) Chi‑square
Answer: B
Explanation: Gini impurity is the default splitting criterion for classification trees in many implementations.
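To make the Gini criterion from Question 16 concrete, here is a minimal sketch that computes the impurity of a node as 1 minus the sum of squared class proportions, and the size‑weighted impurity of a candidate split. The function names and example labels are assumptions for illustration; real libraries implement this far more efficiently.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions in the node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical binary labels at a node.
node = [1, 1, 0, 0]
print(gini_impurity(node))             # evenly mixed binary node
print(split_impurity([1, 1], [0, 0]))  # a split that separates the classes
```

A tree learner following this criterion would prefer the split with the lowest weighted impurity among the candidates it evaluates.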
Question 17. Which of the following is a primary advantage of bagging (e.g., Random Forest) over a single decision tree?
A) Reduced bias
B) Increased model interpretability
C) Lower variance through ensemble averaging
D) Guarantees monotonic predictions
Answer: C
Explanation: Bagging reduces variance by aggregating predictions from many decorrelated trees.

Question 18. Gradient Boosting Machines improve model performance by:
A) Randomly selecting features for each tree
B) Building trees sequentially, each correcting the errors of its predecessor
C) Averaging predictions from independently built trees
D) Using only linear base learners
Answer: B
Explanation: Boosting creates a series of models where each new model focuses on the residual errors of the ensemble so far.

Question 19. Which cross‑validation method is most appropriate for evaluating a model on time‑ordered data?
A) Random K‑Fold CV
B) Stratified K‑Fold CV
Explanation: Recall (sensitivity) emphasizes correctly identifying all actual positives, reducing false negatives.

Question 22. The ROC curve plots:
A) Precision vs. Recall
B) True Positive Rate vs. False Positive Rate
C) Accuracy vs. Threshold
D) F1 Score vs. Threshold
Answer: B
Explanation: ROC displays the trade‑off between sensitivity (TPR) and 1 – specificity (FPR) across thresholds.

Question 23. Which regression metric is scale‑dependent and heavily penalizes large errors?
A) MAE
B) MAPE
C) RMSE
D) R‑squared
Answer: C
Explanation: RMSE squares the errors before averaging, giving more weight to large deviations.

Question 24. Adjusted R‑squared differs from regular R‑squared because it:
A) Penalizes adding irrelevant predictors
B) Always increases with more variables
C) Is only used for logistic regression
D) Is computed on the test set
Answer: A
Explanation: Adjusted R‑squared adjusts for the number of predictors, decreasing if added variables do not improve the model.

Question 25. Which technique directly addresses class imbalance by synthesizing new minority‑class examples?
A) Random undersampling
B) Random oversampling
C) SMOTE (Synthetic Minority Over‑sampling Technique)
D) Cost‑sensitive learning
Answer: C
Explanation: SMOTE creates synthetic samples along the line segments between minority instances, balancing the class distribution.

Question 26. In Principal Component Analysis, the first principal component is chosen to:
A) Minimize the covariance between variables
B) Maximize the variance captured from the data
C) Minimize the number of features
D) Maximize the number of zero loadings
Question 29. Which of the following is a key advantage of hierarchical clustering over K‑Means?
A) Requires pre‑specifying the number of clusters
B) Produces a dendrogram that shows nested cluster relationships
C) Scales linearly with dataset size
D) Guarantees spherical clusters
Answer: B
Explanation: Hierarchical clustering creates a tree‑like structure (dendrogram) illustrating how clusters merge or split.

Question 30. In ARIMA(p,d,q) models, the parameter “d” represents:
A) Number of autoregressive terms
B) Number of moving‑average terms
C) Degree of differencing to achieve stationarity
D) Seasonal period
Answer: C
Explanation: “d” indicates how many times the series is differenced to remove trend and achieve stationarity.

Question 31. Which deep‑learning architecture is most suitable for modeling sequential data such as click‑stream logs?
A) Convolutional Neural Network (CNN)
B) Feed‑forward Multilayer Perceptron (MLP)
C) Recurrent Neural Network (RNN)
D) Autoencoder
Answer: C
Explanation: RNNs carry a hidden state across time steps, which makes them well suited to sequential patterns.

Question 32. SHAP values provide:
A) Global feature importance only
B) Local explanations based on game theory, attributing each feature’s contribution to a single prediction
C) Model‑agnostic confidence intervals
D) A method for hyperparameter tuning
Answer: B
Explanation: SHAP (SHapley Additive exPlanations) quantifies each feature’s marginal contribution for individual predictions.

Question 33. Which of the following best describes “concept drift” in a deployed predictive model?
A) Changes in the underlying data distribution over time affecting model performance
B) Errors caused by faulty code in the model pipeline
C) Overfitting to the training data
D) A sudden increase in model latency
Explanation: Fairness‑aware methods adjust model training to reduce disparate impact while preserving predictive power.

Question 36. Differential privacy primarily aims to:
A) Improve model accuracy
B) Ensure that the inclusion or exclusion of a single record does not significantly affect the output
C) Speed up model training
D) Reduce overfitting
Answer: B
Explanation: Differential privacy adds noise to protect individual data contributions while allowing aggregate analysis.

Question 37. Which cloud service is most commonly used to host a RESTful API for serving machine‑learning predictions?
A) Amazon S3
B) Google BigQuery
C) AWS Lambda (or Azure Functions)
D) Hadoop Distributed File System
Answer: C
Explanation: Serverless functions like AWS Lambda enable low‑latency, scalable API endpoints for model inference.
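The serverless pattern in Question 37 might look roughly like the sketch below: a Python handler with the `lambda_handler(event, context)` signature that reads a JSON request body and returns a prediction. The tiny linear "model" (`WEIGHTS`, `BIAS`), the `features` field name, and the assumption that the request arrives as an API Gateway style event with a JSON string in `body` are all hypothetical choices for illustration.

```python
import json

# Hypothetical model parameters; a real deployment would load a trained model.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1


def predict(features):
    """Score one observation with the placeholder linear model."""
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def lambda_handler(event, context):
    """Entry point: parse the JSON body, validate it, return a JSON response."""
    try:
        payload = json.loads(event.get("body") or "{}")
        features = [float(x) for x in payload["features"]]
        if len(features) != len(WEIGHTS):
            raise ValueError("unexpected number of features")
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed or incomplete request: report a client error.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": predict(features)}),
    }


# Local smoke test with a hand-built event (no cloud account needed).
response = lambda_handler({"body": json.dumps({"features": [2.0, 4.0]})}, None)
print(response)
```

Keeping the scoring logic in a plain function such as `predict` lets it be tested locally without any cloud tooling.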
Question 38. In a lift chart, the “baseline” line represents:
A) A perfect model
B) Random guessing (no predictive power)
C) The model’s training error
D) The optimal threshold
Answer: B
Explanation: The baseline reflects the expected lift if predictions were random, serving as a reference for model improvement.

Question 39. Which evaluation metric is most appropriate for a regression model used to forecast weekly sales where relative error matters more than absolute error?
A) RMSE
B) MAE
C) MAPE
D) R‑squared
Answer: C
Explanation: MAPE expresses error as a percentage of actual values, highlighting relative discrepancies.

Question 40. When using L1 regularization (Lasso) for feature selection, a coefficient that becomes exactly zero indicates:
A) The feature is highly important
B) The feature is irrelevant or redundant for the model
Explanation: Overfitting occurs when a model captures noise in the training data, leading to excellent training performance but poor generalization.

Question 43. Which method can be used to reduce variance without increasing bias significantly?
A) Increasing model complexity
B) Bagging (e.g., Random Forest)
C) Removing regularization
D) Using a single deep neural network
Answer: B
Explanation: Bagging aggregates multiple models trained on bootstrapped samples, lowering variance while leaving bias largely unchanged.

Question 44. In a marketing campaign, which model output is most directly useful for “personalized product recommendation”?
A) Binary churn prediction
B) Predicted probability of purchase for each product
C) Cluster label of the customer
D) Seasonal trend component
Answer: B
Explanation: Predicted purchase probabilities enable ranking items per customer for personalized recommendations.
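The ranking step described in Question 44 can be sketched as follows, assuming a model has already produced a purchase probability for each product and customer. The customer ID, product names, probabilities, and the `top_k_products` helper are made up for illustration.

```python
def top_k_products(purchase_probs, k=3):
    """Return the k products with the highest predicted purchase probability."""
    ranked = sorted(purchase_probs.items(), key=lambda item: item[1], reverse=True)
    return [product for product, _ in ranked[:k]]


# Hypothetical model output: predicted purchase probability per product.
scores = {
    "cust_1": {"laptop": 0.12, "headphones": 0.55, "mouse": 0.31, "monitor": 0.08},
}

for customer, probs in scores.items():
    print(customer, top_k_products(probs, k=2))
```

A binary churn flag or a cluster label alone would not support this per‑product ranking, which is why option B is the most directly useful output.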
Question 45. Which of the following is NOT a typical step in the data‑preparation phase?
A) Feature scaling
B) Hyperparameter tuning
C) Missing‑value imputation
D) Outlier treatment
Answer: B
Explanation: Hyperparameter tuning occurs after data preparation, during model training and validation.

Question 46. Which evaluation technique provides the most reliable estimate of model performance when data is limited?
A) Simple train/test split (70/30)
B) Leave‑One‑Out Cross‑Validation
C) Random subsampling without replacement
D) Using the entire dataset for training
Answer: B
Explanation: LOOCV uses each observation once as a test case, maximizing the training data in each fold and giving a nearly unbiased performance estimate, although that estimate can have high variance.

Question 47. In the context of XGBoost, which parameter controls the depth of each tree?
A) eta
B) max_depth
C) subsample