This practice exam is designed for those seeking certification in predictive analytics. It tests knowledge in statistical methods, data analysis, machine learning models, and business forecasting techniques, preparing candidates to use data for decision-making and predictive insights.
Typology: Exams

Question 1. Which of the following best describes the primary difference between descriptive and predictive analytics?
A) Descriptive analytics forecasts future outcomes, while predictive analytics summarizes past data.
B) Descriptive analytics uses statistical models, whereas predictive analytics relies on data visualization only.
C) Descriptive analytics explains what happened, while predictive analytics estimates what will happen.
D) Descriptive analytics is limited to structured data, while predictive analytics works only with unstructured data.
Answer: C
Explanation: Descriptive analytics focuses on summarizing historical data to understand past events, whereas predictive analytics applies statistical or machine learning models to estimate future outcomes.

Question 2. In the CRISP-DM methodology, which phase directly follows “Data Understanding”?
A) Business Understanding
B) Data Preparation
C) Modeling
D) Deployment
Answer: B
Explanation: After gaining an understanding of the data, the next step is to clean, transform, and prepare it for modeling.

Question 3. Which statistical measure is most appropriate for describing the central tendency of a highly skewed continuous variable?
A) Mean
B) Median
C) Mode
D) Standard deviation
Answer: B
Explanation: The median is resistant to extreme values and better represents the center of a skewed distribution than the mean.

Question 4. The probability mass function of a Binomial distribution with parameters n=10 and p=0.3 gives the probability of obtaining exactly 4 successes. Which expression computes this probability?
A) C(10,4)·0.3⁴·0.7⁶
B) (10⁴)·0.3·0.
C) 10·0.3⁴·0.7⁶
D) C(10,4)·0.7⁴·0.3⁶
Answer: A
Explanation: The binomial probability formula is C(n,k)·pᵏ·(1-p)ⁿ⁻ᵏ; substituting n=10, k=4, p=0.3 yields option A.

Question 5. In hypothesis testing, a p-value of 0.04 indicates which of the following at a 5 % significance level?
A) Fail to reject the null hypothesis
B) Reject the null hypothesis
C) Accept the alternative hypothesis with certainty
D) The test is inconclusive
Answer: B
Explanation: Since 0.04 < 0.05, the result is statistically significant and the null hypothesis is rejected.
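As a quick numerical check of the formula in Question 4, the sketch below evaluates C(10,4)·0.3⁴·0.7⁶ with Python's standard library. The function name binomial_pmf is illustrative and not part of the exam material.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Question 4: n = 10 trials, success probability p = 0.3, exactly k = 4 successes.
prob_four = binomial_pmf(4, 10, 0.3)
print(f"P(X = 4) = {prob_four:.4f}")  # about 0.2001
```

Summing binomial_pmf(k, 10, 0.3) over k = 0..10 should give 1 up to floating-point rounding, which is a simple sanity check on the implementation.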
Explanation: K-NN imputation uses similar observations to estimate missing values, better retaining the distribution’s variance than simple mean/median substitution.

Question 9. Which of the following is a common method for detecting outliers in a univariate continuous feature?
A) Pearson correlation
B) Absolute Z-score greater than 3
C) Chi-square test
D) One-hot encoding
Answer: B
Explanation: For an approximately normally distributed variable, observations with an absolute Z-score above 3 are conventionally flagged as outliers.

Question 10. Min-Max scaling transforms a feature to which range?
A) 0 to 1
B) –1 to 1
C) –∞ to +∞
D) 0 to 100
Answer: A
Explanation: Min-Max scaling linearly rescales data so that the minimum becomes 0 and the maximum becomes 1.

Question 11. When creating dummy variables for a categorical variable with three levels (Red, Blue, Green), how many dummy columns should be generated to avoid the dummy-variable trap?
A) 1
B) 2
Answer: B
Explanation: With k categories, k-1 dummy variables are required; the omitted category serves as the reference level.

Question 12. In time-series feature engineering, which lag feature would you create to capture the value from two periods ago?
A) lag_
B) lag_2
C) lead_
D) diff_
Answer: B
Explanation: lag_2 stores the observation from two time steps prior, useful for autoregressive modeling.

Question 13. A box plot is most useful for visualizing which of the following aspects of a variable?
A) Frequency distribution
B) Correlation with another variable
C) Central tendency and dispersion, including outliers
D) Time-based trends
Answer: C
Explanation: Box plots display median, quartiles, and potential outliers, summarizing distribution shape and spread.

Question 14. In a scatter plot matrix of several continuous variables, what does a strong linear pattern between two variables suggest?
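The preprocessing ideas in Questions 10 to 12 (Min-Max scaling, k-1 dummy columns, and a two-period lag) can be sketched in plain Python. The helper names min_max_scale, dummy_encode, and lag, the choice of "Red" as the reference level, and the handling of a constant feature are all assumptions made for illustration.

```python
def min_max_scale(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: returning zeros is an assumed convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dummy_encode(labels, reference):
    """Create k-1 indicator columns; the reference level gets no column."""
    levels = sorted(set(labels) - {reference})
    return {f"is_{lvl}": [int(x == lvl) for x in labels] for lvl in levels}


def lag(series, periods):
    """Shift a series back by `periods` steps; early positions have no value."""
    if periods >= len(series):
        return [None] * len(series)
    return [None] * periods + list(series[: len(series) - periods])


scaled = min_max_scale([5, 10, 15])                      # [0.0, 0.5, 1.0]
dummies = dummy_encode(["Red", "Blue", "Green", "Blue"], reference="Red")
lag_2 = lag([10, 20, 30, 40], 2)                         # [None, None, 10, 20]
print(scaled, dummies, lag_2)
```

With three colour levels the encoder returns two columns (is_Blue and is_Green); a row with zeros in both is the reference level.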
Question 17. Which assumption of ordinary least squares regression is violated if residuals exhibit a funnel-shaped pattern when plotted against fitted values?
A) Linearity
B) Independence
C) Homoscedasticity
D) Normality
Answer: C
Explanation: A funnel shape indicates non-constant variance (heteroscedasticity), breaching the homoscedasticity assumption.

Question 18. Variance Inflation Factor (VIF) values greater than 10 typically indicate:
A) Strong multicollinearity among predictors.
B) Overfitting due to too many observations.
C) Non-linear relationships requiring transformation.
D) Perfect model fit.
Answer: A
Explanation: High VIF values suggest that a predictor is highly correlated with other predictors, leading to multicollinearity.

Question 19. Ridge regression differs from Lasso regression mainly in its penalty term because:
A) Ridge uses L1 norm, Lasso uses L2 norm.
B) Ridge uses L2 norm, Lasso uses L1 norm.
C) Both use L1 norm, but Ridge adds a bias term.
D) Both use L2 norm, but Lasso shrinks coefficients to zero.
Answer: B
Explanation: Ridge adds an L2 (squared) penalty, while Lasso adds an L1 (absolute-value) penalty. Both shrink coefficients, but only the L1 penalty can drive some of them exactly to zero, which eliminates variables.

Question 20. In logistic regression, the odds ratio associated with a predictor of 1.5 means:
A) The odds increase by 50 % for each unit increase in the predictor.
B) The odds decrease by 1.5 times for each unit increase.
C) The probability of the outcome increases by 1.5 %.
D) The log-odds increase by 1.5 units.
Answer: A
Explanation: An odds ratio of 1.5 indicates that the odds are multiplied by 1.5 (i.e., increase by 50 %) for each one-unit increase in the predictor.

Question 21. Which loss function is minimized when training a binary classification Support Vector Machine?
A) Hinge loss
B) Logistic loss
C) Squared error loss
D) Cross-entropy loss
Answer: A
Explanation: SVMs use hinge loss to maximize the margin between classes while penalizing misclassifications.

Question 22. In a decision tree, which impurity measure is based on the concept of information entropy?
A) Gini impurity
B) Misclassification error
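To make the odds-ratio interpretation in Question 20 concrete, the sketch below applies an odds ratio of 1.5 to a baseline probability. The 20 % baseline and the function name apply_odds_ratio are illustrative assumptions, not values from the exam.

```python
def apply_odds_ratio(prob: float, odds_ratio: float) -> float:
    """Return the probability after the odds are multiplied by odds_ratio."""
    odds = prob / (1 - prob)          # convert probability to odds
    new_odds = odds * odds_ratio      # one-unit increase in the predictor
    return new_odds / (1 + new_odds)  # convert odds back to probability


baseline = 0.20                       # assumed baseline probability
updated = apply_odds_ratio(baseline, 1.5)
print(f"odds: {baseline / (1 - baseline):.3f} -> {1.5 * baseline / (1 - baseline):.3f}")
print(f"probability: {baseline:.4f} -> {updated:.4f}")  # about 0.2727
```

The odds rise by 50 % (0.25 to 0.375), while the probability moves from 0.20 to roughly 0.27, so an odds ratio should not be read as a direct change in probability.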
A) The validation set left aside before training.
B) The subset of trees that did not use a particular observation during bootstrap sampling.
C) The average error across all trees on the training data.
D) The error on the test set after model deployment.
Answer: B
Explanation: OOB error uses observations not included in the bootstrap sample for a given tree, providing an internal performance estimate.

Question 26. Which kernel function enables a Support Vector Machine to model non-linear relationships by mapping data into an infinite-dimensional space?
A) Linear kernel
B) Polynomial kernel
C) Radial Basis Function (RBF) kernel
D) Sigmoid kernel
Answer: C
Explanation: The RBF kernel computes similarity from the Euclidean distance between points and corresponds to an implicit mapping into an infinite-dimensional feature space, which lets it capture complex non-linear patterns.

Question 27. In k-Nearest Neighbors classification, choosing a very large k value will most likely:
A) Increase model variance.
B) Decrease bias but increase variance.
C) Increase bias and reduce variance.
D) Have no effect on bias-variance trade-off.
Answer: C
Explanation: A large k smooths decision boundaries, leading to higher bias (oversimplification) while reducing variance (less sensitivity to noise).

Question 28. Which of the following is NOT a typical step in the model tuning process?
A) Grid search over hyperparameter space.
B) Random search for hyperparameters.
C) Manual selection of the final model without validation.
D) Cross-validation to assess each hyperparameter combination.
Answer: C
Explanation: Manual selection without validation defeats the purpose of systematic tuning; the other options are standard practices.

Question 29. The confusion matrix element representing instances correctly predicted as the negative class is:
A) True Positive (TP)
B) False Positive (FP)
C) True Negative (TN)
D) False Negative (FN)
Answer: C
Explanation: True Negative counts the correctly identified negative cases.

Question 30. For an imbalanced binary classification problem, which metric is most informative when the minority class is of primary interest?
A) Accuracy
B) Precision
C) Recall (Sensitivity)
D) Specificity
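The four confusion-matrix cells named in Question 29 can be counted directly from labels, and the metrics listed in Question 30 follow from them. The sketch below uses a made-up imbalanced dataset (5 positives, 95 negatives); the data and the function name confusion_counts are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return tp, fp, tn, fn


# Hypothetical data: the model finds only 1 of the 5 minority-class cases.
y_true = [1] * 5 + [0] * 95
y_pred = [1, 0, 0, 0, 0] + [0] * 95

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
print(tp, fp, tn, fn)                              # 1 0 95 4
print(accuracy, precision, recall, specificity)    # 0.96 1.0 0.2 1.0
```

On this toy data accuracy is 0.96 even though four of the five positive cases are missed, which is why metrics are usually compared side by side on imbalanced problems.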
D) Stratified sampling validation
Answer: C
Explanation: Rolling-origin validation respects temporal order by training on past data and testing on subsequent periods.

Question 34. The Augmented Dickey-Fuller (ADF) test is used to assess:
A) Autocorrelation in residuals.
B) Stationarity of a time-series.
C) Seasonality strength.
D) Normality of errors.
Answer: B
Explanation: ADF tests the null hypothesis of a unit root; rejection indicates stationarity.

Question 35. When a time-series exhibits both trend and seasonal patterns, which decomposition model is appropriate if the amplitude of the seasonal component grows with the level of the series?
A) Additive decomposition
B) Multiplicative decomposition
C) STL decomposition only
D) No decomposition needed
Answer: B
Explanation: Multiplicative decomposition assumes that seasonal variation changes proportionally with the series level.

Question 36. In exponential smoothing, which method is best suited for data with a trend but no seasonality?
A) Simple Exponential Smoothing (SES)
B) Holt’s Linear Trend method
C) Holt-Winters Seasonal method
D) Moving Average
Answer: B
Explanation: Holt’s method extends SES by adding a component to capture linear trends.

Question 37. For an ARIMA(p,d,q) model, the parameter “d” represents:
A) Number of autoregressive terms.
B) Number of differencing operations applied to achieve stationarity.
C) Number of moving-average terms.
D) Seasonal period length.
Answer: B
Explanation: “d” is the order of differencing needed to render the series stationary.

Question 38. In the identification of ARIMA parameters, the partial autocorrelation function (PACF) is primarily used to determine:
A) The moving-average order (q).
B) The differencing order (d).
C) The autoregressive order (p).
D) Seasonal period (s).
Answer: C
Explanation: PACF cuts off after lag p for a pure AR process, helping to select the AR order.

Question 39. Which forecasting accuracy metric is scale-independent and expressed as a percentage?
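The linear-trend method from Question 36 keeps two smoothed quantities, a level and a trend, and extrapolates them forward. The sketch below is a minimal plain-Python version under stated assumptions: the initial level is the first observation, the initial trend is the first difference, and the smoothing weights alpha = beta = 0.5 are arbitrary illustrative values.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Forecast `horizon` steps ahead with Holt's linear trend method."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]              # assumed initialization: first value
    trend = series[1] - series[0]  # assumed initialization: first difference
    for y in series[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


history = [10, 12, 14, 16, 18]     # hypothetical series rising by 2 per period
forecast = holt_forecast(history, alpha=0.5, beta=0.5, horizon=2)
print(forecast)                    # [20.0, 22.0]
```

Because the toy series is exactly linear, the forecasts continue the line; Simple Exponential Smoothing, which has no trend term, would instead produce a flat forecast.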
Question 42. Principal Component Analysis (PCA) primarily aims to:
A) Increase the number of features.
B) Reduce dimensionality while preserving as much variance as possible.
C) Convert categorical variables into numeric.
D) Perform supervised classification.
Answer: B
Explanation: PCA creates orthogonal components that capture maximal variance, facilitating dimensionality reduction.

Question 43. In association rule mining, the “lift” metric greater than 1 indicates:
A) The antecedent and consequent are independent.
B) The rule is less useful than random chance.
C) Positive association; the occurrence of the antecedent increases the likelihood of the consequent.
D) Negative correlation between antecedent and consequent.
Answer: C
Explanation: Lift = confidence / (support of consequent); values > 1 mean the rule predicts the consequent better than chance.

Question 44. Which of the following is a common technique for handling high-cardinality categorical variables in tree-based models?
A) One-hot encoding all categories.
B) Target encoding (mean encoding).
C) Dropping the variable entirely.
D) Converting to binary using ASCII codes.
Answer: B
Explanation: Target encoding replaces categories with the mean of the target, reducing dimensionality while preserving predictive information for tree models.

Question 45. When evaluating a regression model on a test set, you obtain an R-squared of –0.12. What does this indicate?
A) The model explains 12 % of variance.
B) The model performs worse than a simple mean-only predictor.
C) The model has perfect fit.
D) There is a calculation error; R-squared cannot be negative.
Answer: B
Explanation: Negative R² occurs when the model’s residual sum of squares exceeds that of the baseline (mean) model, indicating poor performance.

Question 46. In a classification problem with three classes, which metric extends the binary ROC AUC concept?
A) One-vs-Rest AUC
B) Macro-averaged F1 score only
C) Confusion matrix diagonal sum
D) Gini coefficient only
Answer: A
Explanation: For multiclass, AUC can be computed using a One-vs-Rest approach and then averaged (macro or weighted).

Question 47. Which regularization technique can perform both variable selection and coefficient shrinkage simultaneously?
A) Ridge regression
B) Elastic Net
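The negative R-squared in Question 45 follows from the definition R² = 1 - SS_res / SS_tot. The sketch below computes it in plain Python for made-up values; the data and the function name r_squared are illustrative assumptions, and the baseline here is the mean of the evaluated targets.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)             # mean-only error
    return 1 - ss_res / ss_tot


y_true = [1, 2, 3, 4, 5]
mean_only = [3, 3, 3, 3, 3]   # baseline: always predict the mean
reversed_fit = [5, 4, 3, 2, 1]  # hypothetical model with the trend backwards

r2_baseline = r_squared(y_true, mean_only)
r2_bad = r_squared(y_true, reversed_fit)
print(r2_baseline, r2_bad)    # 0.0 -3.0
```

The mean-only predictor scores exactly 0, so any model whose squared error exceeds the baseline's lands below zero.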
Question 50. Which validation technique provides the least biased estimate of model performance when the dataset is small?
A) Simple train-test split (70/30)
B) 5-fold cross-validation
C) Leave-One-Out Cross-Validation (LOOCV)
D) Bootstrap .632 estimator
Answer: C
Explanation: LOOCV uses every observation as a test case exactly once, so each model is trained on n-1 observations. This gives a nearly unbiased performance estimate for small datasets, although the estimate can have high variance.

Question 51. In a hierarchical clustering dendrogram, the height at which two clusters are merged represents:
A) The number of observations in each cluster.
B) The distance (or dissimilarity) between the clusters.
C) The silhouette score.
D) The number of features used.
Answer: B
Explanation: The vertical line height indicates the linkage distance, i.e., how dissimilar the merged clusters are.

Question 52. Which distance metric is most appropriate for binary presence/absence data in clustering?
A) Euclidean distance
B) Manhattan distance
C) Jaccard similarity (converted to distance)
D) Cosine similarity
Answer: C
Explanation: Jaccard focuses on the proportion of shared “1”s relative to the union, making it suitable for binary sparse data.

Question 53. In a Gaussian Mixture Model (GMM), the Expectation-Maximization (EM) algorithm iteratively performs which two steps?
A) Gradient descent and backpropagation.
B) Sampling and bootstrapping.
C) E-step (estimate responsibilities) and M-step (update parameters).
D) Pruning and bagging.
Answer: C
Explanation: EM alternates between estimating the probability that each data point belongs to each component (E-step) and maximizing the likelihood by updating component parameters (M-step).

Question 54. When deploying a predictive model in production, which practice helps ensure that model inputs remain consistent with the training data schema?
A) Retraining the model daily without monitoring.
B) Implementing data validation and schema enforcement at the inference layer.
C) Ignoring missing values during prediction.
D) Using a different feature set for each prediction.
Answer: B
Explanation: Validating incoming data against the training schema catches missing, renamed, or mistyped fields before they cause runtime errors or silently degraded predictions.

Question 55. Which of the following is a common cause of data leakage in predictive modeling?
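The Jaccard distance from Question 52 can be written in a few lines for two binary presence/absence vectors. The vectors below are made up, the function name jaccard_distance is illustrative, and returning 0 when neither vector has any "1" is an assumed convention.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| over positions where either vector is 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # shared presences
    either = sum(1 for x, y in zip(a, b) if x or y)   # union of presences
    if either == 0:
        return 0.0  # assumed convention: two all-zero vectors are identical
    return 1 - both / either


# Hypothetical presence/absence profiles over six items.
basket_a = [1, 1, 0, 0, 1, 0]
basket_b = [1, 0, 0, 1, 1, 0]

distance = jaccard_distance(basket_a, basket_b)
print(distance)  # 0.5
```

Positions where both vectors are 0 do not enter the calculation, so appending further shared absences to both vectors leaves the distance unchanged.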