This practice exam for a predictive analytics certificate program consists of multiple-choice questions covering statistical models, CRISP-DM, data preparation, hypothesis testing, linear algebra, and machine learning. Each question includes a detailed answer explanation, which makes the exam useful for certification preparation or for reinforcing predictive analytics knowledge. Topics range from data types, distributions, and regression to decision trees and model evaluation, spanning basic statistics through advanced machine learning.
Typology: Exams

Question 1. Which of the following most accurately defines predictive analytics?
A) Summarizing historical data to describe what happened
B) Using statistical models to forecast future events
C) Optimizing decisions based on simulation outcomes
D) Visualizing data trends for stakeholder communication
Answer: B
Explanation: Predictive analytics uses statistical and machine-learning models to estimate future outcomes from historical patterns.

Question 2. In the CRISP-DM methodology, which phase is primarily concerned with cleaning the data and handling missing values?
A) Business Understanding
B) Data Understanding
C) Data Preparation
D) Modeling
Answer: C
Explanation: Data Preparation covers cleaning, transforming, and formatting data for modeling, including missing-value treatment. (The initial assessment of data quality belongs to the preceding Data Understanding phase.)

Question 3. Which of the following is a continuous variable?
A) Customer gender (Male/Female)
B) Order status (Pending, Shipped, Delivered)
C) Daily sales revenue in dollars
D) Product category (Electronics, Clothing)
Answer: C
Explanation: Continuous variables can take any numeric value within a range; daily revenue is measured on a continuous scale.

Question 4. The mean of a normally distributed variable is 50 and the standard deviation is

D) Data have a linear trend over time
Answer: B
Explanation: Median imputation is robust to outliers, reducing distortion when extreme values are present.

Question 9. Which scaling technique transforms each feature to have a mean of 0 and a standard deviation of 1?
A) Min-Max scaling
B) Log transformation
C) Z-score standardization
D) Decimal scaling
Answer: C
Explanation: Z-score standardization computes z = (x - mean) / standard deviation, giving each feature zero mean and unit variance.

Question 10. Creating dummy variables from a categorical feature with three levels (keeping one level as the reference) results in how many binary columns?
A) 1
B) 2
C) 3
D) 4
Answer: B
Explanation: For k categories, k - 1 dummy variables are needed; including all k alongside an intercept would cause perfect multicollinearity (the "dummy variable trap").
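Questions 9 and 10 both describe mechanical transformations that are easy to verify by hand. The sketch below, using only Python's standard library, standardizes a numeric list with z = (x - mean) / SD and builds k - 1 dummy columns by dropping a reference level. The function names `z_score` and `dummy_encode`, the sample revenue values, and the extra "Toys" category are illustrative assumptions, not part of the exam.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD (divide by n), so the result has pstdev 1
    return [(v - mu) / sigma for v in values]


def dummy_encode(values, reference):
    """Encode a categorical list as k - 1 binary columns, dropping `reference`."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical daily revenue figures
revenue = [120.0, 80.0, 100.0, 140.0, 60.0]
print(z_score(revenue))

# Hypothetical three-level category; "Electronics" is the reference level
category = ["Electronics", "Clothing", "Toys", "Clothing"]
print(dummy_encode(category, reference="Electronics"))
# Two columns ('Clothing', 'Toys') for three levels; Electronics rows are all zeros
```

Using `pstdev` (population SD) gives standardized values with a population SD of exactly 1; some tools divide by n - 1 instead, which yields slightly different z-scores on small samples.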

Question 11. Which plot is most appropriate for visualizing the distribution of a single continuous variable?
A) Scatter plot
B) Histogram
C) Bar chart
D) Pie chart
Answer: B
Explanation: Histograms display frequency counts across intervals, revealing the shape of a continuous distribution.

Question 12. In a correlation matrix, a value of -0.85 between two variables indicates:
A) Strong positive linear relationship
B) Weak negative linear relationship
C) Strong negative linear relationship
D) No linear relationship
Answer: C
Explanation: Correlation coefficients near -1 denote a strong inverse linear association.

Question 13. Simple linear regression assumes that the residuals are:
A) Heteroscedastic
B) Autocorrelated
C) Normally distributed with constant variance

Question 16. Ridge regression primarily addresses which issue in linear models?
A) High bias
B) Multicollinearity
C) Non-linear relationships
D) Categorical predictors
Answer: B
Explanation: Ridge regression adds an L2 penalty that shrinks coefficients, reducing the variance of the estimates caused by multicollinearity.

Question 17. Lasso regression differs from Ridge regression because Lasso can:
A) Increase model bias more than Ridge
B) Set some coefficients exactly to zero, performing variable selection
C) Only be applied to classification problems
D) Use a quadratic penalty term
Answer: B
Explanation: Lasso's L1 penalty can shrink less important coefficients exactly to zero, effectively selecting features.

Question 18. Which of the following is a correct interpretation of an odds ratio (OR) of 2.5 for a binary predictor in logistic regression?
A) The predictor reduces the odds of the outcome by 2.5 times
B) The predictor increases the probability of the outcome by 2.5%
C) The odds of the outcome are 2.5 times higher when the predictor is present
D) The predictor has no effect on the outcome
Answer: C
Explanation: An OR greater than 1 indicates higher odds of the event when the predictor equals 1 than when it equals 0; here those odds are 2.5 times as high. Option B confuses odds with probability.

Question 19. In a decision tree, which impurity measure is based on the probability of misclassifying a randomly chosen element?
A) Gini impurity
B) Entropy
C) Mean Squared Error
D) Information Gain Ratio
Answer: A
Explanation: Gini impurity is the expected misclassification rate when a randomly selected element is labeled at random according to the node's class proportions.

Question 20. Pruning a decision tree primarily helps to:
A) Increase the depth of the tree
B) Reduce overfitting and improve generalization
C) Convert the tree into a linear model
D) Add more splits for each leaf
Answer: B
Explanation: Pruning removes branches that add little predictive power, lowering variance.
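Questions 18 and 19 rest on two small formulas. The sketch below computes Gini impurity as 1 - sum(p_i^2) from a node's class counts and shows how an odds ratio of 2.5 translates into probabilities. The helper names and the 20% baseline probability are hypothetical choices made for illustration.

```python
def gini_impurity(class_counts):
    """Gini impurity 1 - sum(p_i^2): the chance a random element is mislabeled
    when labels are drawn at random from the node's class proportions."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def odds(p):
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)


def probability(o):
    """Convert odds back into a probability o / (1 + o)."""
    return o / (1.0 + o)


def odds_ratio(p_present, p_absent):
    """Odds of the outcome when the predictor is 1, divided by the odds when it is 0."""
    return odds(p_present) / odds(p_absent)


print(gini_impurity([50, 50]))  # 0.5 (maximally mixed two-class node)
print(gini_impurity([100, 0]))  # 0.0 (pure node)

# Hypothetical baseline: 20% probability of the outcome when the predictor is 0
p_absent = 0.2
p_present = probability(2.5 * odds(p_absent))  # apply OR = 2.5 on the odds scale
print(round(odds_ratio(p_present, p_absent), 6))  # 2.5
print(round(p_present, 3))  # 0.385
```

With a 20% baseline, an OR of 2.5 raises the probability to about 38.5%, not by 2.5 percentage points, which is why option B in Question 18 is wrong.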

Answer: C
Explanation: The RBF (Gaussian) kernel implicitly maps inputs into an infinite-dimensional feature space (a reproducing kernel Hilbert space).

Question 24. In k-Nearest Neighbors classification, the choice of k primarily influences:
A) The model's ability to capture non-linear relationships
B) The bias-variance trade-off, with larger k increasing bias and reducing variance
C) The depth of the decision boundary
D) The number of features used
Answer: B
Explanation: Larger k smooths the decision boundary (higher bias, lower variance); smaller k does the opposite.

Question 25. Which of the following is NOT a valid method for evaluating a regression model's performance?
A) Mean Squared Error (MSE)
B) R-squared
C) Confusion Matrix
D) Mean Absolute Error (MAE)
Answer: C
Explanation: A confusion matrix applies to classification, not regression.

Question 26. A model with high training accuracy but low validation accuracy is most likely suffering from:
A) Underfitting
B) Overfitting
C) Data leakage
D) Class imbalance
Answer: B
Explanation: Overfitting occurs when a model captures noise in the training set and fails to generalize.

Question 27. In K-fold cross-validation with K = 5, how many times is each fold used as the validation set?
A) 1
B) 2
C) 4
D) 5
Answer: A
Explanation: Each of the 5 folds serves as the validation set exactly once, while the remaining 4 folds train the model; each fold is therefore used for training 4 times.

Question 28. Which metric is most appropriate when the cost of false negatives is much higher than that of false positives?
A) Accuracy
B) Precision
C) Recall (Sensitivity)
D) F1-Score

B) Presence of autocorrelation
C) Stationarity of a series
D) Forecast accuracy
Answer: C
Explanation: The Augmented Dickey-Fuller (ADF) test evaluates the null hypothesis that a unit root is present (the series is non-stationary).

Question 32. Differencing a time series once is primarily intended to:
A) Remove seasonality
B) Make the series stationary by eliminating trend
C) Increase the series length
D) Smooth out random noise
Answer: B
Explanation: First-order differencing replaces each observation with its change from the previous one (y_t - y_(t-1)), which helps eliminate trends; a linear trend becomes a constant.

Question 33. In an ARIMA(p,d,q) model, the 'q' component refers to:
A) Number of autoregressive terms
B) Number of differencing operations
C) Number of moving-average terms
D) Seasonal period
Answer: C
Explanation: 'q' is the order of the moving-average part, which models dependence on past forecast errors.
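The effect of first-order differencing in Question 32 is easy to see on a toy series. This sketch assumes a hypothetical helper named `difference` and a purely linear series; on real data the differenced values would fluctuate around the slope rather than equal it exactly.

```python
def difference(series, lag=1):
    """Return y_t - y_(t-lag) for each t >= lag; the result is `lag` values shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series with a linear trend: intercept 10, slope 3
trend_series = [10 + 3 * t for t in range(6)]
print(trend_series)              # [10, 13, 16, 19, 22, 25]
print(difference(trend_series))  # [3, 3, 3, 3, 3] -> trend removed, constant remains
```

The `lag` parameter is an illustrative extension: differencing at the seasonal period (for example `lag=12` for monthly data) is the usual way to target seasonality, which plain first differencing does not remove. Applying `difference` once corresponds to d = 1 in ARIMA(p,d,q).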

Question 34. Which method is best suited for forecasting data with both trend and seasonal patterns?
A) Simple Moving Average
B) Holt's Linear Trend method
C) Holt-Winters Triple Exponential Smoothing
D) Naïve Forecast
Answer: C
Explanation: Holt-Winters captures level, trend, and seasonality simultaneously.

Question 35. The Mean Absolute Percentage Error (MAPE) is problematic when actual values contain zeros because:
A) Division by zero is undefined, leading to infinite errors
B) It over-penalizes large errors
C) It becomes negative
D) It gives the same result as MAE
Answer: A
Explanation: MAPE divides each absolute error by the corresponding actual value, so a zero actual makes that term undefined (or infinite).

Question 36. In clustering, the silhouette coefficient measures:
A) The distance between cluster centroids
B) The similarity of an object to its own cluster compared to other clusters
C) The probability that a point belongs to a cluster

Question 39. In association rule mining, a "lift" value greater than 1 indicates:
A) The antecedent and consequent are independent
B) The rule has high confidence but low support
C) The occurrence of the antecedent increases the likelihood of the consequent
D) The rule is invalid
Answer: C
Explanation: Lift > 1 means the consequent occurs more often together with the antecedent than it would by chance, i.e., a positive association.

Question 40. Which of the following best describes the "curse of dimensionality"?
A) Model performance always improves with more features
B) High-dimensional spaces cause distances between points to become less discriminative
C) Data become more sparse as sample size increases
D) Computation time decreases with more dimensions
Answer: B
Explanation: In high dimensions, points tend to be nearly equidistant from one another, which makes distance-based pattern detection harder.

Question 41. When applying a log transformation to a positively skewed variable, the primary effect is:
A) Increasing variance
B) Making the distribution more symmetric
C) Converting it to a categorical variable
D) Removing outliers
Answer: B
Explanation: A log transformation compresses large values more than small ones, reducing right skew and often bringing the distribution closer to normal.

Question 42. Which evaluation metric is most appropriate for imbalanced binary classification where the minority class is the focus?
A) Overall accuracy
B) Macro-averaged F1-Score
C) ROC-AUC
D) Precision-Recall AUC
Answer: D
Explanation: Precision-Recall curves focus on performance for the positive (minority) class and are more informative than ROC-AUC under severe class imbalance.

Question 43. In a confusion matrix, the term "True Negative Rate" is synonymous with:
A) Specificity
B) Sensitivity
C) Precision
D) Recall
Answer: A
Explanation: The True Negative Rate (TNR) is the proportion of actual negatives correctly identified, TN / (TN + FP), i.e., specificity.
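Several metrics in this section reduce to one-line formulas: recall (Question 28), specificity (Question 43), MAPE (Question 35), and lift (Question 39). The sketch below writes them out; the function names, the confusion-matrix counts, and the support values are hypothetical. The `mape` helper raises an error on zero actuals rather than returning an infinite value, which is one possible way to surface the problem described in Question 35.

```python
def recall(tp, fn):
    """Sensitivity / true positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; rejects zero actuals."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def lift(support_ab, support_a, support_b):
    """Lift of rule A -> B: P(A and B) / (P(A) * P(B)); 1 means independence."""
    return support_ab / (support_a * support_b)


# Hypothetical confusion-matrix counts
print(recall(tp=40, fn=10))                    # 0.8
print(round(specificity(tn=900, fp=50), 3))    # 0.947
# Hypothetical forecasts: both are off by 10% of the actual value
print(round(mape([100, 200], [110, 180]), 6))  # 10.0
# Hypothetical supports: A and B co-occur twice as often as independence predicts
print(round(lift(0.10, 0.20, 0.25), 6))        # 2.0
```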

D) Cosine similarity
Answer: C
Explanation: Hamming distance counts the positions at which two binary vectors differ, which suits dummy-encoded categorical data.

Question 47. Why is time-series cross-validation with a rolling origin preferred over random K-fold splits?
A) It provides more folds
B) It respects temporal ordering, preventing future data from leaking into training
C) It reduces computational cost
D) It increases model variance
Answer: B
Explanation: Rolling-origin evaluation preserves chronological order, so each model is trained only on data that precede its validation period, as in real forecasting.

Question 48. Which of the following is a common technique to detect multicollinearity before modeling?
A) Plotting residuals vs. fitted values
B) Calculating pairwise Pearson correlation coefficients
C) Performing a chi-squared test
D) Using the Kolmogorov-Smirnov test
Answer: B
Explanation: High pairwise correlations between predictors suggest potential multicollinearity. Pairwise correlations can miss multicollinearity that involves three or more predictors, which variance inflation factors (VIF) can reveal.
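Questions 27 and 47, along with the Hamming distance explanation, describe bookkeeping that is easy to check directly. The sketch below generates plain K-fold splits, in which every index lands in the validation set exactly once; rolling-origin splits, in which training data always precede the validation window; and a Hamming distance for binary vectors. The function names and the contiguous, unshuffled folds are simplifying assumptions; practical K-fold implementations usually shuffle the data first.

```python
def k_fold_splits(n, k):
    """Yield (train, validation) index lists; every index is validated exactly once.
    Folds are contiguous and unshuffled (a simplification)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, validation
        start += size


def rolling_origin_splits(n, initial, horizon=1):
    """Yield (train, validation) index lists where all training indices
    come before the validation window, so no future data leak into training."""
    for origin in range(initial, n - horizon + 1):
        yield list(range(origin)), list(range(origin, origin + horizon))


def hamming(a, b):
    """Count positions where two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


for train, val in rolling_origin_splits(6, initial=3):
    print(train, val)
# [0, 1, 2] [3]
# [0, 1, 2, 3] [4]
# [0, 1, 2, 3, 4] [5]

print(hamming([1, 0, 1, 0], [1, 1, 0, 0]))  # 2
```

With K = 5, `k_fold_splits` yields five splits and each index appears in exactly one validation list, matching the corrected answer to Question 27.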

Question 49. The "bias-variance trade-off" states that reducing bias typically leads to:
A) Increased variance
B) Decreased variance
C) No change in variance
D) Better interpretability
Answer: A
Explanation: Simpler models have higher bias and lower variance, while more complex models have lower bias and higher variance; the two typically move in opposite directions.

Question 50. In a logistic regression model, the log-likelihood function is maximized using:
A) Ordinary Least Squares
B) Gradient Descent or Newton-Raphson methods
C) K-means clustering
D) Principal Component Analysis
Answer: B
Explanation: Logistic regression has no closed-form solution, so it uses iterative optimization (e.g., Newton-Raphson) to maximize the log-likelihood.

Question 51. Which of the following best describes "bagging" in ensemble learning?
A) Sequentially correcting errors of previous models
B) Training multiple models on different random subsets of data and aggregating predictions
C) Combining models with different algorithms into a single meta-learner