
























































The Predictive Analytics Modeler Explorer Award Ultimate Exam is a specialized preparation resource focused on predictive analytics and data modeling concepts. Candidates learn statistical analysis, machine learning fundamentals, data visualization, forecasting methods, model evaluation, and business intelligence applications. This exam guide is ideal for professionals seeking to strengthen analytical decision-making and predictive modeling skills.

























































Question 1. Which CRISP-DM phase focuses on converting a business problem into a data-science objective?
A) Data Preparation
B) Business Understanding
C) Modeling
D) Evaluation
Answer: B
Explanation: Business Understanding translates the business problem into analytic goals and defines success criteria.

Question 2. In predictive analytics, the model that estimates future values is called a __________.
A) Descriptive model
B) Diagnostic model
C) Predictive model
D) Prescriptive model
Answer: C
Explanation: Predictive models forecast future outcomes; descriptive models explain past data.

Question 3. Which learning paradigm is used when the target variable is categorical?
A) Supervised Regression
B) Unsupervised Clustering
C) Supervised Classification
D) Reinforcement Learning
Answer: C
Explanation: Classification predicts categorical outcomes, a form of supervised learning.

Question 4. Which data type can only take the values 0 or 1?
A) Ordinal
B) Continuous
C) Flag
D) Nominal
Answer: C
Explanation: Flag fields are binary indicators (e.g., true/false).

Question 5. Which visual tool is most appropriate for detecting outliers in a continuous variable?
A) Bar chart
B) Histogram
C) Scatter plot
D) Pie chart
Answer: C
Explanation: Scatter plots display individual observations, making extreme values visible.
C) Reclassification
D) Sampling
Answer: C
Explanation: Binning groups continuous values into categorical bins, a form of reclassification.

Question 9. Which operation adds new rows from another dataset that has the same columns?
A) Merging
B) Appending
C) Pivoting
D) Sampling
Answer: B
Explanation: Appending stacks datasets vertically, increasing the record count.

Question 10. In a typical train-test split, what percentage of data is often reserved for testing?
A) 10%
B) 20%
C) 30%
D) 50%
Answer: B
Explanation: An 80/20 split (training 80%, testing 20%) is common for evaluating model performance; 70/30 is another widely used choice.

Question 11. Which decision-tree algorithm uses chi-square tests to choose splits?
A) C&R Tree
B) CHAID
C) QUEST
D) CART
Answer: B
Explanation: CHAID (Chi-Square Automatic Interaction Detection) selects splits based on chi-square statistics.

Question 12. Logistic regression is primarily used for:
A) Predicting continuous outcomes
B) Clustering data points
C) Predicting binary outcomes
D) Reducing dimensionality
Answer: C
Explanation: Logistic regression models the probability of a binary (yes/no) event.

Question 13. A neural network with one hidden layer is capable of approximating:
Answer: A
Explanation: K-Means needs the user to define the desired number of clusters (K).

Question 16. In market-basket analysis, the “lift” of an association rule measures:
A) The support of the rule
B) The confidence relative to random chance
C) The number of items in the basket
D) The correlation coefficient
Answer: B
Explanation: Lift compares rule confidence to the expected confidence if items were independent.

Question 17. A “nugget” in Predictive Modeler Explorer refers to:
A) Raw data source
B) Model object containing parameters and metadata
C) Scoring script
D) Evaluation chart
Answer: B
Explanation: Nuggets are the serialized model artifacts that can be applied to new data.
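
The lift definition from Question 16 can be checked with a small calculation. The sketch below is illustrative only: the basket data and the helper names `support` and `lift` are made up for this example, and it assumes both sides of the rule occur at least once so that no division by zero arises.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    confidence = support(A and B) / support(A)
    lift       = confidence / support(B)
    A lift of 1.0 is what independence between A and B would give.
    """
    both = set(antecedent) | set(consequent)
    confidence = support(transactions, both) / support(transactions, antecedent)
    return confidence / support(transactions, consequent)


# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

rule_lift = lift(baskets, {"bread"}, {"butter"})
print(round(rule_lift, 3))
```

Here bread appears in 3 of 4 baskets and butter in 2 of 4, always together with bread, so the confidence of bread -> butter (2/3) exceeds the baseline support of butter (1/2) and the lift is greater than 1.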
Question 18. Which confusion-matrix component represents true positives?
A) Bottom-right cell
B) Top-left cell
C) Bottom-left cell
D) Top-right cell
Answer: B
Explanation: In a standard layout, true positives occupy the top-left position (predicted yes & actual yes).

Question 19. The F1-score is the harmonic mean of:
A) Accuracy and Recall
B) Precision and Recall
C) Sensitivity and Specificity
D) True Positive Rate and False Positive Rate
Answer: B
Explanation: F1-score balances precision and recall, useful for imbalanced classes.

Question 20. Mean Squared Error (MSE) penalizes errors by:
A) Taking the absolute value
B) Squaring the residuals before averaging
C) Using the median of residuals
Question 23. Which hyperparameter controls the maximum number of splits in a decision tree?
A) Learning rate
B) Tree depth
C) Number of clusters
D) Number of epochs
Answer: B
Explanation: Tree depth limits how deep the tree can grow, indirectly limiting splits.

Question 24. Bagging primarily reduces model variance by:
A) Averaging predictions from multiple bootstrap samples
B) Adding regularization penalties
C) Pruning tree branches
D) Using gradient descent
Answer: A
Explanation: Bagging (Bootstrap Aggregating) builds several models on resampled data and averages them to lower variance.

Question 25. Boosting differs from bagging because it:
A) Trains models sequentially, focusing on previous errors
B) Uses only a single model
C) Randomly drops features at each split
D) Does not improve accuracy
Answer: A
Explanation: Boosting adjusts weights of misclassified instances, training models sequentially to correct errors.

Question 26. Exporting a model to PMML enables:
A) Real-time scoring only in SAS
B) Model portability across platforms that support PMML
C) Automatic hyperparameter tuning
D) Direct database updates
Answer: B
Explanation: PMML (Predictive Model Markup Language) is an XML standard for sharing models between tools.

Question 27. Model drift is detected when:
A) Training time exceeds 1 hour
B) Model performance degrades on new data
C) Number of features increases
D) The model file size grows
Answer: B
Explanation: Drift occurs when the relationship between inputs and target changes, causing performance loss.

Question 28. In the CRISP-DM lifecycle, which phase is iterative and may be revisited after evaluation?
Explanation: IQR (Q3-Q1) identifies the spread of the middle 50% and helps flag extreme values.

Question 31. Which statistical test is appropriate for assessing the relationship between two nominal variables?
A) Pearson correlation
B) T-test
C) Chi-square test
D) ANOVA
Answer: C
Explanation: Chi-square evaluates independence between categorical variables.

Question 32. When creating a derived field using CLEM, the expression “IF Age>30 THEN 1 ELSE 0” produces:
A) A continuous variable
B) A binary flag
C) A text string
D) A missing value indicator
Answer: B
Explanation: The IF-THEN-ELSE logic creates a binary flag indicating whether Age exceeds 30.
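
Two ideas from the explanations above, the IQR as a way to flag extreme values and the Age>30 binary flag, can be sketched together in a few lines of Python. This is an illustration, not CLEM syntax. The sample ages and the helper name `iqr_fences` are made up, and the 1.5 x IQR multiplier is an assumption (the common Tukey convention) that the source does not state. Quartiles come from the standard library's `statistics.quantiles` with its default "exclusive" method; other quartile conventions can give slightly different fences.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is an assumed convention, not something the exam text specifies.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1  # spread of the middle 50% of the data
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical ages, for illustration only.
ages = [22, 25, 27, 29, 31, 33, 35, 38, 95]

low, high = iqr_fences(ages)
outliers = [a for a in ages if a < low or a > high]

# Python equivalent of the derived flag "IF Age>30 THEN 1 ELSE 0".
age_flag = [1 if a > 30 else 0 for a in ages]

print(low, high, outliers)
print(age_flag)
```

With these values Q1 is 26 and Q3 is 36.5, so the IQR is 10.5 and only the age of 95 falls outside the fences.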
Question 33. Which sampling technique ensures each observation has an equal chance of being selected?
A) Stratified sampling
B) Systematic sampling
C) Simple random sampling
D) Cluster sampling
Answer: C
Explanation: Simple random sampling draws each record with equal probability.

Question 34. In a regression tree, the leaf node value represents:
A) The most frequent class
B) The mean of the target variable for that region
C) The median of the predictor variables
D) The probability of a categorical outcome
Answer: B
Explanation: Regression trees predict a continuous value; leaf nodes output the average target of training cases falling in that leaf.

Question 35. Which metric is insensitive to class imbalance?
A) Accuracy
B) Precision
Explanation: PCA transforms correlated variables into a smaller set of orthogonal components.

Question 38. When a model is overfitted, its performance on the training data is:
A) Lower than on test data
B) Similar to test data
C) Much higher than on test data
D) Unrelated to test data
Answer: C
Explanation: Overfitting leads to excellent training accuracy but poor generalization to unseen data.

Question 39. Which of the following is an advantage of using ensemble methods?
A) Simpler interpretation
B) Reduced computational cost
C) Improved predictive accuracy
D) Eliminates need for data preprocessing
Answer: C
Explanation: Ensembles combine multiple models to boost accuracy and robustness.

Question 40. In scoring new data, the term “batch scoring” refers to:
A) Real-time API calls
B) Scoring a single record at a time
C) Scoring a large set of records in one operation
D) Manual entry of predictions
Answer: C
Explanation: Batch scoring processes many records together, often scheduled regularly.

Question 41. Which data integration technique is used to combine customer demographics (rows) with transaction history (columns) for the same customer ID?
A) Appending
B) Merging (horizontal join)
C) Sampling
D) Binning
Answer: B
Explanation: Horizontal merging joins datasets on a key, adding new columns to existing rows.

Question 42. A model exported as XML is most likely intended for:
A) Direct execution in a spreadsheet
B) Import into a web-service or application that reads XML
C) Visualization in PowerPoint
Question 45. Which evaluation plot helps compare the cumulative capture of positives across model score deciles?
A) ROC curve
B) Gain chart
C) Residual plot
D) Scatter matrix
Answer: B
Explanation: Gain charts display cumulative positive response by decile, illustrating model lift.

Question 46. In unsupervised learning, the term “silhouette score” measures:
A) Predictive accuracy
B) Cluster cohesion and separation
C) Feature importance
D) Model runtime
Answer: B
Explanation: Silhouette score quantifies how well each observation fits within its cluster versus other clusters.

Question 47. Which of the following is a common technique for handling high-cardinality categorical variables?
A) One-hot encoding all levels
B) Dropping the variable entirely
C) Target encoding (mean encoding)
D) Treating them as continuous
Answer: C
Explanation: Target encoding replaces categories with the mean of the target, reducing dimensionality.

Question 48. In a time-series forecasting problem, the appropriate predictive modeling approach is:
A) K-Means clustering
B) Logistic regression
C) ARIMA or exponential smoothing
D) Decision tree classification
Answer: C
Explanation: ARIMA and exponential smoothing are designed for temporal data forecasting.

Question 49. Which step directly follows “Modeling” in the CRISP-DM lifecycle?
A) Data Preparation
B) Evaluation
C) Deployment
D) Business Understanding
Answer: B