















































































This certification evaluates core data science competencies including statistics, data analysis, machine learning fundamentals, data visualization, and ethical data usage. Candidates demonstrate the ability to derive insights from data and support data-driven decision-making.
Typology: Exams
















































































Question 1. In the OSEMN framework, which step directly follows “Scrub”?
A) Obtain
B) Explore
C) Model
D) Interpret
Answer: B
Explanation: After cleaning (Scrub) the data, the next logical phase is Exploratory Data Analysis (Explore) to understand patterns before modeling.

Question 2. Which CRISP‑DM phase is primarily concerned with “business understanding”?
A) Data Understanding
B) Business Understanding
C) Deployment
D) Data Preparation
Answer: B
Explanation: The Business Understanding phase defines objectives, success criteria, and project constraints.

Question 3. The mean of a dataset is 20 and the standard deviation is 5. Approximately what percentage of data lies between 10 and 30 assuming a normal distribution?
A) 68%
B) 95%
C) 99.7%
D) 50%
Answer: B
Explanation: The interval from 10 to 30 extends two standard deviations (±10) on either side of the mean (20), which covers about 95% of values in a normal distribution. One standard deviation (15 to 25) would cover about 68%.
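The empirical rule behind Question 3 can be checked numerically. The sketch below uses only the standard library and assumes an exactly normal distribution with mean 20 and standard deviation 5. The helper names `normal_cdf` and `prob_between` are illustrative, not taken from the exam material.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def prob_between(low, high, mean, sd):
    """Probability mass of the normal distribution between low and high."""
    return normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)

MEAN, SD = 20.0, 5.0  # values from Question 3

within_1sd = prob_between(15, 25, MEAN, SD)  # mean +/- 1 SD, about 0.683
within_2sd = prob_between(10, 30, MEAN, SD)  # mean +/- 2 SD, about 0.954
within_3sd = prob_between(5, 35, MEAN, SD)   # mean +/- 3 SD, about 0.997

print(f"{within_1sd:.4f} {within_2sd:.4f} {within_3sd:.4f}")
```

The interval 10 to 30 is the two-standard-deviation band, so roughly 95% of the values fall inside it.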
Question 4. A hypothesis test yields a p‑value of 0.03. At a significance level of α=0.05, what is the correct decision?
A) Fail to reject the null hypothesis
B) Reject the null hypothesis
C) Increase the sample size
D) Change the test statistic
Answer: B
Explanation: Since p < α, we reject the null hypothesis, indicating statistical significance.

Question 5. Which of the following is a property of a symmetric matrix that is useful in linear regression?
A) It is always diagonal
B) Its eigenvectors are orthogonal
C) Its determinant is zero
D) It cannot be inverted
Answer: B
Explanation: Symmetric matrices have orthogonal eigenvectors, aiding in solving normal equations efficiently.

Question 6. In calculus, the gradient of a loss function points in the direction of:
A) Maximum increase of the loss
B) Minimum increase of the loss
C) Zero change in the loss
D) Random variation
Answer: A
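A small numeric illustration of Question 6: stepping along the gradient raises the loss, while stepping against it (as gradient descent does) lowers it. The quadratic `loss`, the starting point, and the step size are made-up values chosen only for this sketch.

```python
def loss(w):
    """Hypothetical quadratic loss with its minimum at w = (3, -1)."""
    return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2

def gradient(w):
    """Analytic gradient of the loss above."""
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]

def step(w, g, lr):
    """Move from w by lr times the direction g."""
    return [wi + lr * gi for wi, gi in zip(w, g)]

w0 = [0.0, 0.0]               # illustrative starting point
g0 = gradient(w0)

uphill = step(w0, g0, 0.1)    # along the gradient
downhill = step(w0, g0, -0.1) # against the gradient (gradient descent)

print(loss(uphill), loss(w0), loss(downhill))
```

For a sufficiently small step the ordering loss(uphill) > loss(w0) > loss(downhill) holds, which is why gradient descent subtracts the gradient.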
Explanation: DISTINCT eliminates duplicate rows in the result set.

Question 10. When scraping a website, which HTTP method is most commonly used to retrieve HTML content?
A) POST
B) PUT
C) GET
D) DELETE
Answer: C
Explanation: GET requests fetch resources without side effects, which makes GET the natural choice for web scraping.

Question 11. Which imputation technique preserves the distribution of a numeric variable better than mean imputation?
A) Median imputation
B) Mode imputation
C) K‑nearest neighbors imputation
D) Deletion of missing rows
Answer: C
Explanation: K‑NN imputes based on similar observations, maintaining local structure and distribution.

Question 12. Which method detects outliers by measuring the distance of a point from its k‑nearest neighbors?
A) Z‑score
B) IQR
C) DBSCAN
D) Mahalanobis distance
Answer: C
Explanation: DBSCAN labels points that are far from dense regions as noise (outliers).

Question 13. Applying a log transformation to a positively skewed variable primarily helps to:
A) Increase variance
B) Reduce skewness and stabilize variance
C) Convert categorical data to numeric
D) Introduce non‑linearity
Answer: B
Explanation: Log transformation compresses large values, making the distribution more symmetric.

Question 14. In univariate analysis, the shape of a histogram provides insight into:
A) Causal relationships
B) Multicollinearity
C) Distribution characteristics (symmetry, skewness)
D) Model residuals
Answer: C
Explanation: A histogram visualizes frequency of a single variable, revealing its distribution shape.

Question 15. Pearson’s correlation coefficient measures:
A) Causal effect
B) Linear association strength between two continuous variables
C) Rank‑based similarity
D) Difference in means
Answer: B
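To make the word "linear" in Question 15 concrete, the sketch below computes Pearson's r from its definition for two made-up datasets: a perfectly linear one, and a quadratic one where y depends entirely on x but has no linear trend. The function name `pearson_r` and the sample values are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Perfect linear relationship: y = 2x
r_linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Perfect but non-linear relationship: y = x^2 on a symmetric range
r_quadratic = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_linear, r_quadratic)
```

The first value is 1 and the second is 0, so a zero Pearson coefficient rules out a linear association, not dependence in general.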
Explanation: Logistic regression applies the logistic (sigmoid) function to map linear scores to probabilities.

Question 19. In k‑NN classification, the choice of “k” primarily controls the trade‑off between:
A) Model interpretability and speed
B) Bias and variance
C) Feature scaling and dimensionality
D) Overfitting and underfitting
Answer: B
Explanation: A small k gives low bias but high variance; a large k gives higher bias but lower variance.

Question 20. Which kernel function in SVM can map data into an infinite‑dimensional space?
A) Linear kernel
B) Polynomial kernel
C) Radial Basis Function (RBF) kernel
D) Sigmoid kernel
Answer: C
Explanation: The RBF kernel computes similarity based on distance, implicitly projecting data into infinite dimensions.

Question 21. Decision trees suffer from high variance because:
A) They are linear models
B) They use L2 regularization
C) Small changes in data can produce very different splits
D) They require feature scaling
Answer: C
Explanation: Trees are sensitive to data perturbations, leading to different structures and high variance.

Question 22. Random Forest reduces variance compared to a single decision tree by:
A) Using only one feature per split
B) Averaging predictions from many bootstrapped trees
C) Pruning all trees aggressively
D) Applying L1 regularization
Answer: B
Explanation: Bagging (bootstrap aggregation) creates diverse trees; averaging their predictions stabilizes output.

Question 23. Gradient Boosting builds models sequentially. Each new model attempts to:
A) Maximize the margin of the previous model
B) Correct the residual errors of the ensemble so far
C) Randomly select a subset of features
D) Reduce model depth
Answer: B
Explanation: Boosting fits each learner to the residuals of the combined prior learners, improving performance.

Question 24. Which clustering algorithm does NOT require the number of clusters to be specified a priori?
A) k‑Means
B) Hierarchical Agglomerative Clustering
C) DBSCAN
D) Gaussian Mixture Models
Explanation: t‑SNE optimizes a probability distribution to keep neighboring points close, revealing local patterns.

Question 28. In reinforcement learning, the term “policy” refers to:
A) The reward function
B) The state transition probabilities
C) The mapping from states to actions
D) The discount factor
Answer: C
Explanation: A policy defines the agent’s behavior by selecting actions based on current states.

Question 29. The bias‑variance trade‑off describes the relationship between:
A) Model complexity and training time
B) Training error and test error
C) Systematic error (bias) and error due to sensitivity to data fluctuations (variance)
D) Number of features and number of samples
Answer: C
Explanation: Increasing model complexity reduces bias but raises variance; optimal models balance both.

Question 30. Which validation technique is most appropriate when the dataset is highly imbalanced?
A) Simple hold‑out split
B) 10‑fold cross‑validation
C) Stratified K‑fold cross‑validation
D) Leave‑one‑out cross‑validation
Answer: C
Explanation: Stratified K‑fold maintains class proportion in each fold, ensuring minority class is represented.

Question 31. The coefficient of determination (R²) can be negative when:
A) The model fits perfectly
B) The model performs worse than predicting the mean of the target
C) There are more predictors than observations
D) Multicollinearity is present
Answer: B
Explanation: R² = 1 - (SS_res / SS_tot); if SS_res > SS_tot, R² becomes negative, indicating a poor model.

Question 32. Mean Absolute Error (MAE) is preferred over Mean Squared Error (MSE) when:
A) Outliers need to be heavily penalized
B) Interpretability in original units is important
C) The loss function must be differentiable
D) Model training uses gradient descent
Answer: B
Explanation: MAE measures average absolute deviation, preserving original scale and being less sensitive to outliers.

Question 33. The F1‑Score is the harmonic mean of precision and recall. It is especially useful when:
A) Classes are balanced
B) False positives are more costly than false negatives
C) Both false positives and false negatives are important
D) Accuracy alone is sufficient
D) Optimizing only continuous parameters
Answer: B
Explanation: Grid Search systematically evaluates every point in the user‑specified parameter grid.

Question 37. Bayesian Optimization is advantageous over Grid Search because it:
A) Requires no prior information
B) Evaluates hyperparameter configurations sequentially using a surrogate model, reducing evaluations needed
C) Guarantees finding the global optimum
D) Works only for discrete parameters
Answer: B
Explanation: Bayesian methods model the performance surface and select promising points, often needing fewer trials.

Question 38. In a perceptron, the activation function is typically a:
A) Sigmoid
B) ReLU
C) Step (Heaviside) function
D) Softmax
Answer: C
Explanation: Classic perceptrons use a binary step to output 0 or 1 based on weighted sum.

Question 39. Backpropagation primarily computes:
A) The forward pass predictions
B) The gradient of the loss with respect to each weight using the chain rule
C) Hyperparameter values
D) Data preprocessing steps
Answer: B
Explanation: Backpropagation propagates error gradients backward through the network to update weights.

Question 40. Which activation function suffers from the “dying ReLU” problem?
A) Sigmoid
B) Tanh
C) ReLU
D) Softplus
Answer: C
Explanation: ReLU outputs zero for negative inputs; neurons can become stuck at zero and stop learning.

Question 41. Convolutional Neural Networks are particularly effective for image data because they:
A) Require no training data
B) Preserve spatial hierarchies via local receptive fields and weight sharing
C) Use recurrent connections
D) Operate only on tabular data
Answer: B
Explanation: Convolutions capture local patterns and share filters across the image, reducing parameters.

Question 42. In a CNN, the pooling layer primarily serves to:
A) Increase feature map size
B) Reduce spatial dimensions and provide translation invariance
B) Predicting the target word from its context
C) Clustering words based on frequency
D) Using one‑hot encoding directly
Answer: A
Explanation: Skip‑gram maximizes probability of context words conditioned on the target, yielding dense vectors.

Question 46. In the BERT architecture, “masked language modeling” means:
A) Training on only half the dataset
B) Randomly masking some input tokens and predicting them, enabling bidirectional context learning
C) Using only forward direction
D) Ignoring punctuation
Answer: B
Explanation: BERT masks tokens during pre‑training, forcing the model to infer them from both left and right context.

Question 47. Sentiment analysis that outputs “positive”, “negative”, or “neutral” is an example of:
A) Regression
B) Multi‑class classification
C) Binary classification
D) Clustering
Answer: B
Explanation: Three distinct labels constitute a multi‑class classification problem.

Question 48. Topic modeling with Latent Dirichlet Allocation (LDA) assumes:
A) Documents are generated from a mixture of latent topics, each represented by a distribution over words
B) Words are independent of topics
C) Topics are ordered sequentially
D) Each document contains only one topic
Answer: A
Explanation: LDA is a generative probabilistic model where each document is a combination of topics.

Question 49. Which evaluation metric is appropriate for imbalanced binary classification when the cost of false negatives is high?
A) Accuracy
B) Precision
C) Recall (Sensitivity)
D) Specificity
Answer: C
Explanation: Recall measures the proportion of actual positives correctly identified, crucial when missing positives is costly.

Question 50. The “Box‑Cox” transformation is used to:
A) Encode categorical variables
B) Stabilize variance and make data more normal‑like
C) Increase dimensionality
D) Perform feature selection
Answer: B
Explanation: Box‑Cox applies a power transformation parameterized to reduce skewness and heteroscedasticity.
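The Box‑Cox family from Question 50 can be written out directly: (x^λ - 1)/λ for λ ≠ 0, and log(x) for λ = 0, defined for positive x. The sketch below applies λ = 0 to a made-up right-skewed sample and compares skewness before and after. The helper names and the data are assumptions. In practice λ is usually estimated from the data, for example by `scipy.stats.boxcox`, rather than fixed by hand.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a single positive value for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def skewness(values):
    """Population skewness: third central moment over variance^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Illustrative right-skewed sample (not from the exam material)
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]
transformed = [box_cox(v, 0) for v in raw]  # lambda = 0 is the log transform

skew_raw = skewness(raw)
skew_transformed = skewness(transformed)

print(f"skew before: {skew_raw:.2f}, after: {skew_transformed:.2f}")
```

The transformed sample is still somewhat right-skewed, but much less so than the raw values, which is the effect the explanation describes.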
Question 54. In a time‑series forecasting problem, which model inherently captures temporal dependencies without explicit lag features?
A) Linear Regression
B) Random Forest
C) ARIMA
D) LSTM
Answer: D
Explanation: LSTM networks maintain hidden states that model sequential dependencies directly.

Question 55. Which loss function is appropriate for binary classification with probabilistic outputs?
A) Mean Squared Error
B) Huber loss
C) Binary Cross‑Entropy (Log Loss)
D) Hinge loss
Answer: C
Explanation: Binary cross‑entropy penalizes the difference between predicted probabilities and true labels.

Question 56. When performing one‑hot encoding on a categorical variable with 100 levels, which technique helps avoid the “curse of dimensionality”?
A) Label encoding
B) Hashing trick (feature hashing)
C) Scaling
D) Normalization
Answer: B
Explanation: Feature hashing maps categories to a fixed number of columns, reducing dimensionality.

Question 57. In a confusion matrix, the term “True Negative” (TN) refers to:
A) Correctly predicted positive cases
B) Incorrectly predicted positive cases
C) Correctly predicted negative cases
D) Incorrectly predicted negative cases
Answer: C
Explanation: TN counts instances where the actual class is negative and the model also predicts negative.

Question 58. Which regularization technique can be interpreted as adding a Gaussian prior on the model coefficients?
A) L1 (Lasso)
B) L2 (Ridge)
C) Elastic Net
D) Dropout
Answer: B
Explanation: Ridge (L2) corresponds to a zero‑mean Gaussian prior, penalizing large weights.

Question 59. The “early stopping” technique in model training is used to:
A) Reduce the size of the dataset
B) Halt training when validation loss stops improving, preventing overfitting
C) Increase learning rate automatically
D) Perform feature selection
Answer: B
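The early stopping rule in Question 59 is often implemented with a "patience" counter. The sketch below shows one common variant, not a specific library's API. The function `early_stopping_epoch`, its `patience` and `min_delta` parameters, and the validation losses are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, current in enumerate(val_losses):
        if current < best_loss - min_delta:
            # New best: remember it and reset the patience counter
            best_loss = current
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through every epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation curve that starts rising after epoch 2
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.50]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=2)

print(stop_epoch, best_epoch)
```

With a patience of 2, training halts at epoch 4 and the weights from epoch 2 would be restored. The later improvement at epoch 5 is never seen, which is the trade-off a small patience value makes.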