Data Science Machine Learning Practice Exam, Exams of Technology

This practice exam for a data science machine learning certificate consists of multiple-choice questions. Topics include data retrieval from APIs, missing value handling, feature engineering, PCA, time-series splitting, descriptive statistics, confusion matrix interpretation, regression metrics, bias-variance, decision trees, random forests, GBM, SVM, k-NN, silhouette score, Apriori, grid search, and stratified k-fold cross-validation. Each question includes the correct answer and an explanation.

Typology: Exams

2025/2026

Available from 12/20/2025

shilpi-jain-1

Data Science Machine Learning Certificate
Practice Exam
**Question 1.** Which of the following is the most appropriate method to retrieve data from a
RESTful API that returns JSON?
A) Using a SQL SELECT statement
B) Parsing XML with BeautifulSoup
C) Sending an HTTP GET request and decoding the JSON payload
D) Reading a flat CSV file from disk
Answer: C
Explanation: A RESTful API typically returns data in JSON format, which can be accessed by
sending an HTTP GET request and then decoding the JSON response.
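A minimal sketch of option C using only the Python standard library. The function names `fetch_json` and `decode_json_payload` are illustrative, and the UTF-8 default is an assumption about the API's encoding rather than something the question specifies.

```python
import json
from urllib.request import Request, urlopen


def decode_json_payload(raw: bytes, encoding: str = "utf-8"):
    """Turn the raw response body into Python objects (dict, list, ...)."""
    return json.loads(raw.decode(encoding))


def fetch_json(url: str, timeout: float = 10.0):
    """Send an HTTP GET request and decode the JSON body of the response."""
    request = Request(url, headers={"Accept": "application/json"}, method="GET")
    with urlopen(request, timeout=timeout) as response:
        return decode_json_payload(response.read())


# Decoding can be checked without any network access.
sample = decode_json_payload(b'{"id": 7, "tags": ["a", "b"]}')
```

Decoding is kept in its own function so that it can be exercised on a canned payload; `fetch_json` would be called with the URL of a real endpoint.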
**Question 2.** When a dataset contains missing values that are MCAR (Missing Completely at
Random), which imputation technique is generally acceptable without introducing bias?
A) Deleting the rows with missing values
B) Mean imputation for numerical variables
C) Using a predictive model to estimate missing values
D) Replacing with the most frequent category
Answer: B
Explanation: If data are MCAR, mean imputation does not bias the estimated mean of the variable, although it understates the variance and weakens correlations with other variables.
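A small sketch of mean imputation on a toy column, with `None` standing in for missing entries. The data and the helper name `impute_mean` are made up for illustration. It suggests what the technique does to summary statistics: the mean is unchanged, while the spread narrows because every filled value sits exactly at the center.

```python
from statistics import mean, pvariance


def impute_mean(values):
    """Replace each missing entry (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


column = [2.0, None, 4.0, 6.0, None, 8.0]   # hypothetical data
observed = [v for v in column if v is not None]
imputed = impute_mean(column)

mean_before, mean_after = mean(observed), mean(imputed)
var_before, var_after = pvariance(observed), pvariance(imputed)
# The mean stays at 5.0; the variance drops from 5.0 to about 3.33.
```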
**Question 3.** Which scaling technique preserves the shape of the original distribution while
limiting values to a specific range?
A) Z-Score standardization
B) Min-Max scaling
C) Log transformation
D) Robust scaling
Answer: B
Explanation: Min-Max scaling linearly rescales data to a defined range (e.g., 0 to 1) without altering the distribution’s shape.
**Question 4.** In feature engineering, extracting the “day of week” from a datetime column is an example of:
A) Dimensionality reduction
B) Encoding categorical variables
C) Creating a derived feature
D) Target encoding
Answer: C
Explanation: “Day of week” is a new feature derived from an existing datetime column.
**Question 5.** Which encoding method is most suitable for high-cardinality categorical variables when the model is a tree-based algorithm?
A) One-Hot Encoding
B) Label Encoding
C) Target Encoding
D) Binary Encoding
Answer: C
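As a rough illustration of the target encoding named in Question 5, the sketch below replaces each category with the mean of the target observed for that category. It assumes the plain, unsmoothed variant; the data and the name `target_encode` are hypothetical. In practice the category means would be computed on training data only, since using the full dataset leaks target information into the feature.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Map each category to the mean target value seen for that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    means = {category: sums[category] / counts[category] for category in sums}
    encoded = [means[category] for category in categories]
    return encoded, means


cities = ["a", "b", "a", "c", "b", "a"]   # hypothetical categorical column
clicked = [1, 0, 0, 1, 1, 1]              # hypothetical binary target
encoded, means = target_encode(cities, clicked)
```

One numeric column results regardless of how many distinct categories exist, which is why the approach scales to high-cardinality variables where one-hot encoding would add a column per category.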

B) Standard deviation
C) Median
D) Range
Answer: C
Explanation: The median is the middle value and is not affected by extreme values, unlike the mean.
**Question 9.** A box plot displays which of the following summary measures?
A) Mean, variance, and standard deviation
B) Minimum, Q1, median, Q3, and maximum (excluding outliers)
C) Frequency distribution of categories
D) Correlation coefficients
Answer: B
Explanation: A box plot visualizes the five-number summary: minimum, first quartile, median, third quartile, and maximum (with outliers shown separately).
**Question 10.** Which correlation coefficient is appropriate for measuring monotonic relationships between two ordinal variables?
A) Pearson correlation
B) Spearman rank correlation
C) Point-biserial correlation
D) Cramér’s V
Answer: B
Explanation: Spearman’s rho assesses monotonic relationships using rank-based calculations, suitable for ordinal data.
**Question 11.** In a confusion matrix for binary classification, which cell represents false negatives?
A) Top-left
B) Top-right
C) Bottom-left
D) Bottom-right
Answer: C
Explanation: This assumes the layout in which rows are actual classes, columns are predicted classes, and the negative class is listed first (the layout scikit-learn produces for labels 0 and 1). There, the bottom-left cell holds actual positives that were predicted negative (false negatives). If the positive class is listed first, false negatives move to the top-right cell.
**Question 12.** The Area Under the ROC Curve (AUC) is a measure of:
A) Overall classification accuracy
B) Model’s ability to rank positive instances higher than negative ones
C) The trade-off between precision and recall
D) The proportion of correctly predicted classes
Answer: B
Explanation: AUC evaluates how well the model separates positive from negative classes across all possible thresholds.
**Question 13.** Which regression metric is most sensitive to large errors?
A) Mean Absolute Error (MAE)

Explanation: The logistic (sigmoid) function outputs values strictly between 0 and 1, suitable for probability estimates.
**Question 16.** Which property of a decision tree makes it prone to overfitting?
A) High depth and many leaf nodes
B) Use of Gini impurity
C) Binary splits only
D) Pruning after training
Answer: A
Explanation: Deep trees with many leaves capture noise in the training data, leading to overfitting.
**Question 17.** Random Forest reduces variance primarily by:
A) Boosting weak learners sequentially
B) Using bagging (bootstrap sampling) and feature randomness
C) Adding L1 regularization to each tree
D) Stacking multiple decision trees
Answer: B
Explanation: Random Forest builds many trees on different bootstrap samples and random subsets of features, decreasing variance.
**Question 18.** Gradient Boosting Machines (GBM) differ from Random Forest because they:
A) Train trees in parallel
B) Combine weak learners sequentially, focusing on residual errors
C) Use only one tree with high depth
D) Do not require hyperparameter tuning
Answer: B
Explanation: GBM builds trees one after another, each trying to correct the mistakes of the previous ensemble.
**Question 19.** In Support Vector Machines, the kernel trick allows:
A) Computing inner products in a high-dimensional feature space without an explicit transformation
B) Faster training by reducing the number of support vectors
C) Automatic feature scaling
D) Handling missing values internally
Answer: A
Explanation: The kernel function computes inner products in a transformed feature space, enabling non-linear separation without explicit mapping.
**Question 20.** Which distance metric is most appropriate for categorical variables in a K-Nearest Neighbors classifier?
A) Euclidean distance
B) Manhattan distance
C) Hamming distance
D) Cosine similarity
Answer: C
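To make the Hamming distance from Question 20 concrete, here is a small sketch of a K-Nearest Neighbors vote over purely categorical records. The records, labels, and function names are invented for illustration. Ties are broken by training order, which is an arbitrary choice made for this sketch.

```python
from collections import Counter


def hamming_distance(a, b):
    """Count the positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of features")
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Return the majority label among the k records closest to `query`."""
    nearest = sorted(train, key=lambda row: hamming_distance(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (color, body type, gearbox) -> segment
train = [
    (("red", "suv", "manual"), "family"),
    (("red", "suv", "auto"), "family"),
    (("blue", "coupe", "manual"), "sport"),
    (("black", "coupe", "manual"), "sport"),
]
query = ("red", "coupe", "manual")
prediction = knn_predict(train, query, k=3)
```

Euclidean or Manhattan distance would need the categories mapped to numbers first, which imposes an ordering the categories do not have; counting mismatches avoids that.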

C) Grid Search uses Bayesian inference to select parameters
D) Grid Search only works for linear models
Answer: B
Explanation: Grid Search systematically explores every combination of specified hyperparameter values.
**Question 24.** Stratified K-Fold cross-validation is most beneficial when:
A) The dataset is very small
B) The target variable is binary or multiclass and classes are imbalanced
C) Time-series order must be preserved
D) Model training is extremely fast
Answer: B
Explanation: Stratification ensures each fold maintains the same class distribution as the full dataset, which is crucial for imbalanced classification problems.
**Question 25.** L1 regularization (Lasso) tends to produce models that are:
A) Dense with many small coefficients
B) Sparse, setting some coefficients exactly to zero
C) Unbiased regardless of feature count
D) Identical to Ridge regression
Answer: B
Explanation: L1 penalty encourages sparsity, often eliminating less important features by driving their coefficients to zero.
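The sparsity in Question 25 can be seen in a special case. The sketch assumes an orthonormal design matrix, with the Lasso objective written as ½‖y − Xβ‖² + λ‖β‖₁ and the Ridge objective as ½‖y − Xβ‖² + (λ/2)‖β‖². Under those assumptions each Lasso coefficient is the soft-thresholded least-squares coefficient, while Ridge only rescales it. The coefficient values and λ are made up.

```python
def soft_threshold(beta_ols, lam):
    """Lasso update for one coefficient under an orthonormal design."""
    if beta_ols > lam:
        return beta_ols - lam
    if beta_ols < -lam:
        return beta_ols + lam
    return 0.0  # anything within [-lam, lam] is set exactly to zero


def ridge_shrink(beta_ols, lam):
    """Ridge update for one coefficient under an orthonormal design."""
    return beta_ols / (1.0 + lam)


ols = [2.0, -1.0, 0.3, -0.1]   # hypothetical least-squares coefficients
lam = 0.5
lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]
# lasso -> [1.5, -0.5, 0.0, 0.0]; every ridge coefficient stays non-zero
```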

**Question 26.** Bagging and Boosting differ in that bagging primarily aims to:
A) Reduce bias by sequentially focusing on errors
B) Reduce variance by averaging many high-variance models
C) Increase model complexity by stacking layers
D) Perform feature selection automatically
Answer: B
Explanation: Bagging (e.g., Random Forest) builds independent models on bootstrap samples and averages them, reducing variance.
**Question 27.** Model persistence in Python is typically achieved with which library?
A) NumPy
B) Pandas
C) Pickle (or Joblib)
D) Matplotlib
Answer: C
Explanation: Pickle and Joblib serialize Python objects, including trained scikit-learn models, for later reuse.
**Question 28.** Deploying a machine-learning model as a RESTful service is most commonly done using which framework?
A) TensorFlow.js
B) Flask or FastAPI
C) Hadoop MapReduce

**Question 31.** SHAP values are used to:
A) Reduce dimensionality of high-dimensional data
B) Explain individual predictions by attributing feature contributions
C) Optimize hyperparameters via Bayesian methods
D) Encode categorical variables
Answer: B
Explanation: SHAP (SHapley Additive exPlanations) quantifies each feature’s contribution to a specific prediction.
**Question 32.** LIME provides model explanations by:
A) Training a global surrogate model
B) Perturbing the input locally and fitting an interpretable model
C) Computing feature importance from tree ensembles
D) Using deep learning attention maps
Answer: B
Explanation: LIME creates a locally faithful, simple model (e.g., linear) around a specific instance to explain predictions.
**Question 33.** Differential privacy primarily aims to:
A) Encrypt model parameters during training
B) Ensure that the inclusion or exclusion of a single record does not significantly affect output
C) Remove all personally identifiable information from the dataset before training
D) Share model predictions without any accuracy loss
Answer: B
Explanation: Differential privacy adds calibrated noise so that any single individual's data has minimal impact on the model’s results.
**Question 34.** Which of the following is a common metric to evaluate fairness across demographic groups?
A) R-squared
B) Equalized odds
C) Silhouette score
D) AIC
Answer: B
Explanation: Equalized odds requires that true positive and false positive rates be similar across groups, a standard fairness metric.
**Question 35.** In a production pipeline, which component is responsible for detecting data drift?
A) Model training script
B) Data ingestion monitoring service
C) Hyperparameter tuner
D) Model serialization module
Answer: B

A) Always remove any data point beyond 2 standard deviations
B) Apply a robust scaler that uses the interquartile range
C) Convert outliers to missing values and then drop rows
D) Use a logarithmic transformation on the target variable only
Answer: B
Explanation: Robust scaling (e.g., using IQR) reduces the influence of outliers without discarding data.
**Question 39.** In cross-validation, the term “leakage” refers to:
A) Using the test set during model training
B) Sharing the same random seed across folds
C) Parallelizing the folds on multiple cores
D) Storing models in a cloud bucket
Answer: A
Explanation: Data leakage occurs when information from the validation or test set inadvertently influences training, inflating performance estimates.
**Question 40.** Which loss function is most appropriate for binary classification with imbalanced classes?
A) Mean Squared Error
B) Hinge loss
C) Binary cross-entropy with class weighting
D) Absolute error
Answer: C
Explanation: Binary cross-entropy with class weights penalizes mistakes on the minority class more heavily, improving learning on imbalanced data.
**Question 41.** An R-squared of 0.0 for a model that predicts a continuous target indicates:
A) Perfect predictions
B) No linear relationship between predictions and actual values
C) Overfitting
D) High variance
Answer: B
Explanation: An R-squared of 0 means the model explains none of the variance in the target; its predictions are no better than always predicting the mean.
**Question 42.** Which of the following is a characteristic of a “cold start” problem in recommender systems?
A) Too many users request recommendations simultaneously
B) New users or items have no historical interaction data
C) The model’s parameters become stale over time
D) The recommendation algorithm runs out of memory
Answer: B
Explanation: Cold start refers to the difficulty of making predictions for entities lacking prior interaction data.
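The R-squared statement in Question 41 can be checked numerically. The sketch below assumes the usual definition R² = 1 − SS_res / SS_tot and uses made-up target values; a model that always predicts the mean of the target scores exactly 0, and a perfect model scores 1.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]               # hypothetical targets
mean_baseline = [2.5, 2.5, 2.5, 2.5]        # always predict the mean
r2_baseline = r_squared(y_true, mean_baseline)   # 0.0
r2_perfect = r_squared(y_true, y_true)           # 1.0
```

Under this definition R² can also go negative, when a model's squared error exceeds that of the mean baseline.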

Answer: B
Explanation: Larger K (e.g., 10-fold) provides more training data per fold and smoother estimates, reducing variance of the metric.
**Question 46.** Which of the following best describes “early stopping” in gradient boosting?
A) Halting training after a fixed number of trees regardless of performance
B) Stopping when validation loss ceases to improve for a predefined number of iterations
C) Pruning trees after they are fully grown
D) Reducing learning rate to zero after the first epoch
Answer: B
Explanation: Early stopping monitors validation performance and terminates training when improvement stalls, preventing overfitting.
**Question 47.** In a binary classification problem, a model with high precision but low recall is best suited for:
A) Situations where false positives are costly
B) Situations where false negatives are costly
C) Balanced datasets where both errors matter equally
D) Scenarios where overall accuracy is the only concern
Answer: A
Explanation: High precision means few false positives; low recall indicates many false negatives, suitable when false positives are more undesirable.
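For Question 47, a short sketch of how precision and recall follow from confusion-matrix counts. The counts are invented to show a high-precision, low-recall model: it rarely raises a false alarm but misses most positives.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical counts: 8 true positives, 1 false positive, 12 false negatives
precision, recall = precision_recall(tp=8, fp=1, fn=12)
# precision is 8/9 (about 0.89); recall is 8/20 = 0.4
```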

**Question 48.** Which of the following is a common method to detect multicollinearity among predictors?
A) Computing the confusion matrix
B) Calculating the Variance Inflation Factor (VIF)
C) Performing K-Means clustering on the features
D) Using the Silhouette score
Answer: B
Explanation: VIF quantifies how much the variance of a coefficient is inflated due to linear dependence with other predictors.
**Question 49.** When converting a text column into numeric features using TF-IDF, which of the following statements is true?
A) TF-IDF values are always binary (0 or 1)
B) TF-IDF emphasizes terms that are frequent in a document but rare across the corpus
C) TF-IDF cannot be used with logistic regression
D) TF-IDF automatically removes stop words
Answer: B
Explanation: TF-IDF balances term frequency within a document against inverse document frequency across the corpus, highlighting discriminative words.
**Question 50.** In a production environment, which practice helps ensure reproducibility of model predictions?
A) Randomly initializing model weights at each request
B) Logging the exact version of code, libraries, and model artifacts used for inference