Machine Learning Professional Practice Exam, Exams of Technology

The Machine Learning Professional exam evaluates expertise in machine learning algorithms, models, and systems. It includes topics such as supervised and unsupervised learning, neural networks, deep learning, and the application of machine learning techniques to real-world problems.

Typology: Exams

2025/2026

Available from 01/16/2026

shilpi-jain-1
Machine Learning Professional Practice Exam
**Question 1.** Which loss function is most appropriate for a binary classification problem with imbalanced classes?
A) Mean Squared Error
B) Hinge Loss
C) Weighted Cross-Entropy
D) Log-Cosh
Answer: C
Explanation: Weighted cross-entropy adds class-specific weights, penalizing errors on the minority class more heavily, which helps with imbalance.
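To make the weighting concrete, here is a minimal sketch of a class-weighted binary cross-entropy in plain Python. The function name `weighted_bce`, the weights `w_pos` and `w_neg`, and the toy data are illustrative assumptions, not part of the exam material.

```python
import math

def weighted_bce(y_true, p_pred, w_pos=1.0, w_neg=1.0, eps=1e-12):
    """Mean binary cross-entropy with a separate weight per class."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1.0 - p)
    return total / len(y_true)

# Hypothetical toy data: one rare positive among four negatives.
y_true = [1, 0, 0, 0, 0]
p_pred = [0.2, 0.1, 0.1, 0.1, 0.1]  # the positive is badly under-predicted

unweighted = weighted_bce(y_true, p_pred)
# Assumed weighting scheme: ratio of negatives to positives (4 / 1).
weighted = weighted_bce(y_true, p_pred, w_pos=4.0)
```

With the larger `w_pos`, the single misclassified positive contributes four times as much to the loss, so an optimizer has more incentive to correct it.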
**Question 2.** In linear algebra, what does the term “rank” of a matrix refer to?
A) Number of rows
B) Number of columns
C) Number of linearly independent rows (or columns)
D) Sum of diagonal elements
Answer: C
Explanation: The rank is the dimension of the vector space spanned by its rows or columns, i.e., the
count of linearly independent rows/columns.
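As a small illustration, rank can be computed by Gaussian elimination: each pivot found corresponds to one linearly independent row. The sketch below uses exact fractions to avoid rounding issues. The helper name `matrix_rank` and the example matrix are assumptions made for this illustration.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

a = [[1, 2, 3],
     [2, 4, 6],   # twice the first row, so it adds no new direction
     [1, 0, 1]]
rank_a = matrix_rank(a)
```

Here the second row is a multiple of the first, so only two rows are linearly independent even though the matrix has three rows and three columns.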
**Question 3.** Which of the following statements best describes the bias-variance trade-off?
A) Increasing model complexity always reduces both bias and variance.
B) High bias models underfit, while high variance models overfit.
C) Bias and variance are unrelated to model performance.
D) Regularization increases bias and reduces variance simultaneously.


Answer: B
Explanation: High bias leads to underfitting (systematic errors), whereas high variance leads to overfitting (sensitivity to noise). Balancing them is key.
**Question 4.** Which metric is most suitable for evaluating a model on a highly imbalanced binary classification where the positive class is rare?
A) Accuracy
B) Precision
C) Recall
D) F1-Score
Answer: D
Explanation: F1-Score balances precision and recall, providing a single measure that reflects performance on the minority class better than accuracy.
**Question 5.** In gradient descent, what does the learning rate control?
A) Number of features used
B) Step size taken in the direction of the gradient
C) Number of epochs
D) Regularization strength
Answer: B
Explanation: The learning rate determines how far the parameters move along the negative gradient direction at each iteration.
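The effect of the learning rate is easy to see on a one-dimensional problem. The following sketch minimizes the assumed toy objective f(x) = (x - 3)^2. The function name, step count, and the three learning rates are illustrative choices only.

```python
def gradient_descent(lr, steps=25, x0=0.0):
    """Minimize f(x) = (x - 3)^2 with fixed-step gradient descent."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)  # derivative f'(x)
        x -= lr * grad          # move against the gradient, scaled by lr
    return x

small = gradient_descent(lr=0.01)    # tiny steps: still far from 3
good = gradient_descent(lr=0.1)      # converges close to the minimum at 3
too_big = gradient_descent(lr=1.1)   # each step overshoots; iterates diverge
```

For this objective each update multiplies the distance to the minimum by (1 - 2·lr), which is why a rate above 1.0 makes the iterates grow instead of shrink.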

D) PCA
Answer: B
Explanation: In a normal distribution, observations beyond three standard deviations from the mean are commonly treated as outliers.
**Question 9.** In a confusion matrix for binary classification, which cell corresponds to false negatives?
A) Top-left
B) Top-right
C) Bottom-left
D) Bottom-right
Answer: C
Explanation: With rows as actual classes and columns as predicted classes, both ordered negative then positive (the layout scikit-learn uses), the bottom-left cell counts actual positives predicted as negative (false negatives).
**Question 10.** Which dimensionality-reduction method preserves class separability by maximizing between-class variance while minimizing within-class variance?
A) PCA
B) LDA
C) t-SNE
D) UMAP
Answer: B
Explanation: Linear Discriminant Analysis (LDA) explicitly seeks directions that best separate classes.
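For two classes in two dimensions, the LDA direction can be sketched as w ∝ S_w⁻¹(μ₁ − μ₀), where S_w is the within-class scatter matrix. The code below is a minimal stdlib-only illustration; the helper names and the toy points are assumptions, and a real project would normally use a library implementation.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, mu):
    """2x2 scatter matrix: sum of outer products of centered points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in points:
        dx, dy = x - mu[0], y - mu[1]
        s[0][0] += dx * dx
        s[0][1] += dx * dy
        s[1][0] += dy * dx
        s[1][1] += dy * dy
    return s

def fisher_direction(class0, class1):
    """Direction proportional to S_w^{-1} (mu1 - mu0) for 2-D data."""
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    d = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # Apply the explicit inverse of the 2x2 matrix sw to d.
    return [(sw[1][1] * d[0] - sw[0][1] * d[1]) / det,
            (-sw[1][0] * d[0] + sw[0][0] * d[1]) / det]

# Hypothetical toy data: large spread along x, classes separated along y.
class0 = [(0, 0.0), (1, 0.1), (2, -0.1), (3, 0.0)]
class1 = [(0, 1.0), (1, 1.1), (2, 0.9), (3, 1.0)]

w = fisher_direction(class0, class1)

def project(p):
    return w[0] * p[0] + w[1] * p[1]

proj0 = [project(p) for p in class0]
proj1 = [project(p) for p in class1]
```

In this toy data the largest overall variance lies along x, which is the axis a variance-only method such as PCA would favor, while the Fisher direction points mostly along y, where the classes actually separate.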

**Question 11.** Which of the following is a non-parametric clustering algorithm that can discover arbitrarily shaped clusters?
A) K-Means
B) Gaussian Mixture Models
C) DBSCAN
D) Agglomerative Hierarchical Clustering
Answer: C
Explanation: DBSCAN groups points based on density, allowing detection of clusters with irregular shapes and noise points.
**Question 12.** When using time-series cross-validation, which technique respects temporal ordering?
A) Random K-Fold
B) Stratified K-Fold
C) Leave-One-Out
D) Walk-Forward Validation
Answer: D
Explanation: Walk-forward validation trains on past data and tests on future data, preserving chronology.
**Question 13.** In a Random Forest, why does increasing the number of trees generally improve performance up to a point?
A) It reduces bias by adding depth.
B) It reduces variance through bagging and averaging.
C) It increases overfitting.

**Question 16.** Which regularization technique adds a penalty proportional to the absolute value of coefficients?
A) L1 regularization (Lasso)
B) L2 regularization (Ridge)
C) Elastic Net
D) Dropout
Answer: A
Explanation: L1 regularization adds |w| to the loss, encouraging sparsity by driving some coefficients to zero.
**Question 17.** In the context of model deployment, what does “canary release” refer to?
A) Deploying to all users simultaneously
B) Deploying a new model to a small subset of traffic for monitoring before full rollout
C) Using a model only for offline batch jobs
D) Rolling back to previous version automatically
Answer: B
Explanation: Canary releases expose a limited audience to the new model, allowing detection of issues before full deployment.
**Question 18.** Which of the following is a model-agnostic method for explaining individual predictions?
A) Feature importance from tree depth
B) Coefficients of linear regression
C) SHAP values
D) Weights of a neural network
Answer: C
Explanation: SHAP (Shapley Additive Explanations) provides local, model-agnostic contribution scores for each feature.
**Question 19.** What does the term “concept drift” describe in production ML systems?
A) Change in hardware resources
B) Shift in the statistical relationship between input features and target variable over time
C) Increase in model size
D) Loss of training data
Answer: B
Explanation: Concept drift occurs when the underlying data generating process changes, causing model performance degradation.
**Question 20.** Which serialization format enables interoperability across programming languages and platforms for ML models?
A) Pickle
B) joblib
C) ONNX
D) CSV
Answer: C
Explanation: ONNX (Open Neural Network Exchange) provides a standardized representation usable in many frameworks and languages.

C) Influence of a single training example (kernel width)
D) Learning rate
Answer: C
Explanation: γ determines the spread of the RBF kernel; high γ leads to narrow influence, low γ yields broader influence.
**Question 24.** Which of the following is a key difference between online (real-time) and batch inference?
A) Online inference processes one request at a time with low latency; batch processes many records together with higher latency.
B) Batch inference requires GPU; online does not.
C) Online inference cannot be containerized.
D) Batch inference always yields higher accuracy.
Answer: A
Explanation: Online inference serves individual requests quickly, while batch inference aggregates many inputs, often for offline analysis.
**Question 25.** When performing feature scaling, which method preserves the original distribution shape of the data?
A) Min-Max scaling
B) Standardization (z-score)
C) Log transformation
D) Rank scaling
Answer: B
Explanation: Standardization subtracts the mean and divides by the standard deviation, preserving the distribution’s shape (only shifting and scaling). Note that Min-Max scaling is likewise a linear rescaling, so option A also preserves shape; the intended contrast is with the log and rank transformations, which do change it.
**Question 26.** Which statement best describes transfer learning?
A) Training a model from scratch on a new dataset.
B) Fine-tuning a pre-trained model on a related task to leverage learned representations.
C) Using ensemble methods to combine unrelated models.
D) Converting a model into a different programming language.
Answer: B
Explanation: Transfer learning re-uses knowledge from a model trained on a large dataset, adapting it to a new but related problem.
**Question 27.** In a data lake architecture, raw data is stored in its ___ form.
A) Structured
B) Normalized
C) Original (raw)
D) Aggregated
Answer: C
Explanation: Data lakes retain data in its native format, allowing flexible downstream processing.
**Question 28.** Which of the following is a common technique for handling high-cardinality categorical variables?
A) One-hot encoding
B) Label encoding

**Question 31.** In reinforcement learning, what does the term “policy” refer to?
A) The reward function
B) The environment dynamics
C) The mapping from states to actions
D) The discount factor
Answer: C
Explanation: A policy defines the agent’s behavior by specifying which action to take in each state.
**Question 32.** Which statistical test is appropriate for comparing the means of two independent samples with unequal variances?
A) Paired t-test
B) Student’s t-test (equal variances)
C) Welch’s t-test
D) Chi-square test
Answer: C
Explanation: Welch’s t-test adjusts for unequal variances and sample sizes between two groups.
**Question 33.** What is the purpose of using a “validation set” separate from the training set?
A) To increase model size
B) To tune hyperparameters and assess generalization without contaminating test results
C) To store model checkpoints only
D) To compute loss during training
Answer: B
Explanation: A validation set provides unbiased feedback for model selection and hyperparameter tuning while keeping the test set untouched for final evaluation.
**Question 34.** Which of the following is a common method to mitigate overfitting in deep neural networks?
A) Increase number of layers indefinitely
B) Use dropout layers during training
C) Remove activation functions
D) Use only linear models
Answer: B
Explanation: Dropout randomly deactivates neurons during training, preventing co-adaptation and reducing overfitting.
**Question 35.** In the context of MLOps, what does “artifact store” typically hold?
A) Raw data files
B) Model binaries, logs, and metadata generated during experiments
C) User credentials
D) Hyperparameter search spaces
Answer: B
Explanation: An artifact store preserves versioned outputs like trained models, evaluation reports, and other experiment artifacts.
**Question 36.** Which clustering evaluation metric does NOT require ground-truth labels?

Answer: A
Explanation: In high-dimensional spaces, points tend to be equidistant, making many algorithms (e.g., KNN) less effective.
**Question 39.** In Bayesian inference, what does the posterior distribution represent?
A) Prior belief before seeing data
B) Likelihood of data given parameters
C) Updated belief about parameters after observing data
D) Marginal probability of data
Answer: C
Explanation: Posterior combines prior knowledge with observed data via Bayes’ theorem to produce the updated distribution over parameters.
**Question 40.** Which of the following is a privacy-preserving technique that adds calibrated noise to query results?
A) Data encryption
B) Differential privacy
C) Tokenization
D) Anonymization
Answer: B
Explanation: Differential privacy provides formal guarantees by injecting random noise proportional to query sensitivity.
**Question 41.** In a convolutional neural network, what is the purpose of the pooling layer?
A) Increase the number of parameters
B) Reduce spatial dimensions and achieve translation invariance
C) Perform non-linear activation
D) Normalize feature maps
Answer: B
Explanation: Pooling (e.g., max-pool) downsamples feature maps, decreasing computational load and providing invariance to small translations.
**Question 42.** Which of the following statements about the ROC curve is true?
A) It plots precision vs. recall.
B) It is insensitive to class imbalance.
C) It requires probability thresholds to be set.
D) It measures calibration of probabilities.
Answer: B
Explanation: The ROC curve evaluates true positive rate vs. false positive rate, which is not directly affected by class distribution.
**Question 43.** What does the term “feature store” refer to in an MLOps pipeline?
A) A database for raw data ingestion
B) A centralized repository for engineered features, enabling reuse across models
C) A place to store model weights
D) A logging system for model predictions
Answer: B

C) Normalizing data
D) Using a validation set
Answer: B
Explanation: Data leakage occurs when information unavailable at prediction time is inadvertently used during training, inflating performance.
**Question 47.** Which of the following is a primary advantage of using LightGBM over traditional gradient boosting?
A) Only works with categorical data
B) Uses leaf-wise tree growth, leading to faster training and lower memory usage
C) Requires no hyperparameter tuning
D) Guarantees global optimum
Answer: B
Explanation: LightGBM grows trees leaf-wise (best-first), which can be more efficient and achieve higher accuracy with less memory.
**Question 48.** In the context of model monitoring, what does “concept drift detection” typically involve?
A) Tracking CPU usage
B) Comparing distribution of predictions or features over time using statistical tests (e.g., KS test)
C) Measuring network latency
D) Logging model version numbers
Answer: B
Explanation: Detecting drift often uses statistical tests to compare current data or prediction distributions against a baseline.
**Question 49.** Which of the following best describes the purpose of “shadow mode” deployment?
A) Deploying a model to production without serving any predictions, only logging its outputs alongside the current model for comparison.
B) Replacing the production model instantly.
C) Running the model on a separate server for backup.
D) Using the model only for batch jobs.
Answer: A
Explanation: Shadow mode runs the new model in parallel, logging its predictions without affecting users, enabling safe evaluation.
**Question 50.** Which kernel function maps input data into an infinite-dimensional space, enabling linear separation of many non-linear problems?
A) Linear kernel
B) Polynomial kernel
C) RBF (Gaussian) kernel
D) Sigmoid kernel
Answer: C
Explanation: The RBF kernel corresponds to an infinite-dimensional feature space, allowing complex decision boundaries.
**Question 51.** In a time-series forecasting pipeline, which transformation is commonly applied to achieve stationarity?