GSDC Certified Machine Learning Exam, Exams of Technology

The Machine Learning Exam validates understanding of supervised and unsupervised learning, model training, evaluation, optimization, and deployment. Candidates demonstrate skills in applying machine learning techniques to solve predictive and analytical problems across industries.

Typology: Exams

2025/2026

Available from 01/23/2026

shilpi-jain-2

GSDC Certified Machine Learning Exam
**Question 1. Which learning paradigm explicitly uses a reward signal to guide the agent’s
actions?**
A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) Semi-supervised learning
Answer: C
Explanation: Reinforcement learning agents receive scalar rewards from the environment, which
they aim to maximize over time.
**Question 2. In which scenario is semi-supervised learning most beneficial?**
A) When a large labeled dataset is available
B) When only a few labeled examples exist but many unlabeled ones are present
C) When data are completely unlabeled
D) When the problem is purely reinforcement-based
Answer: B
Explanation: Semi-supervised learning leverages a small labeled set together with abundant
unlabeled data to improve model performance.
**Question 3. Which of the following best describes batch learning?**
A) Model updates after each individual sample
B) Model updates after a fixed number of samples
C) Model updates after the entire dataset is processed
D) Model updates using streaming data only
Answer: C
Explanation: Batch learning processes the whole dataset before each parameter update (for
example, full-batch gradient descent computes the gradient over all samples before taking a step).
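To make the contrast in Question 3 concrete, here is a minimal Python sketch comparing full-batch gradient descent (one parameter update per pass over the whole dataset) with online learning (one update per sample). The one-parameter model y ≈ w · x, the toy data and the learning rate are illustrative assumptions, not part of the exam material.

```python
def batch_gd(xs, ys, lr=0.1, epochs=100):
    """Full-batch gradient descent for y ≈ w * x with mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the MSE computed over the WHOLE dataset...
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad  # ...followed by exactly one update per pass
    return w


def online_gd(xs, ys, lr=0.1, epochs=100):
    """Online (per-sample) gradient descent for the same model and loss."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # Gradient of the squared error on a single sample; update immediately
            w -= lr * 2 * (w * x - y) * x
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # toy data generated by y = 2x (an assumption)
print(batch_gd(xs, ys), online_gd(xs, ys))
```

On this noise-free toy data both variants approach w = 2; what differs is how often the parameter is updated, not the target.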

**Question 4. Which matrix operation is used to compute the covariance matrix of a dataset X (where rows are samples and columns are features)?**
A) X · Xᵀ
B) Xᵀ · X
C) (X − μ)ᵀ · (X − μ) / (N − 1)
D) (X + μ) · (X + μ)ᵀ
Answer: C
Explanation: Centering X by subtracting the column means μ and then computing (X − μ)ᵀ · (X − μ) / (N − 1) yields the sample covariance matrix.
**Question 5. In gradient descent, which of the following guarantees convergence to a global minimum for convex loss functions?**
A) Using a fixed large learning rate
B) Using a learning rate that decays over iterations
C) Using momentum
D) Using stochastic updates only
Answer: B
Explanation: A decaying learning rate makes the steps progressively smaller, so the iterates stop overshooting and settle at the global optimum of a convex problem, provided the decay is not too fast (a standard condition is Σₜ ηₜ = ∞ and Σₜ ηₜ² < ∞, e.g. ηₜ ∝ 1/t). A fixed large learning rate can instead overshoot and diverge.
**Question 6. The derivative of the sigmoid function σ(z) = 1/(1 + e⁻ᶻ) with respect to z is:**
A) σ(z) · (1 − σ(z))
B) σ(z)²
C) 1 − σ(z)
D) e⁻ᶻ/(1 + e⁻ᶻ)²
Answer: A
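The formulas in Questions 4 and 6 can be checked with a short standard-library Python sketch: `covariance_matrix` implements (X − μ)ᵀ · (X − μ) / (N − 1) for a list of rows, and a central finite difference confirms the sigmoid derivative σ(z) · (1 − σ(z)). The function names and test values are illustrative choices. The check also shows that option D's expression e⁻ᶻ/(1 + e⁻ᶻ)² evaluates to the same value, because it is the unsimplified form of the same derivative; A is the standard compact form.

```python
import math


def covariance_matrix(X):
    """Sample covariance (X - mu)^T (X - mu) / (N - 1); rows of X are samples."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]   # column means
    Xc = [[row[j] - mu[j] for j in range(d)] for row in X]  # centered data
    return [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) / (n - 1)
             for j in range(d)]
            for i in range(d)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Option A: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def sigmoid_grad_unsimplified(z):
    """Option D: e^(-z) / (1 + e^(-z))^2, the same derivative before simplification."""
    e = math.exp(-z)
    return e / (1.0 + e) ** 2


X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(covariance_matrix(X))  # [[1.0, 2.0], [2.0, 4.0]]

z, h = 0.5, 1e-5
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference
print(abs(sigmoid_grad(z) - numeric) < 1e-8)
print(abs(sigmoid_grad_unsimplified(z) - sigmoid_grad(z)) < 1e-12)
```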

Answer: B
Explanation: The theorem states that, when averaged over all possible data-generating distributions, every algorithm has the same expected error.
**Question 10. The bias-variance trade-off describes the relationship between:**
A) Model bias and data variance only
B) Training error and test error
C) Model complexity, bias, and variance
D) Regularization strength and learning rate
Answer: C
Explanation: Increasing model complexity reduces bias but increases variance, and vice versa; optimal performance balances the two.
**Question 11. Which KPI would be most appropriate for a binary classification model used to detect fraudulent transactions?**
A) Mean Absolute Error
B) Accuracy
C) Area Under the ROC Curve (AUC)
D) R² score
Answer: C
Explanation: AUC captures the trade-off between true positive and false positive rates across all decision thresholds. Unlike accuracy, it is not inflated by simply predicting the majority class, which matters in fraud detection, where fraudulent transactions are rare.
**Question 12. When handling missing values, which method is generally preferred for numeric features with a skewed distribution?**
A) Mean imputation
B) Median imputation
C) Mode imputation
D) Random deletion
Answer: B
Explanation: The median is robust to skewed data and reduces the bias that extreme values would introduce.
**Question 13. Which technique is most suitable for detecting outliers in a high-dimensional dataset?**
A) Z-score on each feature individually
B) Interquartile Range (IQR) per feature
C) Isolation Forest
D) Simple thresholding of max values
Answer: C
Explanation: Isolation Forest isolates anomalies by randomly partitioning the data (anomalies are separated in fewer splits) and considers all features jointly, so it performs well even in high dimensions, where per-feature rules miss multivariate outliers.
**Question 14. Min-Max scaling transforms a feature x to the range [0, 1] using which formula?**
A) (x − μ)/σ
B) (x − min)/(max − min)
C) (x − median)/(Q3 − Q1)
D) log(x)
Answer: B
Explanation: Min-Max scaling subtracts the feature’s minimum and divides by its range (max − min).
**Question 15. Which encoding method can lead to high dimensionality when applied to a categorical variable with many levels?**
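Questions 12 and 14 can be illustrated with a minimal standard-library sketch. The income values are made up for illustration, missing entries are assumed to be encoded as `None`, and mapping a constant feature to 0 is an arbitrary choice to avoid dividing by zero.

```python
import statistics


def median_impute(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def min_max_scale(values):
    """Map values to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: a constant feature maps to 0
    return [(v - lo) / (hi - lo) for v in values]


incomes = [30.0, 32.0, None, 35.0, 400.0]  # hypothetical, skewed by one extreme value
filled = median_impute(incomes)
print(filled)                 # [30.0, 32.0, 33.5, 35.0, 400.0]
print(min_max_scale(filled))
```

Here the median of the observed values is 33.5, whereas the mean (124.25) would be pulled far toward the single value of 400.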

**Question 18. Principal Component Analysis (PCA) seeks to:**
A) Maximize class separation
B) Preserve pairwise distances
C) Capture maximum variance in orthogonal directions
D) Reduce categorical variables to a single numeric variable
Answer: C
Explanation: PCA finds orthogonal axes (principal components) that explain the most variance in the data.
**Question 19. In t-SNE, the primary objective is to:**
A) Preserve global distances
B) Preserve local neighborhoods in a low-dimensional embedding
C) Maximize variance explained
D) Produce linear combinations of features
Answer: B
Explanation: t-SNE emphasizes preserving local structure, making it useful for visualizing clusters.
**Question 20. Linear Discriminant Analysis (LDA) differs from PCA because LDA:**
A) Maximizes between-class variance while minimizing within-class variance
B) Does not require class labels
C) Is a non-linear dimensionality reduction technique
D) Uses kernel functions by default
Answer: A
Explanation: LDA is supervised; it seeks directions that best separate classes.
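A rough sketch of the idea in Question 18: the first principal component is the unit direction v that maximizes the variance vᵀCv of the centered data, that is, the top eigenvector of the covariance matrix C. The code finds it with power iteration, a simple technique chosen here for brevity (libraries typically use an eigen- or singular-value decomposition instead). The data, the all-ones starting vector and the iteration count are assumptions.

```python
import math


def covariance(X):
    """Sample covariance matrix of X (rows are samples)."""
    n, d = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(d)]
    Xc = [[r[j] - mu[j] for j in range(d)] for r in X]
    return [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) / (n - 1)
             for j in range(d)]
            for i in range(d)]


def first_principal_component(X, iters=200):
    """Top eigenvector of the covariance matrix via power iteration.

    Assumes the all-ones starting vector is not orthogonal to that eigenvector.
    """
    C = covariance(X)
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]  # w = C v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data projected onto v: v^T C v
    explained = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
    return v, explained


X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
v, var = first_principal_component(X)
print([round(x, 6) for x in v], round(var, 6))
```

For this toy data the spread is largest along the first axis, so v comes out close to (1, 0) with projected variance 8/3.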

**Question 21. Which feature-selection method evaluates subsets of features by training a model and measuring performance?**
A) Filter method
B) Wrapper method
C) Embedded method
D) Mutual information
Answer: B
Explanation: Wrapper methods use a predictive model to assess the quality of feature subsets.
**Question 22. The Lasso (L1) regularization primarily promotes:**
A) Small but non-zero coefficients for all features
B) Shrinkage of coefficients toward zero without eliminating any
C) Sparse solutions by driving some coefficients exactly to zero
D) Increased model complexity
Answer: C
Explanation: The L1 penalty can set coefficients exactly to zero, performing feature selection.
**Question 23. Ridge (L2) regularization differs from Lasso in that it:**
A) Produces sparse models
B) Penalizes the absolute value of coefficients
C) Shrinks coefficients uniformly but rarely to zero
D) Is only applicable to logistic regression
Answer: C
Explanation: The L2 penalty reduces the magnitude of all coefficients but does not force exact zeros.
**Question 24. Elastic Net regularization combines which two penalties?**
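The contrast between Questions 22 and 23 is easiest to see in a special case, assumed here for illustration: an orthonormal design (XᵀX = I) with objectives ½‖y − Xβ‖² + λ‖β‖₁ for Lasso and ½‖y − Xβ‖² + (λ/2)‖β‖² for Ridge. Under that assumption both solutions are simple functions of the ordinary least-squares coefficients: Lasso soft-thresholds them, Ridge rescales them by 1/(1 + λ). The coefficient values are hypothetical.

```python
import math


def lasso_orthonormal(b_ols, lam):
    """Soft-thresholding sign(b) * max(|b| - lam, 0), applied per coefficient."""
    out = []
    for b in b_ols:
        shrunk = max(abs(b) - lam, 0.0)  # |b| <= lam collapses to exactly 0
        out.append(0.0 if shrunk == 0.0 else math.copysign(shrunk, b))
    return out


def ridge_orthonormal(b_ols, lam):
    """Proportional shrinkage b / (1 + lam): smaller, but zero only if b is zero."""
    return [b / (1.0 + lam) for b in b_ols]


b_ols = [3.0, -0.4, 0.8, -2.5]  # hypothetical least-squares coefficients
print(lasso_orthonormal(b_ols, lam=1.0))  # [2.0, 0.0, 0.0, -1.5]
print(ridge_orthonormal(b_ols, lam=1.0))  # [1.5, -0.2, 0.4, -1.25]
```

With the same λ, Lasso removes the two small coefficients entirely, while Ridge keeps all four and only halves them.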

A) It eliminates the need for regularization
B) It reduces computational complexity to O(1)
C) It allows inner products in high-dimensional space to be computed directly from original features
D) It guarantees a global optimum for any loss function
Answer: C
Explanation: Kernels compute dot products in the transformed space without explicitly transforming the data, saving computation.
**Question 28. In K-Nearest Neighbors, the choice of distance metric most directly affects:**
A) Model interpretability
B) Training time
C) Which neighbors are considered “nearest” and thus predictions
D) Number of parameters to tune
Answer: C
Explanation: The distance metric defines similarity; changing it alters neighbor selection and therefore predictions.
**Question 29. Naïve Bayes classifiers assume:**
A) Features are independent given the class label
B) Features follow a multivariate Gaussian distribution
C) Decision boundaries are linear
D) No prior probabilities are needed
Answer: A
Explanation: The conditional independence assumption simplifies the joint likelihood computation.
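A small numerical illustration of the kernel answer above, using the degree-2 polynomial kernel as an illustrative choice: for 2-D inputs, k(x, z) = (x · z)² equals the ordinary dot product of the explicit feature maps φ(x) = (x₁², √2·x₁x₂, x₂²), so the kernel obtains the inner product in the 3-D feature space without ever constructing φ. The input vectors are made up.

```python
import math


def phi(x):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, computed entirely in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = [1.0, 2.0], [3.0, 1.0]
print(poly_kernel(x, z), dot(phi(x), phi(z)))
```

Both values equal 25 up to floating-point rounding; the kernel needs one 2-D dot product, while the explicit route builds two 3-D vectors first.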

**Question 30. Which decision-tree algorithm uses the Gini impurity measure for splitting?**
A) ID3
B) C4.5
C) CART
D) CHAID
Answer: C
Explanation: CART (Classification and Regression Trees) employs Gini impurity (or MSE for regression) to select splits.
**Question 31. In the ID3 algorithm, the attribute with the highest:**
A) Information gain is selected for splitting
B) Gini impurity is selected for splitting
C) Variance reduction is selected for splitting
D) Correlation with target is selected for splitting
Answer: A
Explanation: ID3 uses information gain (based on entropy) to choose the most informative attribute.
**Question 32. Random Forest reduces variance primarily by:**
A) Using deeper trees than a single decision tree
B) Averaging predictions of many decorrelated trees built on bootstrapped samples
C) Pruning each tree aggressively
D) Applying L1 regularization to each tree
Answer: B
Explanation: Bagging (bootstrap aggregating), combined with a random subset of features at each split, creates diverse, decorrelated trees; averaging their predictions reduces variance.
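The split criteria named in Questions 30 and 31 are short formulas. This sketch computes Gini impurity 1 − Σₖ pₖ² (the CART criterion) and entropy-based information gain (the ID3 criterion) for lists of class labels; the labels and splits are made up for illustration.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1 / p) over classes."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]
print(gini(parent), entropy(parent))             # 0.5 1.0
print(information_gain(parent, perfect_split))   # 1.0
print(information_gain(parent, useless_split))   # 0.0
```

ID3 would pick the attribute that produces the perfect split, since it yields the highest gain.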

**Question 36. Which clustering algorithm requires the number of clusters K to be specified in advance?**
A) DBSCAN
B) Hierarchical agglomerative clustering
C) K-Means
D) Mean-Shift
Answer: C
Explanation: K-Means partitions data into K clusters, where K must be predetermined.
**Question 37. DBSCAN defines a cluster based on:**
A) Fixed number of nearest neighbors
B) Density connectivity using ε-neighborhood and minimum points
C) Hierarchical merging of clusters
D) Centroid proximity
Answer: B
Explanation: DBSCAN grows clusters from core points (points with at least MinPts neighbors within radius ε) and adds every point that is density-reachable from them.
**Question 38. In hierarchical agglomerative clustering, the “linkage” criterion determines:**
A) How many clusters to start with
B) The distance metric between individual points only
C) How to compute distance between clusters (e.g., single, complete, average)
D) The stopping condition based on silhouette score
Answer: C
Explanation: Linkage defines how inter-cluster distances are measured during the merging process.
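Question 36 notes that K-Means needs K up front. The minimal sketch of Lloyd's algorithm below makes that visible: `k` is a required argument. Initializing the centroids with the first k points is a simplifying assumption (practical implementations usually use random or k-means++ seeding), and the points are made up.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        new = []
        for c, members in enumerate(clusters):
            if members:
                new.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid
        if new == centroids:  # assignments are stable: converged
            break
        centroids = new
    return centroids, clusters


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

Here the two groups of three points end up in separate clusters, with centroids near (1/3, 1/3) and (31/3, 31/3).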

**Question 39. The silhouette coefficient for a data point measures:**
A) The probability that the point belongs to its assigned cluster
B) The ratio of intra-cluster distance to inter-cluster distance
C) The difference between its average distance to its own cluster and to the nearest other cluster, normalized
D) The number of nearest neighbors in the same cluster
Answer: C
Explanation: Silhouette = (b − a)/max(a, b), where a is the point’s mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster.
**Question 40. The Apriori algorithm prunes candidate itemsets based on which property?**
A) All subsets of a frequent itemset must also be frequent
B) All supersets of an infrequent itemset must be infrequent
C) Only maximal itemsets are retained
D) Itemsets must have equal support
Answer: A
Explanation: Apriori uses the “downward closure” property: if an itemset is frequent, all its subsets are also frequent. Option B is the contrapositive of the same property, and it is the form applied during pruning: any candidate with an infrequent subset is discarded without counting its support.
**Question 41. In market-basket analysis, a lift value greater than 1 indicates:**
A) No association between antecedent and consequent
B) Negative correlation between items
C) Positive association; items occur together more often than expected under independence
D) Perfect confidence
Answer: C
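The silhouette formula in Question 39 and the lift in Question 41 reduce to a few lines of arithmetic. In this sketch, lift(A → B) = support(A ∪ B) / (support(A) · support(B)), which equals 1 when A and B occur independently; the points and shopping baskets are invented for illustration.

```python
import math


def silhouette_point(p, own_others, other_clusters):
    """(b - a) / max(a, b) for one point p.

    own_others: the other members of p's cluster (p itself excluded).
    other_clusters: a list of the remaining clusters.
    """
    a = sum(math.dist(p, q) for q in own_others) / len(own_others)
    b = min(sum(math.dist(p, q) for q in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)


def lift(transactions, a, b):
    """support(A ∪ B) / (support(A) * support(B)) over a list of item sets."""
    n = len(transactions)

    def support(items):
        # Fraction of transactions containing every item in `items`
        return sum(items <= t for t in transactions) / n

    return support(a | b) / (support(a) * support(b))


print(silhouette_point((0, 0), [(0, 2)], [[(4, 0), (6, 0)]]))  # 0.6
baskets = [{"bread", "butter"}, {"bread", "butter"}, {"bread"}, {"milk"}]
print(lift(baskets, {"bread"}, {"butter"}))  # about 1.33, i.e. > 1
```

In this toy data butter appears only in baskets that also contain bread, so the lift exceeds 1 (positive association).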

Answer: A
Explanation: The single-layer perceptron converges only when the data are linearly separable.
**Question 45. In a multilayer perceptron (MLP), the activation function that mitigates the vanishing gradient problem most effectively is:**
A) Sigmoid
B) Tanh
C) ReLU
D) Linear
Answer: C
Explanation: ReLU’s gradient is constant (1) for positive inputs, reducing vanishing gradients during backpropagation.
**Question 46. The softmax activation function is primarily used in the output layer of:**
A) Binary classification networks
B) Regression networks
C) Multi-class classification networks
D) Autoencoders
Answer: C
Explanation: Softmax converts raw scores into a probability distribution over multiple classes.
**Question 47. Xavier (Glorot) initialization is designed to keep the variance of activations:**
A) Increasing across layers
B) Decreasing across layers
C) Approximately constant across layers
D) Equal to zero
Answer: C
Explanation: Xavier initialization sets the weight variance from the number of input and output units (Var(W) = 2/(fan_in + fan_out)) to keep activations stable across layers.
**Question 48. In convolutional neural networks, the operation that reduces spatial dimensions while retaining the most salient features is called:**
A) Convolution
B) Pooling (e.g., max-pooling)
C) Fully connected layer
D) Dropout
Answer: B
Explanation: Pooling aggregates nearby activations, typically reducing resolution and providing some invariance to small translations.
**Question 49. Which of the following statements about padding in CNNs is true?**
A) Padding always increases the number of parameters
B) “Same” padding preserves the input spatial dimensions after convolution
C) Padding is only used for 1-D convolutions
D) Padding reduces the receptive field size
Answer: B
Explanation: “Same” padding adds zeros around the input so that, at stride 1, the output height and width equal the input dimensions.
**Question 50. Recurrent Neural Networks (RNNs) suffer from which major training difficulty?**
A) Lack of ability to handle variable-length sequences
B) Vanishing and exploding gradients over long time steps
C) Inability to share parameters across time steps
D) Requirement for large batch sizes only
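The padding and pooling answers in Questions 48 and 49 follow from the standard output-size formula for a convolution, ⌊(n + 2p − k)/s⌋ + 1 for input size n, kernel size k, padding p and stride s. The sketch evaluates it for a few settings and applies 2×2 max-pooling with stride 2 to a made-up 4×4 feature map.

```python
def conv_output_size(n, k, p, s=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 on a 2-D list (even height and width assumed)."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


# "Same" padding for an odd kernel size k at stride 1 uses p = (k - 1) // 2
print(conv_output_size(32, k=3, p=1))       # 32: spatial size preserved
print(conv_output_size(32, k=3, p=0))       # 30: no padding shrinks the map
print(conv_output_size(32, k=3, p=1, s=2))  # 16: stride 2 halves it even with padding

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 3, 2],
               [2, 6, 1, 1]]
print(max_pool_2x2(feature_map))            # [[4, 5], [6, 3]]
```

The stride-2 case shows why “same” padding preserves dimensions only at stride 1.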

D) Training from scratch is computationally cheap
Answer: C
Explanation: Pre-trained models provide useful feature representations that can be fine-tuned on limited data.
**Question 54. Fine-tuning a pre-trained model differs from feature extraction because:**
A) Fine-tuning updates all model weights, while feature extraction freezes the base and only trains a new head
B) Feature extraction requires more data than fine-tuning
C) Fine-tuning uses a different loss function than feature extraction
D) Feature extraction modifies the architecture, fine-tuning does not
Answer: A
Explanation: In fine-tuning, the entire network (or a large portion of it) is retrained; in feature extraction, the pre-trained backbone is frozen.
**Question 55. Which optimizer combines adaptive learning rates with momentum, often yielding faster convergence than plain SGD?**
A) RMSprop
B) Adam
C) Adagrad
D) Nesterov Accelerated Gradient (NAG)
Answer: B
Explanation: Adam keeps a running average of past gradients (momentum) and scales each parameter’s step by a running average of its squared gradients, giving per-parameter adaptive learning rates.
**Question 56. Early stopping helps prevent overfitting by:**
A) Reducing model depth automatically
B) Monitoring validation loss and halting training when it stops improving
C) Adding L1 regularization dynamically
D) Increasing batch size over epochs
Answer: B
Explanation: Training is stopped once validation performance ceases to improve, which avoids further fitting to noise in the training data.
**Question 57. Which loss function is appropriate for multi-label classification where each instance can belong to multiple classes?**
A) Categorical cross-entropy
B) Binary cross-entropy applied independently to each label
C) Hinge loss
D) Mean Squared Error
Answer: B
Explanation: Binary cross-entropy evaluates each label as a separate binary problem.
**Question 58. In a confusion matrix, the term “precision” is defined as:**
A) TP / (TP + FN)
B) TP / (TP + FP)
C) TN / (TN + FP)
D) (TP + TN) / Total
Answer: B
Explanation: Precision measures the proportion of positive predictions that are correct.
**Question 59. Which metric is most suitable for imbalanced binary classification when the cost of false negatives is much higher than that of false positives?**
A) Accuracy
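The definitions in Questions 57 and 58 translate directly into code. This standard-library sketch computes precision TP / (TP + FP), recall TP / (TP + FN) for comparison (option A in Question 58), and a multi-label loss that applies binary cross-entropy to each label independently and averages the results. The counts, labels and probabilities are invented, and the clipping constant `eps` is an assumption to avoid log(0).

```python
import math


def precision(tp, fp):
    """Share of positive predictions that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that are found: TP / (TP + FN)."""
    return tp / (tp + fn)


def multilabel_bce(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over labels; each label is its own binary problem."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(precision(tp=40, fp=10))  # 0.8
print(recall(tp=40, fn=20))
# One instance tagged with labels 1 and 3 but not label 2 (hypothetical)
print(multilabel_bce([1, 0, 1], [0.9, 0.2, 0.7]))
```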