Workbooks Machine learning Practice Exam, Exams of Technology

An exam focused on ML fundamentals including supervised/unsupervised learning, data preprocessing, model evaluation, feature engineering, classification, regression, clustering, neural networks, and overfitting/underfitting concepts. Includes problem-solving tasks using datasets, algorithm selection questions, and application-based scenarios across industries.

Typology: Exams

2025/2026

Available from 01/07/2026

shilpi-jain-1

Workbooks Machine learning Practice Exam
**Question 1.** Which of the following best describes the relationship between Machine Learning (ML)
and Artificial Intelligence (AI)?
A) ML is a subset of AI that focuses on algorithms that improve from data.
B) AI is a subset of ML that deals only with neural networks.
C) ML and AI are completely unrelated fields.
D) AI is a subset of ML that only uses supervised learning.
Answer: A
Explanation: Machine Learning is a branch of Artificial Intelligence that develops algorithms capable of
learning patterns from data, making ML a subset of AI.
**Question 2.** In supervised learning, what term refers to the output variable that the model tries to
predict?
A) Feature
B) Sample
C) Label
D) Instance
Answer: C
Explanation: The label (or target) is the known output used during training to teach the model how to
make predictions.
**Question 3.** Which of the following is NOT a typical stage in the machine-learning workflow?
A) Problem definition
B) Data acquisition
C) Feature elimination
D) Model deployment
Answer: C
Explanation: “Feature elimination” is not a standard workflow stage. Feature selection and feature engineering are, but elimination alone is not a distinct phase.
**Question 4.** A model that consistently underfits the training data is likely suffering from:
A) High variance
B) High bias
C) Data leakage
D) Over‑parameterization
Answer: B
Explanation: High bias leads to underfitting, where the model is too simple to capture underlying patterns.
**Question 5.** Which matrix operation computes the dot product of two vectors a and b?
A) a + b
B) a · b = aᵀb
C) a × b (cross product)
D) a / b
Answer: B
Explanation: The dot product is calculated as the transpose of a multiplied by b, yielding a scalar.
**Question 6.** The probability density function of a standard normal distribution has a mean of:
A) 0
B) 1
C) −1
D) 0.

C) Mode imputation
D) Constant imputation
Answer: C
Explanation: Mode imputation uses the most common (modal) value, appropriate for categorical or discrete numeric features.
**Question 10.** The Inter‑Quartile Range (IQR) method identifies outliers by comparing a data point to:
A) Mean ± 3 × standard deviation
B) 1.5 × IQR above the third quartile or below the first quartile
C) Median absolute deviation
D) Z‑score greater than 2
Answer: B
Explanation: The IQR rule flags points lying beyond Q1 − 1.5·IQR or Q3 + 1.5·IQR as outliers.
**Question 11.** Min‑Max scaling transforms a feature to which range?
A) [−1, 1]
B) [0, 1]
C) [−∞, ∞]
D) [−0.5, 0.5]
Answer: B
Explanation: Min‑Max scaling linearly maps values to the interval [0, 1] using (x − min)/(max − min).
**Question 12.** One‑hot encoding a categorical variable with three distinct categories results in:
A) A single integer column.
B) Three binary columns, each representing one category.
C) Two columns with values 0, 1, 2.
D) A continuous probability distribution.
Answer: B
Explanation: One‑hot encoding creates a binary column for each possible category, ensuring mutual exclusivity.
**Question 13.** In text processing, TF‑IDF stands for:
A) Term Frequency‑Inverse Document Frequency
B) Total Frequency‑Inverse Data Factor
C) Token Frequency‑Independent Data Feature
D) Term Frequency‑Indexed Data Form
Answer: A
Explanation: TF‑IDF weighs terms by how often they appear in a document (TF) and how rare they are across the corpus (IDF).
**Question 14.** Principal Component Analysis (PCA) primarily aims to:
A) Increase the number of features.
B) Reduce dimensionality while preserving variance.
C) Perform supervised classification.
D) Encode categorical variables.
Answer: B
Explanation: PCA transforms data onto orthogonal components that capture maximal variance, reducing dimensionality.
**Question 15.** Which feature‑selection method evaluates each feature independently using a statistical test?
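
As a study aid for Questions 10 and 11, the IQR rule and Min-Max scaling can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function names `iqr_outliers` and `min_max_scale` are hypothetical, and the quartiles use the "inclusive" interpolation of `statistics.quantiles`, which is only one of several common quartile conventions.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying beyond Q1 - k*IQR or Q3 + k*IQR."""
    # Assumption: "inclusive" quartiles; other conventions give slightly different cut-offs.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def min_max_scale(values):
    """Linearly map values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map everything to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    print(iqr_outliers([1, 2, 3, 4, 5, 100]))
    print(min_max_scale([2, 4, 6]))
```

Note that Min-Max scaling is sensitive to outliers: a single extreme value such as 100 above would compress all other scaled values toward 0.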

**Question 18.** In linear regression, the cost function commonly minimized is:
A) Mean Absolute Error (MAE)
B) Mean Squared Error (MSE)
C) Hinge loss
D) Cross‑entropy loss
Answer: B
Explanation: Ordinary Least Squares minimizes the MSE, which penalizes larger errors more heavily.
**Question 19.** The Normal Equation provides a closed‑form solution for linear regression parameters when:
A) The number of features exceeds the number of samples.
B) The design matrix is non‑invertible.
C) The loss function is MSE and no regularization is applied.
D) Gradient descent is used.
Answer: C
Explanation: The Normal Equation θ = (XᵀX)⁻¹Xᵀy solves for θ analytically under MSE loss without regularization, provided XᵀX is invertible.
**Question 20.** L2 regularization (Ridge) adds which term to the loss function?
A) λ ∑|θᵢ|
B) λ ∑θᵢ²
C) λ ∑θᵢ³
D) λ ∑log|θᵢ|
Answer: B
Explanation: Ridge regression penalizes the squared magnitude of coefficients, encouraging smaller but non‑zero weights.
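
To make Questions 19 and 20 concrete, the sketch below solves the Normal Equation by hand for the smallest possible case: one feature plus an intercept, so XᵀX is a 2×2 matrix. The function name `fit_line` is hypothetical. Two further assumptions: the loss is the un-averaged sum of squared errors, and the ridge penalty λ is applied to the slope only (leaving the intercept unpenalized is a common convention, not something the exam states).

```python
def fit_line(xs, ys, lam=0.0):
    """Solve (X^T X + lam * D) theta = X^T y for X = [1, x] and D = diag(0, 1).

    lam = 0 gives the ordinary Normal Equation; lam > 0 gives a ridge fit
    that shrinks the slope (the intercept is left unpenalized by assumption).
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # X^T X = [[n, sx], [sx, sxx]]; the penalty is added to the slope's diagonal entry.
    a, b, c, d = n, sx, sx, sxx + lam
    det = a * d - b * c
    if det == 0:
        raise ValueError("X^T X is singular; the closed form does not apply")

    # Invert the 2x2 system explicitly (Cramer's rule).
    intercept = (d * sy - b * sxy) / det
    slope = (a * sxy - c * sy) / det
    return intercept, slope


if __name__ == "__main__":
    xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]   # points on y = 1 + 2x
    print(fit_line(xs, ys))               # ordinary least squares
    print(fit_line(xs, ys, lam=10.0))     # ridge: smaller but non-zero slope
```

For real datasets with many features, a linear-algebra library would replace the hand-written 2×2 inverse; the structure of the equation stays the same.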

**Question 21.** Which regression model can capture non‑linear relationships by augmenting features with polynomial terms?
A) Linear regression
B) Ridge regression
C) Polynomial regression
D) Lasso regression
Answer: C
Explanation: Polynomial regression explicitly includes higher‑order terms, allowing the model to fit curves.
**Question 22.** In logistic regression, the sigmoid function maps any real‑valued input to:
A) (−∞, ∞)
B) (−1, 1)
C) (0, 1)
D) (−0.5, 0.5)
Answer: C
Explanation: The sigmoid σ(z) = 1/(1 + e⁻ᶻ) outputs values strictly between 0 and 1, interpretable as probabilities.
**Question 23.** The loss function used for binary classification in logistic regression is:
A) Mean Squared Error
B) Hinge loss
C) Cross‑entropy (log loss)
D) Absolute deviation
Answer: C
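
The sigmoid from Question 22 and the log loss from Question 23 fit in a short standard-library sketch. The names `sigmoid` and `log_loss` and the clipping constant `eps` are illustrative choices, not part of the exam; the clipping is only there to avoid `log(0)`.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


if __name__ == "__main__":
    print(sigmoid(0.0))
    print(log_loss([1, 0], [0.9, 0.1]))  # confident and correct: small loss
    print(log_loss([1, 0], [0.1, 0.9]))  # confident and wrong: large loss
```

The second and third calls show why cross-entropy suits classification: confidently wrong probabilities are penalized far more than confidently right ones are rewarded.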

Answer: B
Explanation: The RBF kernel computes similarity in an infinite‑dimensional space, allowing SVM to find non‑linear separating hyperplanes.
**Question 27.** In decision tree learning, Gini impurity is defined as:
A) 1 − ∑pᵢ² where pᵢ is the proportion of class i.
B) −∑pᵢ log₂ pᵢ.
C) ∑|pᵢ − pⱼ|.
D) The variance of the target variable.
Answer: A
Explanation: Gini impurity measures the probability of misclassifying a randomly chosen element if it were labeled according to the class distribution.
**Question 28.** Pruning a decision tree helps to:
A) Increase model complexity.
B) Reduce overfitting by removing noisy branches.
C) Convert the tree into a linear model.
D) Increase the depth of the tree.
Answer: B
Explanation: Pruning eliminates branches that provide little predictive power, improving generalization.
**Question 29.** Naïve Bayes assumes which of the following about features?
A) They are linearly correlated.
B) They are conditionally independent given the class label.
C) They follow a uniform distribution.
D) They have equal variance.
Answer: B
Explanation: The core assumption of Naïve Bayes is conditional independence among features for each class.
**Question 30.** Which variant of Naïve Bayes is appropriate for modeling word counts in text classification?
A) Gaussian Naïve Bayes
B) Bernoulli Naïve Bayes
C) Multinomial Naïve Bayes
D) Categorical Naïve Bayes
Answer: C
Explanation: Multinomial Naïve Bayes models the distribution of discrete counts, ideal for bag‑of‑words representations.
**Question 31.** Random Forest reduces variance primarily by:
A) Using a single deep decision tree.
B) Averaging predictions from many decorrelated trees built on bootstrapped samples.
C) Applying L1 regularization to each tree.
D) Pruning each tree aggressively.
Answer: B
Explanation: Bagging (bootstrap aggregation) and random feature selection decorrelate trees; averaging their predictions reduces variance.
**Question 32.** Gradient Boosting Machines (GBM) differ from AdaBoost in that GBM:
A) Uses only decision stumps.
B) Optimizes a differentiable loss function via gradient descent.

B) The within‑cluster sum of squares (WCSS) to find a point where adding more clusters yields diminishing returns.
C) The distance between cluster centroids.
D) The number of iterations required for convergence.
Answer: B
Explanation: Plotting WCSS versus K shows a sharp bend (the “elbow”) beyond which additional clusters no longer significantly reduce within‑cluster variance.
**Question 36.** Which linkage criterion in hierarchical agglomerative clustering merges clusters based on the maximum distance between any two points (one from each cluster)?
A) Single linkage
B) Complete linkage
C) Average linkage
D) Ward’s method
Answer: B
Explanation: Complete linkage uses the farthest pairwise distance, leading to compact clusters.
**Question 37.** DBSCAN can identify clusters of arbitrary shape because it:
A) Requires a predefined number of clusters K.
B) Uses density reachability and connectivity rather than centroids.
C) Relies on hierarchical tree structures.
D) Optimizes a silhouette score.
Answer: B
Explanation: DBSCAN groups points that are densely packed and marks low‑density points as noise, allowing irregular cluster shapes.
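
The linkage criteria from Question 36 differ only in how they summarize the pairwise distances between two clusters. The following sketch, with a hypothetical helper `linkage_distance` and Euclidean distance assumed, computes the single, complete, and average linkage distances for two small clusters.

```python
import math
from statistics import mean


def linkage_distance(cluster_a, cluster_b, kind="complete"):
    """Distance between two clusters under a given linkage criterion.

    Assumes Euclidean distance between points given as coordinate tuples.
    """
    pairwise = [math.dist(p, q) for p in cluster_a for q in cluster_b]
    if kind == "single":      # closest pair of points
        return min(pairwise)
    if kind == "complete":    # farthest pair of points
        return max(pairwise)
    if kind == "average":     # mean over all cross-cluster pairs
        return mean(pairwise)
    raise ValueError(f"unknown linkage: {kind}")


if __name__ == "__main__":
    a = [(0, 0), (1, 0)]
    b = [(3, 0), (5, 0)]
    for kind in ("single", "complete", "average"):
        print(kind, linkage_distance(a, b, kind))
```

An agglomerative algorithm would repeatedly merge the pair of clusters with the smallest such distance; only the distance definition changes between linkage types.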

**Question 38.** The silhouette coefficient for a data point ranges between:
A) 0 and 1
B) −1 and 1
C) −∞ and ∞
D) 0 and ∞
Answer: B
Explanation: Silhouette values near 1 indicate well‑matched points, values near 0 indicate ambiguous points, and negative values suggest assignment to the wrong cluster.
**Question 39.** In the Apriori algorithm, the “support” of an itemset is defined as:
A) The probability that the rule is correct.
B) The proportion of transactions containing the itemset.
C) The confidence of the rule.
D) The lift of the rule.
Answer: B
Explanation: Support measures how frequently an itemset appears in the dataset, used to prune infrequent candidates.
**Question 40.** Lift greater than 1 in association rule mining indicates:
A) The rule is less interesting than random chance.
B) The antecedent and consequent occur together more often than expected if independent.
C) The rule has low confidence.
D) The support is zero.
Answer: B
Explanation: Lift = confidence(A → B) / support(B), where support(B) is the confidence expected if A and B were independent; a value above 1 means the items co‑occur more often than by chance.
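
Support and lift from Questions 39 and 40 can be checked by hand on a toy basket dataset. The sketch below is illustrative only: the transactions are made up, and the helper names `support` and `lift` are hypothetical.

```python
def support(transactions, itemset):
    """Proportion of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def lift(transactions, antecedent, consequent):
    """support(A and B) / (support(A) * support(B)); equals confidence(A -> B) / support(B)."""
    s_a = support(transactions, antecedent)
    s_b = support(transactions, consequent)
    if s_a == 0 or s_b == 0:
        raise ValueError("lift is undefined when either side has zero support")
    s_ab = support(transactions, set(antecedent) | set(consequent))
    return s_ab / (s_a * s_b)


if __name__ == "__main__":
    # Hypothetical market-basket data, one set of items per transaction.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
        {"bread", "butter"},
    ]
    print(support(baskets, {"bread", "butter"}))  # 3 of 5 baskets
    print(lift(baskets, {"bread"}, {"butter"}))   # above 1: bought together more than chance
    print(lift(baskets, {"bread"}, {"milk"}))     # below 1: bought together less than chance
```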

Explanation: The original perceptron uses a binary step function to output 0 or 1 based on the weighted sum.
**Question 44.** The universal approximation theorem states that a feed‑forward neural network with at least one hidden layer can:
A) Exactly represent any function with finite data.
B) Approximate any continuous function on a compact subset of ℝⁿ, given sufficient neurons.
C) Solve any NP‑hard problem.
D) Converge in a single epoch.
Answer: B
Explanation: With enough hidden units and appropriate activation, a neural network can approximate any continuous mapping.
**Question 45.** Which activation function suffers from the “dying ReLU” problem?
A) Sigmoid
B) Tanh
C) ReLU
D) Softmax
Answer: C
Explanation: ReLU outputs zero for negative inputs, and its gradient there is also zero; a neuron whose input stays negative stops receiving weight updates and may never activate again (it “dies”).
**Question 46.** In backpropagation, the chain rule is used to compute:
A) The forward pass activations.
B) The gradient of the loss with respect to each weight.
C) The learning rate schedule.
D) The model’s bias term only.
Answer: B
Explanation: The chain rule propagates error derivatives from the output layer back through each layer to update weights.
**Question 47.** Mini‑batch gradient descent differs from stochastic gradient descent (SGD) by:
A) Using the entire dataset for each update.
B) Updating parameters after computing gradients on a small subset (batch) of samples, reducing variance compared to pure SGD.
C) Not requiring a learning rate.
D) Guaranteeing convergence in fewer epochs.
Answer: B
Explanation: Mini‑batch GD balances the noisy updates of SGD with the stability of batch GD by processing batches of size >1.
**Question 48.** The Adam optimizer combines which two techniques?
A) Momentum and Nesterov acceleration.
B) AdaGrad and RMSprop (adaptive learning rates with momentum).
C) Stochastic gradient descent and Newton’s method.
D) Learning rate decay and dropout.
Answer: B
Explanation: Adam computes adaptive learning rates like RMSprop and incorporates momentum (first‑moment estimate), leading to fast convergence.
**Question 49.** In a convolutional layer, the term “filter” (or kernel) refers to:
A) The activation function applied after convolution.

B) Adding more hidden units.
C) Employing gating mechanisms such as LSTM or GRU cells.
D) Using sigmoid activation only.
Answer: C
Explanation: LSTM and GRU introduce gates that preserve gradients over long time spans, alleviating vanishing gradients.
**Question 53.** An autoencoder’s reconstruction loss is typically measured with:
A) Hinge loss
B) Cross‑entropy (for binary data) or MSE (for continuous data)
C) Kullback‑Leibler divergence
D) Cosine similarity
Answer: B
Explanation: The loss quantifies the difference between the input and its reconstruction; MSE or binary cross‑entropy are common choices.
**Question 54.** In a Generative Adversarial Network (GAN), the discriminator’s objective is to:
A) Generate realistic samples.
B) Minimize the distance between real and fake data distributions.
C) Accurately classify inputs as real or generated.
D) Encode data into a latent space.
Answer: C
Explanation: The discriminator learns to distinguish between real data and the generator’s synthetic samples.
**Question 55.** The attention mechanism in Transformer models primarily allows the network to:
A) Convolve over fixed‑size windows.
B) Focus on relevant parts of the input sequence when generating each output token.
C) Reduce the number of parameters via weight sharing.
D) Perform pooling over time steps.
Answer: B
Explanation: Self‑attention computes weighted sums of all positions, enabling each token to attend to others dynamically.
**Question 56.** In regression evaluation, a higher R² indicates:
A) Worse fit to the data.
B) More variance explained by the model relative to a baseline.
C) Higher bias.
D) Larger prediction errors.
Answer: B
Explanation: R² = 1 − (SS_res / SS_tot) measures proportion of variance captured; closer to 1 means better explanatory power.
**Question 57.** Which metric is most appropriate for imbalanced binary classification when false negatives are particularly costly?
A) Accuracy
B) Precision
C) Recall
D) F1‑Score
Answer: C
Explanation: Recall (sensitivity) emphasizes correctly identifying positive cases, reducing false negatives.
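
The two metrics in Questions 56 and 57 are simple enough to compute directly. The sketch below uses hypothetical helper names (`r_squared`, `recall`, `accuracy`) and made-up labels; it also illustrates why accuracy can mislead on imbalanced data, assuming the positive class is encoded as 1.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with the mean of y_true as the baseline."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def recall(y_true, y_pred):
    """TP / (TP + FN): the share of actual positives that the model found."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Share of predictions that match the labels."""
    return sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)


if __name__ == "__main__":
    print(r_squared([1, 2, 3], [1, 2, 3]))  # perfect predictions
    print(r_squared([1, 2, 3], [2, 2, 2]))  # always predicting the mean

    # Imbalanced toy labels: 1 positive in 10, and a model that always predicts 0.
    y_true = [1] + [0] * 9
    y_pred = [0] * 10
    print(accuracy(y_true, y_pred), recall(y_true, y_pred))
```

The always-negative model scores 90% accuracy while missing every positive case, which is exactly the failure that recall exposes.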