Workbooks Machine learning Practice Exam, Exams of Technology

Technology

An exam focused on ML fundamentals including supervised/unsupervised learning, data preprocessing, model evaluation, feature engineering, classification, regression, clustering, neural networks, and overfitting/underfitting concepts. Includes problem-solving tasks using datasets, algorithm selection questions, and application-based scenarios across industries.

Typology: Exams

2025/2026

Available from 01/07/2026

shilpi-jain-1 🇮🇳


Workbooks Machine learning Practice Exam

**Question 1.** Which of the following best describes the relationship between Machine Learning (ML) and Artificial Intelligence (AI)?

A) ML is a subset of AI that focuses on algorithms that improve from data.

B) AI is a subset of ML that deals only with neural networks.

C) ML and AI are completely unrelated fields.

D) AI is a subset of ML that only uses supervised learning.

Answer: A

Explanation: Machine Learning is a branch of Artificial Intelligence that develops algorithms capable of learning patterns from data, making ML a subset of AI.

**Question 2.** In supervised learning, what term refers to the output variable that the model tries to predict?

A) Feature

B) Sample

C) Label

D) Instance

Answer: C

Explanation: The label (or target) is the known output used during training to teach the model how to make predictions.

**Question 3.** Which of the following is NOT a typical stage in the machine‑learning workflow?

A) Problem definition

B) Data acquisition

C) Feature elimination

D) Model deployment

Answer: C

Explanation: “Feature elimination” is not a standard workflow stage; feature selection and feature engineering are, but elimination alone is not a distinct phase.

**Question 4.** A model that consistently underfits the training data is likely suffering from:

A) High variance

B) High bias

C) Data leakage

D) Over‑parameterization

Answer: B

Explanation: High bias leads to underfitting, where the model is too simple to capture underlying patterns.

**Question 5.** Which matrix operation computes the dot product of two vectors a and b?

A) a + b

B) a · b = aᵀb

C) a × b (cross product)

D) a / b

Answer: B

Explanation: The dot product is calculated as the transpose of a multiplied by b, yielding a scalar.

**Question 6.** The probability density function of a standard normal distribution has a mean of:

A) 0

B) 1

C) −1

D) 0.

C) Mode imputation

D) Constant imputation

Answer: C

Explanation: Mode imputation uses the most common (modal) value, appropriate for categorical or discrete numeric features.

**Question 10.** The Inter‑Quartile Range (IQR) method identifies outliers by comparing a data point to:

A) Mean ± 3 × standard deviation

B) 1.5 × IQR above the third quartile or below the first quartile

C) Median absolute deviation

D) Z‑score greater than 2

Answer: B

Explanation: The IQR rule flags points lying below Q1 − 1.5·IQR or above Q3 + 1.5·IQR as outliers.

**Question 11.** Min‑Max scaling transforms a feature to which range?

A) [‑1, 1]

B) [0, 1]

C) [‑∞, ∞]

D) [−0.5, 0.5]

Answer: B

Explanation: Min‑Max scaling linearly maps values to the interval [0, 1] using (x − min)/(max − min).

**Question 12.** One‑hot encoding a categorical variable with three distinct categories results in:

A) A single integer column.

B) Three binary columns, each representing one category.

C) Two columns with values 0, 1, 2.

D) A continuous probability distribution.

Answer: B

Explanation: One‑hot encoding creates a binary column for each possible category, ensuring mutual exclusivity.

**Question 13.** In text processing, TF‑IDF stands for:

A) Term Frequency‑Inverse Document Frequency

B) Total Frequency‑Inverse Data Factor

C) Token Frequency‑Independent Data Feature

D) Term Frequency‑Indexed Data Form

Answer: A

Explanation: TF‑IDF weighs terms by how often they appear in a document (TF) and how rare they are across the corpus (IDF).

**Question 14.** Principal Component Analysis (PCA) primarily aims to:

A) Increase the number of features.

B) Reduce dimensionality while preserving variance.

C) Perform supervised classification.

D) Encode categorical variables.

Answer: B

Explanation: PCA transforms data onto orthogonal components that capture maximal variance, reducing dimensionality.

**Question 15.** Which feature‑selection method evaluates each feature independently using a statistical test?

**Question 18.** In linear regression, the cost function commonly minimized is:

A) Mean Absolute Error (MAE)

B) Mean Squared Error (MSE)

C) Hinge loss

D) Cross‑entropy loss

Answer: B

Explanation: Ordinary Least Squares minimizes the MSE, which penalizes larger errors more heavily.

**Question 19.** The Normal Equation provides a closed‑form solution for linear regression parameters when:

A) The number of features exceeds the number of samples.

B) The design matrix is non‑invertible.

C) The loss function is MSE and no regularization is applied.

D) Gradient descent is used.

Answer: C

Explanation: The Normal Equation θ = (XᵀX)⁻¹Xᵀy solves for θ analytically under MSE loss without regularization.

**Question 20.** L2 regularization (Ridge) adds which term to the loss function?

A) λ ∑|θᵢ|

B) λ ∑θᵢ²

C) λ ∑θᵢ³

D) λ ∑log|θᵢ|

Answer: B

Explanation: Ridge regression penalizes the squared magnitude of coefficients, encouraging smaller but non‑zero weights.
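As a rough illustration of Questions 18 to 20, the sketch below fits a one-feature line by solving the 2×2 normal-equation system directly, with an optional ridge term. The function names `fit_line` and `mse`, the toy data, and the choice to penalize only the slope (not the intercept) are assumptions made for this example, not part of the exam.

```python
def fit_line(xs, ys, lam=0.0):
    """Solve (X^T X + lam * D) theta = X^T y for theta = (intercept, slope).

    X has a column of ones and a column of xs. D = diag(0, 1), so only the
    slope is penalized (an assumption of this sketch). lam = 0 gives the
    plain normal equation.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Entries of the 2x2 matrix [[a, b], [c, d]] = X^T X + lam * D
    a, b, c, d = n, sx, sx, sxx + lam
    det = a * d - b * c
    if det == 0:
        raise ValueError("X^T X is singular; the normal equation has no unique solution")

    # Apply the explicit 2x2 inverse to the right-hand side X^T y = (sy, sxy)
    intercept = (d * sy - b * sxy) / det
    slope = (a * sxy - c * sy) / det
    return intercept, slope


def mse(xs, ys, intercept, slope):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]           # exactly y = 1 + 2x

plain = fit_line(xs, ys)            # unregularized least squares
ridge = fit_line(xs, ys, lam=10.0)  # ridge shrinks the slope toward zero

print(plain, mse(xs, ys, *plain))
print(ridge, mse(xs, ys, *ridge))
```

With the penalty switched on, the slope shrinks but stays non-zero, at the cost of a higher training MSE.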

**Question 21.** Which regression model can capture non‑linear relationships by augmenting features with polynomial terms?

A) Linear regression

B) Ridge regression

C) Polynomial regression

D) Lasso regression

Answer: C

Explanation: Polynomial regression explicitly includes higher‑order terms, allowing the model to fit curves.

**Question 22.** In logistic regression, the sigmoid function maps any real‑valued input to:

A) (‑∞, ∞)

B) (‑1, 1)

C) (0, 1)

D) (‑0.5, 0.5)

Answer: C

Explanation: The sigmoid σ(z)=1/(1+e⁻ᶻ) outputs values strictly between 0 and 1, interpretable as probabilities.

**Question 23.** The loss function used for binary classification in logistic regression is:

A) Mean Squared Error

B) Hinge loss

C) Cross‑entropy (log loss)

D) Absolute deviation

Answer: C
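As an illustration of Questions 22 and 23, here is a minimal sketch of the sigmoid and of binary cross-entropy. The function names, the clipping constant `eps`, and the sample probabilities are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Numerically stable sigmoid: 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return ez / (1.0 + ez)


def log_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


print(sigmoid(-5.0), sigmoid(0.0), sigmoid(5.0))  # all lie between 0 and 1

# A confident correct prediction costs little; a confident wrong one costs a lot.
print(log_loss([1], [0.9]))
print(log_loss([1], [0.1]))
```

The second loss is much larger than the first, which shows how log loss penalizes confident mistakes.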

Answer: B

Explanation: The RBF kernel computes similarity in an infinite‑dimensional space, allowing SVM to find non‑linear separating hyperplanes.

**Question 27.** In decision tree learning, Gini impurity is defined as:

A) 1 − ∑pᵢ² where pᵢ is the proportion of class i.

B) −∑pᵢ log₂ pᵢ.

C) ∑|pᵢ − pⱼ|.

D) The variance of the target variable.

Answer: A

Explanation: Gini impurity measures the probability of misclassifying a randomly chosen element if it were labeled according to the class distribution.

**Question 28.** Pruning a decision tree helps to:

A) Increase model complexity.

B) Reduce overfitting by removing noisy branches.

C) Convert the tree into a linear model.

D) Increase the depth of the tree.

Answer: B

Explanation: Pruning eliminates branches that provide little predictive power, improving generalization.

**Question 29.** Naïve Bayes assumes which of the following about features?

A) They are linearly correlated.

B) They are conditionally independent given the class label.

C) They follow a uniform distribution.

D) They have equal variance.

Answer: B

Explanation: The core assumption of Naïve Bayes is conditional independence among features for each class.

**Question 30.** Which variant of Naïve Bayes is appropriate for modeling word counts in text classification?

A) Gaussian Naïve Bayes

B) Bernoulli Naïve Bayes

C) Multinomial Naïve Bayes

D) Categorical Naïve Bayes

Answer: C

Explanation: Multinomial Naïve Bayes models the distribution of discrete counts, ideal for bag‑of‑words representations.

**Question 31.** Random Forest reduces variance primarily by:

A) Using a single deep decision tree.

B) Averaging predictions from many decorrelated trees built on bootstrapped samples.

C) Applying L1 regularization to each tree.

D) Pruning each tree aggressively.

Answer: B

Explanation: Bagging (bootstrap aggregation) and random feature selection decorrelate trees; averaging their predictions reduces variance.

**Question 32.** Gradient Boosting Machines (GBM) differ from AdaBoost in that GBM:

A) Uses only decision stumps.

B) Optimizes a differentiable loss function via gradient descent.

B) The within‑cluster sum of squares (WCSS) to find a point where adding more clusters yields diminishing returns.

C) The distance between cluster centroids.

D) The number of iterations required for convergence.

Answer: B

Explanation: Plotting WCSS versus K shows a steep decrease that flattens at the “elbow”, the point beyond which additional clusters no longer significantly reduce within‑cluster variance.

**Question 36.** Which linkage criterion in hierarchical agglomerative clustering merges clusters based on the maximum distance between any two points (one from each cluster)?

A) Single linkage

B) Complete linkage

C) Average linkage

D) Ward’s method

Answer: B

Explanation: Complete linkage uses the farthest pairwise distance, leading to compact clusters.

**Question 37.** DBSCAN can identify clusters of arbitrary shape because it:

A) Requires a predefined number of clusters K.

B) Uses density reachability and connectivity rather than centroids.

C) Relies on hierarchical tree structures.

D) Optimizes a silhouette score.

Answer: B

Explanation: DBSCAN groups points that are densely packed and marks low‑density points as noise, allowing irregular cluster shapes.
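The linkage criteria named in Question 36 differ only in how the distance between two clusters is summarized. The sketch below compares them on two small hypothetical clusters; the function names and the points are assumptions for illustration.

```python
import math


def single_linkage(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(p, q) for p in a for q in b)


def complete_linkage(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(math.dist(p, q) for p in a for q in b)


def average_linkage(a, b):
    """Mean of all pairwise distances between the two clusters."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


# Two hypothetical clusters on a line
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))    # closest pair: (1,0) and (3,0)
print(complete_linkage(cluster_a, cluster_b))  # farthest pair: (0,0) and (5,0)
print(average_linkage(cluster_a, cluster_b))
```

At each agglomerative step the two clusters with the smallest linkage value are merged, so the choice of criterion changes which pair merges next.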

**Question 38.** The silhouette coefficient for a data point ranges between:

A) 0 and 1

B) −1 and 1

C) −∞ and ∞

D) 0 and ∞

Answer: B

Explanation: Silhouette values near 1 indicate well‑matched points, values near 0 indicate points on or near a cluster boundary, and negative values suggest assignment to the wrong cluster.

**Question 39.** In the Apriori algorithm, the “support” of an itemset is defined as:

A) The probability that the rule is correct.

B) The proportion of transactions containing the itemset.

C) The confidence of the rule.

D) The lift of the rule.

Answer: B

Explanation: Support measures how frequently an itemset appears in the dataset, used to prune infrequent candidates.

**Question 40.** Lift greater than 1 in association rule mining indicates:

A) The rule is less interesting than random chance.

B) The antecedent and consequent occur together more often than expected if independent.

C) The rule has low confidence.

D) The support is zero.

Answer: B

Explanation: Lift = confidence / expected confidence, where the expected confidence is the support of the consequent; a value above 1 means the items co‑occur more often than they would by chance.
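To make the support and lift definitions from Questions 39 and 40 concrete, the following sketch computes them on a small, made-up basket dataset. The transactions and function names are hypothetical and chosen only for illustration.

```python
def support(transactions, itemset):
    """Proportion of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """confidence(antecedent -> consequent) / support(consequent)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical market-basket data
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"eggs"},
]

print(support(baskets, {"bread", "milk"}))       # 2 of 5 baskets
print(confidence(baskets, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets
print(lift(baskets, {"bread"}, {"milk"}))        # slightly above 1
```

Here the lift of bread → milk comes out a little above 1, so in this toy data the two items appear together slightly more often than independence would predict.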

Explanation: The original perceptron uses a binary step function to output 0 or 1 based on the weighted sum.

**Question 44.** The universal approximation theorem states that a feed‑forward neural network with at least one hidden layer can:

A) Exactly represent any function with finite data.

B) Approximate any continuous function on a compact subset of ℝⁿ, given sufficient neurons.

C) Solve any NP‑hard problem.

D) Converge in a single epoch.

Answer: B

Explanation: With enough hidden units and appropriate activation, a neural network can approximate any continuous mapping.

**Question 45.** Which activation function suffers from the “dying ReLU” problem?

A) Sigmoid

B) Tanh

C) ReLU

D) Softmax

Answer: C

Explanation: ReLU outputs zero, with zero gradient, for negative inputs; a neuron whose pre‑activation stays negative for every input stops receiving weight updates and may never activate again (it “dies”).

**Question 46.** In backpropagation, the chain rule is used to compute:

A) The forward pass activations.

B) The gradient of the loss with respect to each weight.

C) The learning rate schedule.

D) The model’s bias term only.

Answer: B

Explanation: The chain rule propagates error derivatives from the output layer back through each layer to update weights.

**Question 47.** Mini‑batch gradient descent differs from stochastic gradient descent (SGD) by:

A) Using the entire dataset for each update.

B) Updating parameters after computing gradients on a small subset (batch) of samples, reducing variance compared to pure SGD.

C) Not requiring a learning rate.

D) Guaranteeing convergence in fewer epochs.

Answer: B

Explanation: Mini‑batch GD balances the noisy updates of SGD with the stability of batch GD by processing batches of size >1.

**Question 48.** The Adam optimizer combines which two techniques?

A) Momentum and Nesterov acceleration.

B) AdaGrad and RMSprop (adaptive learning rates with momentum).

C) Stochastic gradient descent and Newton’s method.

D) Learning rate decay and dropout.

Answer: B

Explanation: Adam computes adaptive learning rates like RMSprop and incorporates momentum (first‑moment estimate), leading to fast convergence.

**Question 49.** In a convolutional layer, the term “filter” (or kernel) refers to:

A) The activation function applied after convolution.

B) Adding more hidden units.

C) Employing gating mechanisms such as LSTM or GRU cells.

D) Using sigmoid activation only.

Answer: C

Explanation: LSTM and GRU introduce gates that preserve gradients over long time spans, alleviating vanishing gradients.

**Question 53.** An autoencoder’s reconstruction loss is typically measured with:

A) Hinge loss

B) Cross‑entropy (for binary data) or MSE (for continuous data)

C) Kullback‑Leibler divergence

D) Cosine similarity

Answer: B

Explanation: The loss quantifies the difference between the input and its reconstruction; MSE or binary cross‑entropy are common choices.

**Question 54.** In a Generative Adversarial Network (GAN), the discriminator’s objective is to:

A) Generate realistic samples.

B) Minimize the distance between real and fake data distributions.

C) Accurately classify inputs as real or generated.

D) Encode data into a latent space.

Answer: C

Explanation: The discriminator learns to distinguish between real data and the generator’s synthetic samples.

**Question 55.** The attention mechanism in Transformer models primarily allows the network to:

A) Convolve over fixed‑size windows.

B) Focus on relevant parts of the input sequence when generating each output token.

C) Reduce the number of parameters via weight sharing.

D) Perform pooling over time steps.

Answer: B

Explanation: Self‑attention computes weighted sums of all positions, enabling each token to attend to others dynamically.

**Question 56.** In regression evaluation, a higher R² indicates:

A) Worse fit to the data.

B) More variance explained by the model relative to a baseline.

C) Higher bias.

D) Larger prediction errors.

Answer: B

Explanation: R² = 1 − (SS_res / SS_tot) measures proportion of variance captured; closer to 1 means better explanatory power.

**Question 57.** Which metric is most appropriate for imbalanced binary classification when false negatives are particularly costly?

A) Accuracy

B) Precision

C) Recall

D) F1‑Score

Answer: C

Explanation: Recall (sensitivity) emphasizes correctly identifying positive cases, reducing false negatives.
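The two evaluation ideas in Questions 56 and 57 can be checked by hand on tiny inputs. The sketch below computes R² from its definition and contrasts accuracy with recall on an imbalanced label set; the function names and the data are assumptions made up for this example.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with the mean of y_true as the baseline."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred):
    """TP / (TP + FN): the share of actual positives that were found."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Predicting the mean everywhere explains none of the variance.
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))

# Imbalanced labels: 3 positives out of 10, and the model misses two of them.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(accuracy(y_true, y_pred))  # looks high
print(recall(y_true, y_pred))    # exposes the missed positives
```

In this toy case accuracy stays high because negatives dominate, while recall drops sharply, which is why recall is the better choice when false negatives are costly.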
