




























































































This practice exam tests machine learning knowledge with Python. It features multiple-choice questions on supervised learning, the bias-variance trade-off, cross-validation, linear algebra, probability, and scikit-learn, each with a detailed answer explanation. The questions cover algorithms, preprocessing, and model evaluation, and are suited to certification prep, self-assessment, or reinforcing machine learning concepts and their Python implementation.
Typology: Exams





























































































Question 1. Which of the following best describes supervised learning?
A) Learning from unlabeled data to discover hidden patterns
B) Learning from labeled data to predict outcomes
C) Learning by interacting with an environment and receiving rewards
D) Learning without any human‑provided feedback
Answer: B
Explanation: Supervised learning uses input‑output pairs (labels) to train models that can predict the output for new inputs.

Question 2. In the bias‑variance trade‑off, a model that is too simple typically suffers from:
A) High variance
B) High bias
C) Low bias and low variance
D) Overfitting
Answer: B
Explanation: An overly simple model cannot capture the underlying pattern, leading to high bias (underfitting).

Question 3. Which dataset split is used exclusively for final performance reporting after all model tuning is completed?
A) Training set
B) Validation set
C) Test set
D) Cross‑validation set
Answer: C
Explanation: The test set is held out until the very end to provide an unbiased estimate of generalization performance.

Question 4. In K‑fold cross‑validation with K=5, how many times is the model trained?
A) 1
B) 5
C) 10
D) 25
Answer: B
Explanation: The data is split into 5 folds; the model is trained 5 times, each time using 4 folds for training and 1 for validation.

Question 5. Which linear algebra operation is essential for computing the dot product between two feature vectors?
A) Matrix transpose
B) Element‑wise multiplication followed by sum
C) Matrix inversion
D) Eigenvalue decomposition
Answer: B
Explanation: The dot product equals the sum of element‑wise products of the two vectors.

Question 6. The gradient of a loss function points in the direction of:
Answer: B
Explanation: Naive Bayes directly applies Bayes’ theorem with the simplifying assumption of feature independence.

Question 9. Which NumPy function creates an array of shape (3,4) filled with zeros?
A) np.ones((3,4))
B) np.empty((3,4))
C) np.zeros((3,4))
D) np.arange(12).reshape(3,4)
Answer: C
Explanation: np.zeros returns an array of the specified shape filled with zeros.

Question 10. In pandas, which method is used to combine two DataFrames column‑wise based on a common key?
A) concat()
B) merge()
C) join()
D) append()
Answer: B
Explanation: merge() performs SQL‑style joins on one or more keys, aligning columns accordingly.

Question 11. Which seaborn function creates a pairwise scatter plot matrix with histograms on the diagonal?
A) sns.boxplot()
B) sns.heatmap()
C) sns.pairplot()
D) sns.distplot()
Answer: C
Explanation: pairplot() visualizes relationships between each pair of variables and includes distributions on the diagonal.

Question 12. In scikit‑learn, which method fits a model to training data?
A) transform()
B) predict()
C) fit()
D) score()
Answer: C
Explanation: The fit() method learns model parameters from the provided training data.

Question 13. Which imputation technique replaces missing numeric values with the median of the column?
A) SimpleImputer(strategy='most_frequent')
B) SimpleImputer(strategy='mean')
C) SimpleImputer(strategy='median')
D) SimpleImputer(strategy='constant')
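
To make the median strategy named in the question above concrete, here is a minimal pure-Python sketch of median imputation. The function name `impute_median` and the sample column are illustrative assumptions, not part of the exam; scikit-learn's `SimpleImputer` packages the same idea as a fitted transformer.

```python
from statistics import median


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to compute a median from")
    fill = median(observed)
    return [fill if v is None else v for v in column]


# Hypothetical numeric column with two missing entries.
ages = [22, None, 35, 41, None, 29]
imputed = impute_median(ages)
print(imputed)
```

The median is less sensitive to outliers than the mean, which is the usual reason to prefer it for skewed numeric columns.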
A) Ordinal variables with natural ordering
B) Nominal variables without ordering
C) Binary variables only
D) Continuous numeric variables
Answer: B
Explanation: One‑Hot Encoding creates a binary column for each category, suitable when categories have no intrinsic order.

Question 17. Which of the following creates polynomial features up to degree 3 for a single numeric column x?
A) PolynomialFeatures(degree=3, include_bias=False)
B) StandardScaler()
C) OneHotEncoder()
D) MinMaxScaler()
Answer: A
Explanation: PolynomialFeatures expands input features into all polynomial combinations up to the specified degree.

Question 18. In time‑series feature engineering, a “lag‑1” feature represents:
A) The value of the series at the current time step
B) The value one time step ahead of the current observation
C) The value one time step behind the current observation
D) The cumulative sum up to the current time
Answer: C
Explanation: A lag‑1 feature shifts the series by one step backward, providing the previous observation as a predictor.

Question 19. Principal Component Analysis (PCA) seeks to maximize:
A) Between‑class variance
B) Within‑class variance
C) Total variance retained in the projected space
D) Correlation between original features
Answer: C
Explanation: PCA finds orthogonal directions (principal components) that capture the most variance of the data.

Question 20. Which feature‑selection method uses a model’s internal importance scores?
A) Correlation filter
B) Chi‑squared test
C) Recursive Feature Elimination (RFE)
D) Embedded method (e.g., tree‑based importance)
Answer: D
Explanation: Embedded methods leverage the model’s own assessment of feature relevance, such as Gini importance in Random Forests.

Question 21. In linear regression, the ordinary least squares solution minimizes:
A) Sum of absolute errors (L1 loss)
Explanation: L1 penalty drives some coefficients to exactly zero, performing feature selection.

Question 24. In a decision tree, the Gini impurity for a node containing 70% class 0 and 30% class 1 is:
A) 0.
B) 0.
C) 0.
D) 0.
Answer: B
Explanation: Gini = 1 − (0.7² + 0.3²) = 1 − (0.49 + 0.09) = 0.42.

Question 25. Bagging primarily reduces a model’s:
A) Bias
B) Variance
C) Training time
D) Number of features
Answer: B
Explanation: By averaging predictions over many bootstrap samples, bagging stabilizes the estimator, lowering variance.

Question 26. Gradient Boosting builds trees sequentially to:
A) Increase model bias
B) Reduce training error by correcting residuals of previous trees
C) Randomly sample features at each split
D) Prune trees aggressively
Answer: B
Explanation: Each new tree is fitted to the residual errors of the ensemble so far, gradually improving performance.

Question 27. Which hyperparameter of K‑Nearest Neighbors controls the trade‑off between bias and variance?
A) Distance metric
B) Number of neighbors K
C) Leaf size
D) Maximum depth
Answer: B
Explanation: Small K leads to low bias but high variance; large K increases bias and reduces variance.

Question 28. The kernel trick in Support Vector Machines allows:
A) Faster training on linear data
B) Implicit mapping of data into high‑dimensional space without explicit computation
C) Automatic feature selection
D) Direct probability estimates
Answer: B
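
The implicit mapping described in the kernel-trick question above can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a degree-2 homogeneous polynomial kernel on 2-D inputs, and the names `poly_kernel` and `explicit_map` are hypothetical. It shows that evaluating the kernel on the original vectors gives the same number as a dot product in the explicitly mapped 3-D space.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def explicit_map(x):
    """Explicit 3-D feature map for a 2-D input that corresponds to poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly_kernel(x, y)                         # computed in the 2-D input space
mapped_value = dot(explicit_map(x), explicit_map(y))     # computed in the 3-D feature space
print(kernel_value, mapped_value)
```

The two values agree up to floating-point rounding, but the kernel never constructs the higher-dimensional vectors, which is what keeps high-degree or infinite-dimensional mappings tractable.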
B) Total intra‑cluster entropy
C) Silhouette coefficient
D) Distance between cluster centroids
Answer: A
Explanation: K‑Means iteratively reduces the within‑cluster sum of squares (WCSS).

Question 32. Agglomerative hierarchical clustering differs from divisive clustering because it:
A) Starts with each point as its own cluster and merges upward
B) Starts with one cluster and splits downward
C) Requires a predefined number of clusters
D) Uses k‑means as a subroutine
Answer: A
Explanation: Agglomerative (bottom‑up) begins with singleton clusters and merges the closest pairs iteratively.

Question 33. The Silhouette Score for a point close to its own cluster centroid and far from neighboring clusters will be:
A) Near −1
B) Near 0
C) Near 1
D) Undefined
Answer: C
Explanation: Silhouette values near 1 indicate well‑clustered points; negative values suggest mis‑assignment.

Question 34. In a perceptron, the activation function commonly used for binary classification is:
A) ReLU
B) Sigmoid (or step)
C) Softmax
D) Tanh
Answer: B
Explanation: The classic perceptron applies a step function to output a binary decision; a sigmoid is the smooth alternative used when a differentiable output is needed.

Question 35. Backpropagation computes gradients for all layers by applying the:
A) Chain rule of calculus
B) Newton‑Raphson method
C) Monte Carlo simulation
D) Genetic algorithm
Answer: A
Explanation: The chain rule enables efficient calculation of partial derivatives from output back through hidden layers.

Question 36. Which optimizer adapts the learning rate for each parameter based on estimates of first and second moments of the gradients?
A) Stochastic Gradient Descent (SGD)
Explanation: Gates regulate information flow, allowing gradients to propagate over longer sequences.

Question 39. Grid Search differs from Randomized Search in hyperparameter tuning because:
A) Grid Search evaluates a random subset of parameter combinations
B) Grid Search exhaustively evaluates every combination in a predefined grid
C) Grid Search uses Bayesian optimization
D) Grid Search only works for continuous parameters
Answer: B
Explanation: Grid Search systematically tries all points on the specified hyperparameter grid.

Question 40. Which Python library is most commonly used to serialize a scikit‑learn model for later reuse?
A) json
B) pickle
C) csv
D) h5py
Answer: B
Explanation: pickle (or joblib) can store Python objects, including trained models, to disk.

Question 41. Model drift refers to:
A) Decrease in training speed over epochs
B) Change in data distribution causing performance degradation over time
C) Overfitting on the training set
D) Increase in model size after deployment
Answer: B
Explanation: Drift occurs when the statistical properties of the input data shift, making the model less accurate.

Question 42. SHAP values are used to:
A) Optimize hyperparameters
B) Visualize decision boundaries
C) Explain individual predictions by attributing feature contributions
D) Perform dimensionality reduction
Answer: C
Explanation: SHAP (SHapley Additive exPlanations) quantifies each feature’s contribution to a specific prediction.

Question 43. Which metric is appropriate for evaluating a regression model’s predictive accuracy?
A) Accuracy
B) Precision
C) Mean Absolute Error (MAE)
D) Confusion matrix
Answer: C
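
As a small worked example of the regression metric in the question above, the following sketch computes Mean Absolute Error by hand. The function name `mae_sketch` and the sample values are assumptions made for illustration; scikit-learn exposes the same quantity as `sklearn.metrics.mean_absolute_error`.

```python
def mae_sketch(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions from a regression model.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
mae = mae_sketch(y_true, y_pred)
print(mae)  # absolute errors are 0.5, 0.0, 1.5, 1.0, so the mean is 0.75
```

Unlike accuracy or precision, MAE is expressed in the units of the target variable, which is why it suits continuous predictions.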
B) One‑Hot Encoding
C) SMOTE (Synthetic Minority Over‑sampling Technique)
D) Principal Component Analysis
Answer: C
Explanation: SMOTE generates synthetic minority samples to balance class distribution.

Question 47. In the context of model evaluation, a “confusion matrix” is useful for:
A) Regression problems only
B) Visualizing true vs. predicted class counts in classification
C) Measuring correlation between features
D) Selecting hyperparameters
Answer: B
Explanation: The confusion matrix tabulates TP, FP, FN, and TN for a classifier.

Question 48. Which activation function is least likely to cause the “dying ReLU” problem?
A) ReLU
B) Leaky ReLU
C) Sigmoid
D) Hard‑tanh
Answer: B
Explanation: Leaky ReLU allows a small gradient when the input is negative, mitigating dead neurons.
Question 49. The “early stopping” technique helps to:
A) Increase model size
B) Prevent overfitting by halting training when validation loss stops improving
C) Speed up inference time
D) Reduce the number of features
Answer: B
Explanation: Monitoring validation performance and stopping early avoids over‑training.

Question 50. When deploying a model via Flask, which HTTP method is typically used to send input data for prediction?
A) GET
B) POST
C) PUT
D) DELETE
Answer: B
Explanation: POST requests carry the input payload (e.g., JSON) in the request body for the server to process.

Question 51. Which of the following is a key advantage of using joblib.dump over pickle.dump for large scikit‑learn models?
A) Faster serialization of large NumPy arrays
B) Compatibility with Java applications
C) Automatic model versioning