Machine Learning with Python: Practice Exam Questions

This practice exam tests machine learning knowledge with Python through multiple-choice questions on supervised learning, the bias-variance trade-off, cross-validation, linear algebra, probability, and scikit-learn, as well as algorithms, preprocessing, and model evaluation. Each question comes with an answer and a short explanation, which makes the exam suitable for certification prep, self-assessment, and review before moving on to advanced topics.

Typology: Exams

2025/2026

Available from 12/22/2025

shilpi-jain-1

Machine Learning using Python Theory and Application Certificate Practice Exam
**Question 1.** Which of the following best describes supervised learning?
A) Learning from unlabeled data to discover hidden patterns
B) Learning from labeled data to predict outcomes
C) Learning by interacting with an environment and receiving rewards
D) Learning without any human-provided feedback
Answer: B
Explanation: Supervised learning uses input-output pairs (labels) to train models that can predict the output for new inputs.
**Question 2.** In the bias-variance trade-off, a model that is too simple typically suffers from:
A) High variance
B) High bias
C) Low bias and low variance
D) Overfitting
Answer: B
Explanation: An overly simple model cannot capture the underlying pattern, leading to high bias (underfitting).
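To make the underfitting case in Question 2 concrete, here is a minimal sketch. It assumes synthetic data with a quadratic trend (the data, the seed, and the helper name `fit_mse` are illustrative, not part of the exam) and compares a straight-line fit with a degree-2 fit using NumPy only.

```python
import numpy as np

# Synthetic data: a quadratic signal plus Gaussian noise (illustrative assumption).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(scale=0.5, size=x.size)

def fit_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, deg=degree)
    preds = np.polyval(coeffs, x)
    return float(np.mean((y - preds) ** 2))

mse_linear = fit_mse(1)      # too simple for a quadratic trend: high bias
mse_quadratic = fit_mse(2)   # matches the shape of the underlying signal

print(f"linear MSE:    {mse_linear:.3f}")
print(f"quadratic MSE: {mse_quadratic:.3f}")
```

The straight line cannot bend to follow the curve, so its error stays large even on the data it was trained on, which is the signature of high bias.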
**Question 3.** Which dataset split is used exclusively for final performance reporting after all model tuning is completed?
A) Training set
B) Validation set
C) Test set
D) Cross-validation set
Answer: C
Explanation: The test set is held out until the very end to provide an unbiased estimate of generalization performance.
**Question 4.** In K-fold cross-validation with K=5, how many times is the model trained?
A) 1
B) 5
C) 10
D) 25
Answer: B
Explanation: The data is split into 5 folds; the model is trained 5 times, each time using 4 folds for training and 1 for validation.
**Question 5.** Which linear algebra operation is essential for computing the dot product between two feature vectors?
A) Matrix transpose
B) Element-wise multiplication followed by sum
C) Matrix inversion
D) Eigenvalue decomposition
Answer: B
Explanation: The dot product equals the sum of element-wise products of the two vectors.
**Question 6.** The gradient of a loss function points in the direction of:

Answer: B
Explanation: Naive Bayes directly applies Bayes’ theorem with the simplifying assumption that features are conditionally independent given the class.
**Question 9.** Which NumPy function creates an array of shape (3,4) filled with zeros?
A) np.ones((3,4))
B) np.empty((3,4))
C) np.zeros((3,4))
D) np.arange(12).reshape(3,4)
Answer: C
Explanation: np.zeros returns an array of the specified shape filled with zeros.
**Question 10.** In pandas, which method is used to combine two DataFrames column-wise based on a common key?
A) concat()
B) merge()
C) join()
D) append()
Answer: B
Explanation: merge() performs SQL-style joins on one or more keys, aligning columns accordingly.
**Question 11.** Which seaborn function creates a pairwise scatter plot matrix with histograms on the diagonal?
A) sns.boxplot()
B) sns.heatmap()
C) sns.pairplot()
D) sns.distplot()
Answer: C
Explanation: pairplot() visualizes relationships between each pair of variables and includes distributions on the diagonal.
**Question 12.** In scikit-learn, which method fits a model to training data?
A) transform()
B) predict()
C) fit()
D) score()
Answer: C
Explanation: The fit() method learns model parameters from the provided training data.
**Question 13.** Which imputation technique replaces missing numeric values with the median of the column?
A) SimpleImputer(strategy='most_frequent')
B) SimpleImputer(strategy='mean')
C) SimpleImputer(strategy='median')
D) SimpleImputer(strategy='constant')
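The imputation strategies listed in Question 13 can be compared on a small example. The sketch below assumes a hypothetical single-column array with one missing value and one outlier; it uses scikit-learn's `SimpleImputer` and the `fit`/`transform` pattern from Question 12.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical column with one missing entry and one large outlier.
X = np.array([[1.0], [2.0], [np.nan], [100.0], [3.0]])

# fit() learns the per-column statistic; transform() fills the gaps with it.
median_imputer = SimpleImputer(strategy="median")
X_median = median_imputer.fit_transform(X)

mean_imputer = SimpleImputer(strategy="mean")
X_mean = mean_imputer.fit_transform(X)

print("median fill:", X_median[2, 0])
print("mean fill:  ", X_mean[2, 0])
```

With an outlier in the column, the two strategies fill the gap with very different values, which is one common reason to prefer the median for skewed data.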

A) Ordinal variables with natural ordering
B) Nominal variables without ordering
C) Binary variables only
D) Continuous numeric variables
Answer: B
Explanation: One-Hot Encoding creates a binary column for each category, suitable when categories have no intrinsic order.
**Question 17.** Which of the following creates polynomial features up to degree 3 for a single numeric column x?
A) PolynomialFeatures(degree=3, include_bias=False)
B) StandardScaler()
C) OneHotEncoder()
D) MinMaxScaler()
Answer: A
Explanation: PolynomialFeatures expands input features into all polynomial combinations up to the specified degree.
**Question 18.** In time-series feature engineering, a “lag-1” feature represents:
A) The value of the series at the current time step
B) The value one time step ahead of the current observation
C) The value one time step behind the current observation
D) The cumulative sum up to the current time
Answer: C
Explanation: A lag-1 feature shifts the series by one step backward, providing the previous observation as a predictor.
**Question 19.** Principal Component Analysis (PCA) seeks to maximize:
A) Between-class variance
B) Within-class variance
C) Total variance retained in the projected space
D) Correlation between original features
Answer: C
Explanation: PCA finds orthogonal directions (principal components) that capture the most variance of the data.
**Question 20.** Which feature-selection method uses a model’s internal importance scores?
A) Correlation filter
B) Chi-squared test
C) Recursive Feature Elimination (RFE)
D) Embedded method (e.g., tree-based importance)
Answer: D
Explanation: Embedded methods leverage the model’s own assessment of feature relevance, such as Gini importance in Random Forests.
**Question 21.** In linear regression, the ordinary least squares solution minimizes:
A) Sum of absolute errors (L1 loss)

Explanation: L1 penalty drives some coefficients to exactly zero, performing feature selection.
**Question 24.** In a decision tree, the Gini impurity for a node containing 70% class 0 and 30% class 1 is:
A) 0.
B) 0.42
C) 0.
D) 0.
Answer: B
Explanation: Gini = 1 − (0.7² + 0.3²) = 1 − (0.49 + 0.09) = 0.42.
**Question 25.** Bagging primarily reduces a model’s:
A) Bias
B) Variance
C) Training time
D) Number of features
Answer: B
Explanation: By averaging the predictions of models trained on many bootstrap samples, bagging stabilizes the estimator, lowering variance.
**Question 26.** Gradient Boosting builds trees sequentially to:
A) Increase model bias
B) Reduce training error by correcting residuals of previous trees
C) Randomly sample features at each split
D) Prune trees aggressively
Answer: B
Explanation: Each new tree is fitted to the residual errors of the ensemble so far, gradually improving performance.
**Question 27.** Which hyperparameter of K-Nearest Neighbors controls the trade-off between bias and variance?
A) Distance metric
B) Number of neighbors K
C) Leaf size
D) Maximum depth
Answer: B
Explanation: Small K leads to low bias but high variance; large K increases bias and reduces variance.
**Question 28.** The kernel trick in Support Vector Machines allows:
A) Faster training on linear data
B) Implicit mapping of data into high-dimensional space without explicit computation
C) Automatic feature selection
D) Direct probability estimates
Answer: B
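The idea behind Question 28 can be checked numerically. The sketch below is an illustration under stated assumptions: it picks the homogeneous degree-2 polynomial kernel on 2-D inputs, and the names `phi` and `poly_kernel` are hypothetical helpers. It shows that the kernel value equals the dot product in the explicit feature space, without ever building that space.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (3-D output)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b)^2."""
    return float(np.dot(a, b) ** 2)

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = float(np.dot(phi(a), phi(b)))  # map first, then take the dot product
implicit = poly_kernel(a, b)              # stay in the original 2-D space

print("explicit feature-space dot product:", explicit)
print("kernel value:                      ", implicit)
```

Both routes give the same number, but the kernel only needs a dot product in the original space, which is what keeps SVM training tractable for feature maps of much higher dimension.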

B) Total intra-cluster entropy
C) Silhouette coefficient
D) Distance between cluster centroids
Answer: A
Explanation: K-Means iteratively reduces the within-cluster sum of squares (WCSS).
**Question 32.** Agglomerative hierarchical clustering differs from divisive clustering because it:
A) Starts with each point as its own cluster and merges upward
B) Starts with one cluster and splits downward
C) Requires a predefined number of clusters
D) Uses k-means as a subroutine
Answer: A
Explanation: Agglomerative (bottom-up) begins with singleton clusters and merges the closest pairs iteratively.
**Question 33.** The Silhouette Score for a point close to its own cluster centroid and far from neighboring clusters will be:
A) Near -1
B) Near 0
C) Near 1
D) Undefined
Answer: C
Explanation: Silhouette values near 1 indicate well-clustered points; negative values suggest mis-assignment.
**Question 34.** In a perceptron, the activation function commonly used for binary classification is:
A) ReLU
B) Sigmoid (or step)
C) Softmax
D) Tanh
Answer: B
Explanation: The classic perceptron applies a step function to output a binary decision; a sigmoid is its smooth, differentiable counterpart.
**Question 35.** Backpropagation computes gradients for all layers by applying the:
A) Chain rule of calculus
B) Newton-Raphson method
C) Monte Carlo simulation
D) Genetic algorithm
Answer: A
Explanation: The chain rule enables efficient calculation of partial derivatives from output back through hidden layers.
**Question 36.** Which optimizer adapts the learning rate for each parameter based on estimates of first and second moments of the gradients?
A) Stochastic Gradient Descent (SGD)

Explanation: Gates regulate information flow, allowing gradients to propagate over longer sequences.
**Question 39.** Grid Search differs from Randomized Search in hyperparameter tuning because:
A) Grid Search evaluates a random subset of parameter combinations
B) Grid Search exhaustively evaluates every combination in a predefined grid
C) Grid Search uses Bayesian optimization
D) Grid Search only works for continuous parameters
Answer: B
Explanation: Grid Search systematically tries all points on the specified hyperparameter grid.
**Question 40.** Which Python library is most commonly used to serialize a scikit-learn model for later reuse?
A) json
B) pickle
C) csv
D) h5py
Answer: B
Explanation: pickle (or joblib) can store Python objects, including trained models, to disk.
**Question 41.** Model drift refers to:
A) Decrease in training speed over epochs
B) Change in data distribution causing performance degradation over time
C) Overfitting on the training set
D) Increase in model size after deployment
Answer: B
Explanation: Drift occurs when the statistical properties of the input data shift, making the model less accurate.
**Question 42.** SHAP values are used to:
A) Optimize hyperparameters
B) Visualize decision boundaries
C) Explain individual predictions by attributing feature contributions
D) Perform dimensionality reduction
Answer: C
Explanation: SHAP (SHapley Additive exPlanations) quantifies each feature’s contribution to a specific prediction.
**Question 43.** Which metric is appropriate for evaluating a regression model’s predictive accuracy?
A) Accuracy
B) Precision
C) Mean Absolute Error (MAE)
D) Confusion matrix
Answer: C
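As a small illustration of the metric named in Question 43, the sketch below computes MAE by hand and with scikit-learn's `mean_absolute_error`. The target and prediction values are made up for the example.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up regression targets and predictions (illustrative values only).
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MAE is the average of the absolute differences between truth and prediction.
mae_manual = float(np.mean(np.abs(y_true - y_pred)))
mae_sklearn = float(mean_absolute_error(y_true, y_pred))

print("manual MAE: ", mae_manual)
print("sklearn MAE:", mae_sklearn)
```

Because MAE is expressed in the same units as the target, it is easy to interpret, whereas accuracy, precision, and the confusion matrix apply to classification.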

B) One-Hot Encoding
C) SMOTE (Synthetic Minority Over-sampling Technique)
D) Principal Component Analysis
Answer: C
Explanation: SMOTE generates synthetic minority samples to balance class distribution.
**Question 47.** In the context of model evaluation, a “confusion matrix” is useful for:
A) Regression problems only
B) Visualizing true vs. predicted class counts in classification
C) Measuring correlation between features
D) Selecting hyperparameters
Answer: B
Explanation: The confusion matrix tabulates TP, FP, FN, and TN for a classifier.
**Question 48.** Which activation function is least likely to cause the “dying ReLU” problem?
A) ReLU
B) Leaky ReLU
C) Sigmoid
D) Hard-tanh
Answer: B
Explanation: Leaky ReLU allows a small gradient when the input is negative, mitigating dead neurons.
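The explanation for Question 48 can be seen directly in the gradients. This is a minimal NumPy sketch; the slope `alpha=0.01` is a commonly used default chosen here as an assumption, and the function names are illustrative.

```python
import numpy as np

def relu_grad(z):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return (z > 0).astype(float)

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: identity for positive inputs, small slope alpha otherwise."""
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    """Gradient of Leaky ReLU: 1 for positive inputs, alpha otherwise."""
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.5, 2.0])

print("ReLU gradient:      ", relu_grad(z))
print("Leaky ReLU gradient:", leaky_relu_grad(z))
```

For negative inputs the ReLU gradient is exactly zero, so a neuron stuck there receives no updates; the Leaky ReLU gradient stays at `alpha`, so the weights can still move.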

**Question 49.** The “early stopping” technique helps to:
A) Increase model size
B) Prevent overfitting by halting training when validation loss stops improving
C) Speed up inference time
D) Reduce the number of features
Answer: B
Explanation: Monitoring validation performance and stopping early avoids over-training.
**Question 50.** When deploying a model via Flask, which HTTP method is typically used to send input data for prediction?
A) GET
B) POST
C) PUT
D) DELETE
Answer: B
Explanation: POST requests carry the input payload (e.g., JSON) in the request body for the server to process.
**Question 51.** Which of the following is a key advantage of using joblib.dump over pickle.dump for large scikit-learn models?
A) Faster serialization of large NumPy arrays
B) Compatibility with Java applications
C) Automatic model versioning