Machine Learning with Python: Practice Exam Questions

A set of practice exam questions on machine learning with Python. It covers supervised learning, feature engineering, data manipulation with NumPy and pandas, visualization with seaborn, and machine learning algorithms implemented in scikit-learn, along with data preprocessing, model selection, and evaluation metrics. Each question comes with an answer and a short explanation, which makes the set useful for students and practitioners preparing for certification or reviewing machine learning principles and their Python implementation.

Typology: Exams

2025/2026

Available from 12/21/2025

shilpi-jain-1

Machine Learning with Python: A Practical Introduction Certificate Practice Exam
**Question 1.** Which learning paradigm aims to map inputs to outputs using labeled
examples?
A) Unsupervised Learning
B) Reinforcement Learning
C) Supervised Learning
D) Semi-Supervised Learning
Answer: C
Explanation: Supervised learning uses labeled data to learn a function that maps inputs to
known outputs.
**Question 2.** In the typical ML workflow, which step comes immediately after data cleaning?
A) Model Deployment
B) Feature Engineering
C) Data Acquisition
D) Model Evaluation
Answer: B
Explanation: After cleaning, practitioners usually engineer features to improve model
performance before training.
**Question 3.** Which NumPy function creates an array of shape (3,4) filled with zeros?
A) `np.empty((3,4))`
B) `np.ones((3,4))`
C) `np.zeros((3,4))`
D) `np.arange(12).reshape(3,4)`
Answer: C
Explanation: `np.zeros` returns an array of the given shape filled with 0.0.
**Question 4.** In pandas, which method returns the number of non-null observations per column?
A) `df.count()`
B) `df.size`
C) `df.shape`
D) `df.info()`
Answer: A
Explanation: `DataFrame.count()` counts non-null values for each column.
**Question 5.** Which seaborn function is best suited for visualizing the distribution of a single numeric variable?
A) `sns.boxplot()`
B) `sns.histplot()`
C) `sns.scatterplot()`
D) `sns.heatmap()`
Answer: B
Explanation: `sns.histplot` draws a histogram (optionally with a KDE overlay) to show a variable's distribution.
**Question 6.** Which scikit-learn class implements the k-Nearest Neighbors algorithm for classification?
A) `LinearRegression`
B) `KNeighborsClassifier`
C) `SVC`

B) 1.
C) 2.
D) 3.
Answer: D
Explanation: A common rule marks |Z| > 3 as an extreme outlier.
**Question 10.** Which scaling technique transforms data to have zero mean and unit variance?
A) Min-Max Scaling
B) Robust Scaling
C) Standardization
D) Log Transformation
Answer: C
Explanation: Standardization (Z-score scaling) centers data at 0 and scales to unit variance.
**Question 11.** One-Hot Encoding is appropriate for which type of variable?
A) Ordinal with natural ordering
B) Nominal categorical without order
C) Continuous numeric
D) Binary target variable
Answer: B
Explanation: One-Hot Encoding creates a binary column for each category, suitable when categories have no hierarchy.
**Question 12.** Label Encoding can unintentionally introduce what problem for tree-based models?

A) Multicollinearity
B) Artificial ordering
C) Data leakage
D) Overfitting to noise
Answer: B
Explanation: Assigning integer codes implies an order that may mislead algorithms that treat values ordinally.
**Question 13.** Binning a continuous variable into quartiles results in how many distinct bins?
A) 2
B) 3
C) 4
D) 5
Answer: C
Explanation: Quartiles divide data into four equal-frequency groups.
**Question 14.** Adding a product term `X1 * X2` to a regression model is an example of creating:
A) A polynomial feature
B) An interaction term
C) A regularization term
D) A target variable
Answer: B
Explanation: Interaction terms capture joint effects of two predictors.
**Question 15.** Principal Component Analysis (PCA) primarily seeks to maximize:

A) The feature decreases the odds of the positive class
B) The feature has no effect on odds
C) The feature increases the odds of the positive class
D) The model is overfitted
Answer: C
Explanation: A positive coefficient raises the log-odds, and therefore the odds and the probability, of class 1 as the feature increases.
**Question 19.** Which impurity measure is based on the probability of misclassifying a randomly chosen element?
A) Gini Index
B) Entropy
C) Information Gain
D) Chi-square
Answer: A
Explanation: Gini impurity is \(1 - \sum_i p_i^2\), the probability of misclassifying an element that is labeled at random according to the class proportions \(p_i\).
**Question 20.** Information Gain in a decision tree is computed using:
A) Gini impurity reduction
B) Reduction in variance
C) Decrease in entropy after a split
D) Increase in silhouette score
Answer: C
Explanation: Information Gain = parent entropy − weighted child entropy.
**Question 21.** Random Forest reduces variance primarily through:

A) Gradient descent
B) Bagging of multiple decision trees
C) Pruning each tree aggressively
D) Using a single deep tree
Answer: B
Explanation: Bagging (bootstrap aggregation) averages many decorrelated trees, lowering variance.
**Question 22.** Which hyperparameter controls the depth of each tree in a Random Forest?
A) `n_estimators`
B) `max_features`
C) `max_depth`
D) `learning_rate`
Answer: C
Explanation: `max_depth` limits how deep individual trees can grow.
**Question 23.** Gradient Boosting builds trees sequentially to:
A) Reduce bias by correcting residual errors of previous trees
B) Increase variance for better fit
C) Randomly sample features at each split
D) Perform bagging of independent trees
Answer: A
Explanation: Each new tree is fit to the residuals, focusing on errors of the ensemble.
**Question 24.** In XGBoost, which parameter controls the contribution of each tree to the final model?

A) Increases model variance
B) Decreases model bias
C) Increases bias and reduces variance
D) Has no effect on bias-variance trade-off
Answer: C
Explanation: Larger k smooths decision boundaries (higher bias) but makes predictions more stable (lower variance).
**Question 28.** Which distance metric is most appropriate for high-dimensional sparse data?
A) Euclidean distance
B) Manhattan distance
C) Cosine similarity (converted to distance)
D) Hamming distance
Answer: C
Explanation: Cosine similarity focuses on orientation rather than magnitude, handling sparsity well.
**Question 29.** Simple linear regression estimates the relationship between one predictor and the target by minimizing:
A) Sum of absolute errors
B) Sum of squared residuals (RSS)
C) Hinge loss
D) Log-loss
Answer: B
Explanation: Ordinary Least Squares minimizes the Residual Sum of Squares.
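The least-squares criterion from Question 29 can be illustrated with a small sketch. It uses the textbook closed-form slope and intercept for simple linear regression; the toy data and the helper names `fit_simple_ols` and `rss` are hypothetical and chosen only for this illustration.

```python
import numpy as np

# Hypothetical toy data: y is exactly 2*x + 1, so the fit should recover that line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0


def fit_simple_ols(x, y):
    """Closed-form slope and intercept that minimize the residual sum of squares."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return float(slope), float(intercept)


def rss(x, y, slope, intercept):
    """Residual sum of squares of the line slope*x + intercept."""
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))


slope, intercept = fit_simple_ols(x, y)
best_rss = rss(x, y, slope, intercept)

# Perturbing the fitted slope can only increase the RSS.
worse_rss = rss(x, y, slope + 0.5, intercept)
print(slope, intercept, best_rss, worse_rss)
```

In practice a library estimator would normally be used instead; the closed form is shown here only to make the minimized quantity explicit.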

**Question 30.** In multiple linear regression, the coefficient of determination \(R^2\) measures:
A) The proportion of variance explained by the model
B) The average absolute error
C) The correlation between predictors
D) The probability of overfitting
Answer: A
Explanation: \(R^2 = 1 - \frac{SS_{res}}{SS_{tot}}\) quantifies explained variance.
**Question 31.** Lasso regression differs from Ridge regression by using which penalty?
A) L2 (squared) penalty
B) L1 (absolute) penalty
C) Elastic penalty
D) No penalty
Answer: B
Explanation: Lasso adds an L1 norm penalty, encouraging sparsity (some coefficients are driven exactly to zero).
**Question 32.** Which regularization technique can both shrink coefficients and perform variable selection?
A) Ridge
B) Lasso
C) Elastic Net
D) None of the above
Answer: C
Explanation: Elastic Net combines L1 and L2 penalties: the L1 part can set coefficients exactly to zero (selection), while the L2 part shrinks the remaining ones.
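The difference between the L1 and L2 penalties in Questions 31 and 32 can be sketched in a special case. The code below assumes an orthonormal design matrix (\(X^\top X = I\)), an assumption made here only because it gives closed-form solutions: minimizing \(\tfrac{1}{2}\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1\) soft-thresholds each OLS coefficient, while minimizing \(\tfrac{1}{2}\lVert y - X\beta\rVert^2 + \tfrac{\lambda}{2}\lVert\beta\rVert_2^2\) rescales it. The coefficient values and function names are hypothetical.

```python
import numpy as np


def lasso_orthonormal(beta_ols, lam):
    """L1-penalized solution for an orthonormal design: soft-threshold each coefficient."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)


def ridge_orthonormal(beta_ols, lam):
    """L2-penalized solution for an orthonormal design: shrink every coefficient by 1/(1+lam)."""
    return beta_ols / (1.0 + lam)


# Hypothetical unpenalized (OLS) coefficients and penalty strength.
beta_ols = np.array([3.0, -0.4, 0.1, -2.0])
lam = 0.5

beta_lasso = lasso_orthonormal(beta_ols, lam)
beta_ridge = ridge_orthonormal(beta_ols, lam)

# The L1 penalty zeroes out the two small coefficients; the L2 penalty keeps all four.
print(beta_lasso)
print(beta_ridge)
```

With a general design matrix no such closed form exists for the L1 case, and an iterative solver is needed; the sketch only shows why one penalty produces exact zeros and the other does not.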

**Question 36.** Which metric is most appropriate when the classes are highly imbalanced?
A) Accuracy
B) Precision
C) Recall
D) F1-Score
Answer: D
Explanation: F1-Score balances precision and recall, providing a more informative measure under imbalance.
**Question 37.** In a confusion matrix, the term False Positive refers to:
A) Correctly predicted negative instances
B) Incorrectly predicted positive instances
C) Correctly predicted positive instances
D) Incorrectly predicted negative instances
Answer: B
Explanation: A false positive occurs when the model predicts the positive class but the true class is negative.
**Question 38.** The ROC curve plots:
A) Precision vs. Recall
B) True Positive Rate vs. False Positive Rate
C) Accuracy vs. Threshold
D) F1-Score vs. Threshold
Answer: B
Explanation: ROC visualizes the trade-off between sensitivity (TPR) and 1 − specificity (FPR) across thresholds.
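The quantities in Questions 36 to 38 can be computed by hand from the four confusion-matrix cells. The sketch below uses hypothetical, imbalanced labels (2 positives out of 10) and a hypothetical helper `confusion_counts`; it shows accuracy looking healthy while F1 does not, and it yields the single (FPR, TPR) point that one threshold contributes to a ROC curve.

```python
import numpy as np

# Hypothetical imbalanced labels: 2 positives, 8 negatives.
y_true = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 0, 0, 0])


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels encoded as 0/1."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # predicted positive, actually negative
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # also the true positive rate (TPR)
f1 = 2 * precision * recall / (precision + recall)
fpr = fp / (fp + tn)             # false positive rate, i.e. 1 - specificity

print(accuracy, precision, recall, f1, fpr)
```

A full ROC curve repeats the TPR and FPR computation over many decision thresholds applied to predicted scores rather than to hard labels.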

**Question 39.** An AUC value of 0.5 indicates:
A) Perfect discrimination
B) No discriminative ability (random guessing)
C) Excellent model performance
D) Overfitting
Answer: B
Explanation: AUC = 0.5 corresponds to the diagonal line of a random classifier.
**Question 40.** Which regression metric is most sensitive to outliers?
A) Mean Absolute Error (MAE)
B) Mean Squared Error (MSE)
C) Median Absolute Deviation
D) R² Score
Answer: B
Explanation: Squaring errors amplifies large deviations, making MSE highly sensitive to outliers.
**Question 41.** The coefficient of determination \(R^2\) can become negative when:
A) The model predicts perfectly
B) The model performs worse than predicting the mean of the target
C) There are more predictors than observations
D) The data are standardized
Answer: B
Explanation: If the residual sum of squares exceeds the total sum of squares, \(R^2\) falls below zero.
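The claim in Question 41 is easy to check numerically. The sketch below computes \(R^2 = 1 - SS_{res}/SS_{tot}\) directly for three hypothetical sets of predictions; the helper name `r2_score_manual` and the data are illustrative only.

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """Coefficient of determination computed as 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical targets.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r2_perfect = r2_score_manual(y_true, y_true)                               # exact predictions
r2_mean = r2_score_manual(y_true, np.full_like(y_true, y_true.mean()))     # always predict the mean
r2_reversed = r2_score_manual(y_true, y_true[::-1])                        # worse than the mean

print(r2_perfect, r2_mean, r2_reversed)
```

Predicting the mean gives exactly zero, so any model whose residual sum of squares is larger than that baseline lands below zero.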

Answer: B
Explanation: The elbow point indicates diminishing returns in SSE reduction as k increases.
**Question 45.** In hierarchical clustering, a dendrogram cut at a higher linkage distance results in:
A) More clusters
B) Fewer clusters
C) Same number of clusters regardless of cut
D) No clusters
Answer: B
Explanation: Cutting higher yields larger merged groups, thus fewer clusters.
**Question 46.** Which linkage criterion defines the distance between two clusters as the maximum distance between their members?
A) Single linkage
B) Complete linkage
C) Average linkage
D) Ward linkage
Answer: B
Explanation: Complete linkage uses the farthest pairwise distance.
**Question 47.** Grid Search differs from Randomized Search primarily in:
A) Number of hyperparameter combinations evaluated
B) Ability to handle categorical parameters
C) Use of Bayesian optimization
D) Requirement of a validation set

Answer: A
Explanation: Grid Search exhaustively evaluates every point in a predefined grid; Randomized Search samples a fixed number of random combinations.
**Question 48.** When using `RandomizedSearchCV`, the `n_iter` parameter controls:
A) Number of cross-validation folds
B) Number of hyperparameter settings to try
C) Maximum depth of decision trees
D) Number of parallel jobs
Answer: B
Explanation: `n_iter` specifies how many random parameter configurations are sampled.
**Question 49.** In a scikit-learn `Pipeline`, which step must appear before the estimator to avoid data leakage?
A) Model training
B) Hyperparameter tuning
C) Feature scaling or encoding
D) Model persistence
Answer: C
Explanation: Preprocessing must be applied within the pipeline so that scaling/encoding is fit only on training folds.
**Question 50.** Which library function is used to persist a scikit-learn model to disk?
A) `numpy.save`
B) `pickle.dump`
C) `matplotlib.savefig`

C) It centers data to mean 0 and scales to unit variance
D) It only works on binary features
Answer: C
Explanation: `StandardScaler` performs Z-score standardization.
**Question 54.** The `fit_transform` method of a transformer in a pipeline:
A) Fits the transformer on the test set only
B) Fits on the training data and then transforms the same data
C) Only transforms data without fitting
D) Saves the model to disk automatically
Answer: B
Explanation: `fit_transform` learns parameters from the input (training) data and returns the transformed data.
**Question 55.** Which evaluation metric is appropriate for a regression problem where the cost of over-prediction is twice that of under-prediction?
A) Mean Absolute Error (MAE)
B) Mean Squared Error (MSE)
C) Weighted Mean Absolute Error
D) R² Score
Answer: C
Explanation: A weighted MAE can assign different penalties to over- and under-predictions.
**Question 56.** In a confusion matrix for a binary classifier, the sum of all four cells equals:
A) The number of true positives only
B) The total number of predictions made

C) The number of false negatives only
D) Twice the number of samples
Answer: B
Explanation: The matrix accounts for every observation, so the sum equals the total sample count.
**Question 57.** Which of the following is not a typical advantage of using a Random Forest over a single Decision Tree?
A) Reduced overfitting
B) Higher interpretability
C) Ability to handle high-dimensional data
D) Implicit feature importance estimation
Answer: B
Explanation: Random Forests are less interpretable than a single tree because they aggregate many trees.
**Question 58.** The `learning_rate` hyperparameter in Gradient Boosting controls:
A) Number of trees in the ensemble
B) Depth of each tree
C) Contribution of each new tree to the overall model
D) Minimum samples per leaf
Answer: C
Explanation: A smaller learning rate shrinks each tree's impact, often requiring more trees.
**Question 59.** Which kernel function corresponds to an infinite-dimensional feature space?
A) Linear kernel