A set of practice exam questions on machine learning with Python. It covers supervised learning, feature engineering, data manipulation with NumPy and pandas, visualization with seaborn, and machine learning algorithms implemented in scikit‑learn. Each question is accompanied by an explanation, which makes the set useful for students and practitioners preparing for certification or reinforcing their understanding of machine learning principles and their Python implementation. Topics include data preprocessing, model selection, and evaluation metrics.
Typology: Exams

Question 1. Which learning paradigm aims to map inputs to outputs using labeled examples?
A) Unsupervised Learning
B) Reinforcement Learning
C) Supervised Learning
D) Semi‑Supervised Learning
Answer: C
Explanation: Supervised learning uses labeled data to learn a function that maps inputs to known outputs.

Question 2. In the typical ML workflow, which step comes immediately after data cleaning?
A) Model Deployment
B) Feature Engineering
C) Data Acquisition
D) Model Evaluation
Answer: B
Explanation: After cleaning, practitioners usually engineer features to improve model performance before training.

Question 3. Which NumPy function creates an array of shape (3,4) filled with zeros?
A) np.empty((3,4))
B) np.ones((3,4))
C) np.zeros((3,4))
D) np.arange(12).reshape(3,4)
Answer: C
Explanation: np.zeros returns an array of the given shape filled with 0.0.

Question 4. In pandas, which method returns the number of non‑null observations per column?
A) df.count()
B) df.size
C) df.shape
D) df.info()
Answer: A
Explanation: DataFrame.count() counts non‑null values for each column.

Question 5. Which seaborn function is best suited for visualizing the distribution of a single numeric variable?
A) sns.boxplot()
B) sns.histplot()
C) sns.scatterplot()
D) sns.heatmap()
Answer: B
Explanation: sns.histplot draws a histogram, optionally with a KDE overlay (kde=True), to show a variable’s distribution.

Question 6. Which scikit‑learn class implements the k‑Nearest Neighbors algorithm for classification?
A) LinearRegression
B) KNeighborsClassifier
C) SVC
Answer: D
Explanation: A common rule marks |Z| > 3 as an extreme outlier.

Question 10. Which scaling technique transforms data to have zero mean and unit variance?
A) Min‑Max Scaling
B) Robust Scaling
C) Standardization
D) Log Transformation
Answer: C
Explanation: Standardization (Z‑score scaling) centers data at 0 and scales to unit variance.

Question 11. One‑Hot Encoding is appropriate for which type of variable?
A) Ordinal with natural ordering
B) Nominal categorical without order
C) Continuous numeric
D) Binary target variable
Answer: B
Explanation: One‑Hot creates a binary column for each category, suitable when categories have no hierarchy.

Question 12. Label Encoding can unintentionally introduce what problem for tree‑based models?
A) Multicollinearity
B) Artificial ordering
C) Data leakage
D) Overfitting to noise
Answer: B
Explanation: Assigning integer codes implies an order that may mislead algorithms that treat values ordinally.

Question 13. Binning a continuous variable into quartiles results in how many distinct bins?
A) 2
B) 3
C) 4
D) 5
Answer: C
Explanation: Quartiles divide data into four equal‑frequency groups.

Question 14. Adding a product term X1 * X2 to a regression model is an example of creating:
A) A polynomial feature
B) An interaction term
C) A regularization term
D) A target variable
Answer: B
Explanation: Interaction terms capture joint effects of two predictors.

Question 15. Principal Component Analysis (PCA) primarily seeks to maximize:
A) The feature decreases the odds of the positive class
B) The feature has no effect on odds
C) The feature increases the odds of the positive class
D) The model is overfitted
Answer: C
Explanation: Positive log‑odds coefficients raise the probability of class 1.

Question 19. Which impurity measure is based on the probability of misclassifying a randomly chosen element?
A) Gini Index
B) Entropy
C) Information Gain
D) Chi‑square
Answer: A
Explanation: Gini impurity = 1 − ∑p_i², reflecting misclassification probability.

Question 20. Information Gain in a decision tree is computed using:
A) Gini impurity reduction
B) Reduction in variance
C) Decrease in entropy after a split
D) Increase in silhouette score
Answer: C
Explanation: Information Gain = parent entropy − weighted child entropy.

Question 21. Random Forest reduces variance primarily through:
A) Gradient descent
B) Bagging of multiple decision trees
C) Pruning each tree aggressively
D) Using a single deep tree
Answer: B
Explanation: Bagging (bootstrap aggregation) averages many decorrelated trees, lowering variance.

Question 22. Which hyperparameter controls the depth of each tree in a Random Forest?
A) n_estimators
B) max_features
C) max_depth
D) learning_rate
Answer: C
Explanation: max_depth limits how deep individual trees can grow.

Question 23. Gradient Boosting builds trees sequentially to:
A) Reduce bias by correcting residual errors of previous trees
B) Increase variance for better fit
C) Randomly sample features at each split
D) Perform bagging of independent trees
Answer: A
Explanation: Each new tree is fit to the residuals, focusing on errors of the ensemble.

Question 24. In XGBoost, which parameter controls the contribution of each tree to the final model?
A) Increases model variance
B) Decreases model bias
C) Increases bias and reduces variance
D) Has no effect on bias‑variance trade‑off
Answer: C
Explanation: Larger k smooths decision boundaries (higher bias) but makes predictions more stable (lower variance).

Question 28. Which distance metric is most appropriate for high‑dimensional sparse data?
A) Euclidean distance
B) Manhattan distance
C) Cosine similarity (converted to distance)
D) Hamming distance
Answer: C
Explanation: Cosine similarity focuses on orientation rather than magnitude, handling sparsity well.

Question 29. Simple linear regression estimates the relationship between one predictor and the target by minimizing:
A) Sum of absolute errors
B) Sum of squared residuals (RSS)
C) Hinge loss
D) Log‑loss
Answer: B
Explanation: Ordinary Least Squares minimizes the Residual Sum of Squares.
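
Worked example for Question 29 (a minimal sketch, not part of the original exam). It fits a simple linear regression with the closed-form least-squares formulas and checks that nudging the coefficients increases the RSS. The function names `fit_simple_ols` and `rss` and the toy data are assumptions made for illustration.

```python
def fit_simple_ols(x, y):
    """Closed-form least-squares fit of y ~ intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def rss(x, y, intercept, slope):
    """Residual sum of squares for the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data, roughly following y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(x, y)
best_rss = rss(x, y, intercept, slope)

# Any other line has a larger RSS than the least-squares line.
perturbed_rss = rss(x, y, intercept + 0.1, slope - 0.05)

print(f"intercept={intercept:.3f}, slope={slope:.3f}")
print(f"RSS at OLS fit={best_rss:.4f}, RSS after perturbation={perturbed_rss:.4f}")
```

In practice, scikit‑learn's LinearRegression performs the same least-squares fit for one or many predictors.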
Question 30. In multiple linear regression, the coefficient of determination (R²) measures:
A) The proportion of variance explained by the model
B) The average absolute error
C) The correlation between predictors
D) The probability of overfitting
Answer: A
Explanation: R² = 1 − SS_res / SS_tot quantifies explained variance.

Question 31. Lasso regression differs from Ridge regression by using which penalty?
A) L2 (squared) penalty
B) L1 (absolute) penalty
C) Elastic penalty
D) No penalty
Answer: B
Explanation: Lasso adds an L1 norm penalty, which encourages sparsity by driving some coefficients exactly to zero.

Question 32. Which regularization technique can both shrink coefficients and perform variable selection?
A) Ridge
B) Lasso
C) Elastic Net
D) None of the above
Answer: C
Explanation: Elastic Net combines L1 and L2 penalties, offering shrinkage and selection. Note that Lasso alone also shrinks and selects; the keyed answer reflects Elastic Net's combination of both penalties.
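
Illustration for Questions 31 and 32 (a sketch under a simplifying assumption, not part of the original exam). When the predictors are orthonormal, both penalties have closed-form solutions in terms of the ordinary least-squares coefficient: with objective ½‖y − Xβ‖² + (λ/2)‖β‖₂², Ridge rescales each coefficient, and with objective ½‖y − Xβ‖² + λ‖β‖₁, Lasso soft-thresholds it. The helper names and coefficient values below are hypothetical.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge solution for an orthonormal design: proportional shrinkage."""
    return beta_ols / (1.0 + lam)


def lasso_shrink(beta_ols, lam):
    """Lasso solution for an orthonormal design: soft-thresholding."""
    magnitude = max(abs(beta_ols) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0  # coefficient is removed from the model
    return magnitude if beta_ols > 0 else -magnitude


# Hypothetical OLS coefficients: two strong effects, two weak ones.
ols_coefs = [3.0, -0.4, 0.2, -2.5]
lam = 0.5

ridge_coefs = [ridge_shrink(b, lam) for b in ols_coefs]
lasso_coefs = [lasso_shrink(b, lam) for b in ols_coefs]

print("ridge:", [round(c, 3) for c in ridge_coefs])
print("lasso:", lasso_coefs)
```

Ridge makes every coefficient smaller but leaves all of them nonzero, while Lasso sets the two weak coefficients exactly to zero. That is the variable-selection behaviour the questions refer to.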
Question 36. Which metric is most appropriate when the classes are highly imbalanced?
A) Accuracy
B) Precision
C) Recall
D) F1‑Score
Answer: D
Explanation: F1‑Score balances precision and recall, providing a more informative measure under imbalance.

Question 37. In a confusion matrix, the term False Positive refers to:
A) Correctly predicted negative instances
B) Incorrectly predicted positive instances
C) Correctly predicted positive instances
D) Incorrectly predicted negative instances
Answer: B
Explanation: A false positive occurs when the model predicts the positive class but the true class is negative.

Question 38. The ROC curve plots:
A) Precision vs. Recall
B) True Positive Rate vs. False Positive Rate
C) Accuracy vs. Threshold
D) F1‑Score vs. Threshold
Answer: B
Explanation: ROC visualizes the trade‑off between sensitivity (TPR) and 1‑specificity (FPR) across thresholds.
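
Worked example for Questions 36 to 38 (a minimal sketch, not part of the original exam). It counts the four confusion-matrix cells on a made-up imbalanced dataset and shows how accuracy can look strong while F1 stays low. The function names and labels are assumptions for illustration; the TPR and FPR computed here are a single point on the ROC curve.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# The model raises one false alarm and finds only one of the five positives.
y_pred = [0] * 94 + [1] + [1] + [0] * 4

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + tn + fn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

tpr = recall            # true positive rate (sensitivity)
fpr = fp / (fp + tn)    # false positive rate (1 - specificity)

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, "
      f"recall={recall:.2f}, f1={f1:.3f}")
print(f"ROC point: TPR={tpr:.2f}, FPR={fpr:.4f}")
```

The four counts sum to the number of samples. scikit‑learn provides the same quantities through sklearn.metrics.confusion_matrix and sklearn.metrics.f1_score.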
Question 39. An AUC value of 0.5 indicates:
A) Perfect discrimination
B) No discriminative ability (random guessing)
C) Excellent model performance
D) Overfitting
Answer: B
Explanation: AUC = 0.5 corresponds to the diagonal line of a random classifier.

Question 40. Which regression metric is most sensitive to outliers?
A) Mean Absolute Error (MAE)
B) Mean Squared Error (MSE)
C) Median Absolute Deviation
D) R² Score
Answer: B
Explanation: Squaring errors amplifies large deviations, making MSE highly sensitive to outliers.

Question 41. The coefficient of determination (R²) can become negative when:
A) The model predicts perfectly
B) The model performs worse than predicting the mean of the target
C) There are more predictors than observations
D) The data are standardized
Answer: B
Explanation: If the residual sum of squares exceeds the total sum of squares, R² falls below zero.
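
Worked example for Questions 40 and 41 (a minimal sketch, not part of the original exam). It compares how one large error inflates MAE versus MSE, then computes R² for predictions that are worse than the mean. The function names and numbers are assumptions made for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

# Every prediction is off by 0.5.
clean_pred = [1.5, 2.5, 3.5, 4.5, 5.5]
# Same predictions, except the last one is off by 10.
outlier_pred = [1.5, 2.5, 3.5, 4.5, 15.0]

mae_ratio = mae(y_true, outlier_pred) / mae(y_true, clean_pred)
mse_ratio = mse(y_true, outlier_pred) / mse(y_true, clean_pred)
print(f"MAE grows {mae_ratio:.1f}x, MSE grows {mse_ratio:.1f}x")

# Predicting the mean everywhere gives R^2 = 0 ...
mean_pred = [3.0] * 5
# ... and predictions in reverse order are worse than the mean.
reversed_pred = [5.0, 4.0, 3.0, 2.0, 1.0]
print("R^2 (mean baseline):", r_squared(y_true, mean_pred))
print("R^2 (reversed):", r_squared(y_true, reversed_pred))
```

A single outlying error multiplies MSE far more than MAE, and the reversed predictions give a negative R² because their residual sum of squares exceeds the total sum of squares.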
Answer: B
Explanation: The elbow point indicates diminishing returns in SSE reduction as k increases.

Question 45. In hierarchical clustering, a dendrogram cut at a higher linkage distance results in:
A) More clusters
B) Fewer clusters
C) Same number of clusters regardless of cut
D) No clusters
Answer: B
Explanation: Cutting higher yields larger merged groups, thus fewer clusters.

Question 46. Which linkage criterion defines the distance between two clusters as the maximum distance between their members?
A) Single linkage
B) Complete linkage
C) Average linkage
D) Ward linkage
Answer: B
Explanation: Complete linkage uses the farthest pairwise distance.

Question 47. Grid Search differs from Randomized Search primarily in:
A) Number of hyperparameter combinations evaluated
B) Ability to handle categorical parameters
C) Use of Bayesian optimization
D) Requirement of a validation set
Answer: A
Explanation: Grid Search exhaustively evaluates every point in a predefined grid; Randomized Search samples a fixed number of random combinations.

Question 48. When using RandomizedSearchCV, the n_iter parameter controls:
A) Number of cross‑validation folds
B) Number of hyperparameter settings to try
C) Maximum depth of decision trees
D) Number of parallel jobs
Answer: B
Explanation: n_iter specifies how many random parameter configurations are sampled.

Question 49. In a scikit‑learn Pipeline, which step must appear before the estimator to avoid data leakage?
A) Model training
B) Hyperparameter tuning
C) Feature scaling or encoding
D) Model persistence
Answer: C
Explanation: Preprocessing must be applied within the pipeline so that scaling/encoding is fit only on training folds.

Question 50. Which library function is used to persist a scikit‑learn model to disk?
A) numpy.save
B) pickle.dump
C) matplotlib.savefig
C) It centers data to mean 0 and scales to unit variance
D) It only works on binary features
Answer: C
Explanation: StandardScaler performs Z‑score standardization.

Question 54. The fit_transform method of a transformer in a pipeline:
A) Fits the transformer on the test set only
B) Fits on the training data and then transforms the same data
C) Only transforms data without fitting
D) Saves the model to disk automatically
Answer: B
Explanation: fit_transform learns parameters from the input (training) and returns the transformed data.

Question 55. Which evaluation metric is appropriate for a regression problem where the cost of over‑prediction is twice that of under‑prediction?
A) Mean Absolute Error (MAE)
B) Mean Squared Error (MSE)
C) Weighted Mean Absolute Error
D) R² Score
Answer: C
Explanation: A weighted MAE can assign different penalties to over‑ and under‑predictions.

Question 56. In a confusion matrix for a binary classifier, the sum of all four cells equals:
A) The number of true positives only
B) The total number of predictions made
C) The number of false negatives only
D) Twice the number of samples
Answer: B
Explanation: The matrix accounts for every observation, so the sum equals the total sample count.

Question 57. Which of the following is not a typical advantage of using a Random Forest over a single Decision Tree?
A) Reduced overfitting
B) Higher interpretability
C) Ability to handle high‑dimensional data
D) Implicit feature importance estimation
Answer: B
Explanation: Random Forests are less interpretable than a single tree because they aggregate many trees.

Question 58. The learning_rate hyperparameter in Gradient Boosting controls:
A) Number of trees in the ensemble
B) Depth of each tree
C) Contribution of each new tree to the overall model
D) Minimum samples per leaf
Answer: C
Explanation: A smaller learning rate shrinks each tree’s impact, often requiring more trees.

Question 59. Which kernel function corresponds to an infinite‑dimensional feature space?
A) Linear kernel