PrepIQ Beingcert Machine Learning Ultimate Exam, Exams of Technology

This certification provides knowledge of machine learning algorithms, supervised and unsupervised learning, model training, and data analysis techniques. It is regulated by BeingCert. The exam evaluates predictive modelling and algorithmic understanding.

Typology: Exams

2025/2026

Available from 06/01/2026

shilpi-jain-3 🇮🇳

PrepIQ Beingcert Machine Learning
Ultimate Exam
**Question 1.** Which measure of central tendency is most appropriate for a
heavily right-skewed distribution?
A) Mean
B) Median
C) Mode
D) Geometric mean
Answer: B
Explanation: The median is resistant to extreme values and better represents
the typical value in a skewed distribution, whereas the mean is pulled toward
the long tail.
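
The effect described in Question 1 can be checked on a small right-skewed sample. This is a minimal sketch; the `incomes` values are invented for illustration and do not come from the exam.

```python
from statistics import mean, median

# Invented right-skewed sample: most values sit near 30-45, one extreme value at 400.
incomes = [30, 32, 35, 38, 40, 42, 45, 400]

sample_mean = mean(incomes)      # pulled toward the long right tail
sample_median = median(incomes)  # stays close to the typical value

print(f"mean={sample_mean}, median={sample_median}")
```

The single extreme value drags the mean well above every other observation, while the median remains among the bulk of the data.
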
**Question 2.** In a dataset, the standard deviation is 0 while the variance is
non-zero. Which statement is true?
A) This situation is impossible.
B) The data contain both positive and negative values.
C) The data are constant but have rounding errors.
D) The variance was calculated using a biased estimator.
Answer: A
Explanation: Standard deviation is the square root of variance; if variance is
non-zero, standard deviation cannot be zero.
**Question 3.** Which of the following techniques is best for detecting
outliers in a univariate numeric feature?
A) Correlation matrix
B) Boxplot (IQR rule)
C) One-hot encoding
D) Min-max scaling


Answer: B
Explanation: The interquartile range (IQR) rule identifies points below Q1 - 1.5·IQR or above Q3 + 1.5·IQR as outliers.
**Question 4.** A histogram of a variable shows a bell-shaped curve symmetric around the mean. Which distribution most likely generated the data?
A) Uniform
B) Exponential
C) Normal
D) Poisson
Answer: C
Explanation: A symmetric bell-shaped histogram is characteristic of a normal distribution.
**Question 5.** Which correlation coefficient is appropriate for measuring monotonic but non-linear relationships?
A) Pearson
B) Spearman
C) Point-biserial
D) Cramér’s V
Answer: B
Explanation: Spearman’s rho ranks the data and captures monotonic relationships regardless of linearity.
**Question 6.** In a scatter plot matrix, you notice a strong linear pattern between two variables. Which preprocessing step should you consider next?
A) Drop one variable to avoid multicollinearity.
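
The IQR rule from Question 3 can be sketched with the standard library alone. The function name `iqr_outliers` and the sample `readings` are assumptions made for this illustration; quartile conventions differ between tools, and this sketch uses the default method of `statistics.quantiles`.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Invented univariate sample with one extreme reading.
readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
outliers = iqr_outliers(readings)
print(outliers)
```
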


**Question 9.** Which imputation method is most likely to preserve the underlying distribution of a numeric feature with a multimodal shape?
A) Mean imputation
B) Median imputation
C) K-Nearest Neighbors imputation
D) Constant zero imputation
Answer: C
Explanation: K-NN imputation uses similar instances to estimate missing values, better reflecting multimodal patterns than central-tendency measures.
**Question 10.** In a dataset with a categorical variable “Color” that has three levels, which encoding creates the most compact representation without imposing ordinal relationships?
A) Label encoding
B) Binary encoding
C) One-hot encoding
D) Frequency encoding
Answer: C
Explanation: One-hot encoding creates a separate binary column for each level, avoiding any implied ordering.
**Question 11.** Which scaling method will transform a feature to have mean 0 and standard deviation 1?
A) Min-max scaling
B) Log transformation
C) Z-score standardization
D) Robust scaling


Answer: C
Explanation: Z-score standardization subtracts the mean and divides by the standard deviation, yielding a distribution with μ = 0, σ = 1.
**Question 12.** A feature selection method that evaluates each predictor individually based on its correlation with the target is an example of a:
A) Wrapper method
B) Embedded method
C) Filter method
D) Hybrid method
Answer: C
Explanation: Filter methods assess features using statistical measures (e.g., correlation) independently of any learning algorithm.
**Question 13.** Which of the following is a wrapper-based feature selection technique?
A) Chi-square test
B) Recursive Feature Elimination (RFE)
C) Lasso regularization
D) Mutual information
Answer: B
Explanation: RFE repeatedly trains a model and removes the least important features, making it a wrapper approach.
**Question 14.** In linear regression, the cost function commonly minimized is:
A) Sum of absolute errors
B) Mean squared error
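
The filter approach from Question 12 can be sketched as ranking features by the absolute Pearson correlation with the target, with no model trained at any point. The helper names `pearson` and `rank_features`, and the toy columns, are assumptions made for this illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_features(features, target):
    """Order feature names by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Invented toy data: "size" tracks the target exactly, "noise" barely does.
features = {
    "size": [1, 2, 3, 4, 5],
    "noise": [3, 1, 4, 1, 5],
    "age": [5, 4, 3, 2, 2],
}
target = [2, 4, 6, 8, 10]
ranking = rank_features(features, target)
print(ranking)
```

Each feature is scored on its own, which is what separates a filter from a wrapper such as RFE: a wrapper would have to refit a model for every candidate subset.
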


A) Adjusted R²
B) Variance Inflation Factor (VIF)
C) Silhouette score
D) Gini impurity
Answer: B
Explanation: VIF quantifies how much the variance of a coefficient is inflated due to correlation with other predictors.
**Question 18.** Which regularization technique adds a penalty proportional to the absolute value of coefficients?
A) Ridge (L2)
B) Lasso (L1)
C) Elastic Net
D) Dropout
Answer: B
Explanation: Lasso (L1) regularization adds a penalty proportional to the sum of the absolute coefficient values, λ·Σ|w_j|, to the loss, encouraging sparsity in the coefficient vector.
**Question 19.** Decision-tree regression predicts a numeric target by:
A) Averaging leaf node values weighted by distance.
B) Using the mean of training samples in the leaf node.
C) Applying a logistic function at each node.
D) Maximizing information gain on the target variable.
Answer: B
Explanation: In regression trees, each leaf stores the mean of the target values of the training samples that fall into that leaf.
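
The leaf-mean behaviour from Question 19 can be illustrated with a one-split regression tree (a "stump"). This is a simplified sketch rather than any library's implementation: it assumes a single numeric feature and picks the threshold that minimizes the summed squared error of the two leaves. The names `fit_stump` and `predict` are hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(xs, ys):
    """Return (threshold, left_mean, right_mean) for the best single split."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidates: samples with x <= t go left
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, mean(left), mean(right))
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def predict(stump, x):
    """Predict with the mean target value stored in the matching leaf."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Invented data with two clearly separated groups.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = fit_stump(xs, ys)
print(stump, predict(stump, 2), predict(stump, 50))
```

Every input that lands in the same leaf receives the same prediction, which is why the output of a regression tree is piecewise constant.
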


**Question 20.** Random Forest regressor reduces overfitting primarily by:
A) Pruning each tree to a single depth.
B) Using bootstrapped samples and random feature subsets for each tree.
C) Applying L1 regularization to leaf predictions.
D) Training a single deep decision tree.
Answer: B
Explanation: Bagging (bootstrap aggregation) and random feature selection decorrelate trees, lowering variance and overfitting.
**Question 21.** Logistic regression outputs probabilities via which link function?
A) Identity
B) Logit (sigmoid)
C) Softmax
D) ReLU
Answer: B
Explanation: The sigmoid function, which is the inverse of the logit link, maps any real-valued input to the (0, 1) interval, providing class probabilities for binary logistic regression.
**Question 22.** For a binary classification problem with imbalanced classes, which metric is most informative when the minority class is the focus?
A) Accuracy
B) ROC-AUC
C) Precision-Recall AUC
D) Mean squared error
Answer: C
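
The relationship between the logit link and the sigmoid in Question 21 can be checked numerically. This is a minimal sketch using only the `math` module; the function names are chosen for this illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p in (0, 1); the inverse of sigmoid."""
    return math.log(p / (1.0 - p))

for score in (-10.0, 0.0, 2.0, 10.0):
    prob = sigmoid(score)
    # Applying logit to the probability recovers the original linear score.
    print(score, round(prob, 6), round(logit(prob), 6))
```

In logistic regression the linear predictor is the log-odds, and the sigmoid converts it back into a probability.
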


C) They have equal variance.
D) They are ordinal.
Answer: B
Explanation: The “naïve” assumption is that features are independent of each other once the class label is known.
**Question 26.** Which of the following is true about the decision boundary of a Support Vector Machine with a linear kernel?
A) It always passes through the origin.
B) It maximizes the margin between classes.
C) It minimizes the number of support vectors.
D) It is equivalent to a k-NN classifier with k = 1.
Answer: B
Explanation: Linear SVM finds the hyperplane that maximizes the distance (margin) to the nearest points of each class.
**Question 27.** The elbow method for K-Means clustering selects the optimal k by:
A) Maximizing silhouette score.
B) Finding the point where within-cluster sum of squares (WCSS) stops decreasing rapidly.
C) Minimizing the number of clusters.
D) Using the Bayesian Information Criterion.
Answer: B
Explanation: The “elbow” is the point where adding more clusters yields diminishing reductions in WCSS.
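
The diminishing-returns pattern behind the elbow method in Question 27 can be seen by computing WCSS for a few partitions of a tiny 1-D dataset. Note the assumption: the partitions below are written by hand to stand in for what K-Means would plausibly return at k = 1, 2, 3; no clustering algorithm is run here.

```python
from statistics import mean

def wcss(clusters):
    """Within-cluster sum of squares: squared distance of each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        center = mean(points)
        total += sum((p - center) ** 2 for p in points)
    return total

# Invented 1-D data with two obvious groups; partitions are hand-made, not fitted.
partitions = {
    1: [[1, 2, 3, 10, 11, 12]],
    2: [[1, 2, 3], [10, 11, 12]],
    3: [[1, 2], [3], [10, 11, 12]],
}
wcss_by_k = {k: wcss(clusters) for k, clusters in partitions.items()}
print(wcss_by_k)
```

Going from one to two clusters removes almost all of the WCSS, while a third cluster barely helps, so the elbow sits at k = 2 for this data.
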


**Question 28.** Hierarchical agglomerative clustering with complete linkage defines the distance between two clusters as:
A) The minimum pairwise distance.
B) The average pairwise distance.
C) The maximum pairwise distance.
D) The centroid distance.
Answer: C
Explanation: Complete linkage uses the farthest pairwise distance, producing compact clusters.
**Question 29.** In association rule mining, the metric “lift” measures:
A) The absolute frequency of an itemset.
B) The increase in confidence over random chance.
C) The ratio of observed support to expected support under independence.
D) The number of transactions containing the rule.
Answer: C
Explanation: Lift = confidence / (support of consequent), which equals support(A ∪ B) / (support(A) · support(B)), the observed support divided by the support expected if A and B were independent. Values > 1 indicate the rule performs better than chance.
**Question 30.** Which algorithm is most suitable for detecting rare anomalies in a high-dimensional dataset?
A) K-Means
B) Isolation Forest
C) Apriori
D) Linear Regression
Answer: B
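
The lift formula from Question 29 can be worked through on a handful of transactions. The basket contents and the helper names `support` and `lift` are invented for this sketch; `Fraction` is used so the two forms of the formula can be compared exactly.

```python
from fractions import Fraction

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))

def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent: confidence / support(consequent)."""
    both = support(transactions, antecedent | consequent)
    confidence = both / support(transactions, antecedent)
    return confidence / support(transactions, consequent)

# Invented market-basket data.
baskets = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread"},
    {"butter"},
    {"milk"},
]
rule_lift = lift(baskets, {"bread"}, {"butter"})

# Same quantity written as observed support over support expected under independence.
observed = support(baskets, {"bread", "butter"})
expected = support(baskets, {"bread"}) * support(baskets, {"butter"})
print(rule_lift, observed / expected)
```
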


C) 0.

D) 0.

Answer: A
Explanation: Precision = TP / (TP + FP) = 50 / (50 + 10) = 0.833 ≈ 0.83.
**Question 34.** For a binary classifier, the ROC curve plots:
A) Recall vs. Precision.
B) True Positive Rate vs. False Positive Rate.
C) Accuracy vs. Threshold.
D) F1-Score vs. Threshold.
Answer: B
Explanation: The Receiver Operating Characteristic curve shows the trade-off between TPR (sensitivity) and FPR as the decision threshold varies.
**Question 35.** Which ensemble method builds models sequentially, where each new model focuses on the errors of its predecessor?
A) Bagging
B) Random Subspace
C) Boosting
D) Stacking
Answer: C
Explanation: Boosting iteratively adds weak learners that concentrate on previously misclassified instances, improving overall performance.
**Question 36.** In XGBoost, the term “shrinkage” refers to:
A) Reducing the number of trees.
B) Applying L1 regularization only.


C) Multiplying each tree’s contribution by a learning rate η.
D) Pruning leaves with low importance.
Answer: C
Explanation: Shrinkage scales the weight of each new tree by η (learning rate), preventing overfitting and improving generalization.
**Question 37.** Grid Search differs from Random Search in hyperparameter optimization primarily because:
A) Grid Search evaluates a random subset of the space.
B) Grid Search exhaustively evaluates all combinations on a predefined grid.
C) Random Search uses Bayesian inference.
D) Grid Search can only tune one parameter at a time.
Answer: B
Explanation: Grid Search systematically explores each point on a discretized hyperparameter grid, while Random Search samples randomly.
**Question 38.** Which validation strategy is most appropriate when data exhibit temporal ordering?
A) Random k-fold cross-validation.
B) Stratified k-fold.
C) Time-Series split (forward chaining).
D) Leave-One-Out.
Answer: C
Explanation: Time-Series split respects chronological order, training on past data and testing on future data, avoiding leakage.
**Question 39.** A model with training R² = 0.95 and validation R² = 0. likely suffers from:
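
The forward-chaining idea from Question 38 can be sketched as an index generator in which every training window ends before its test window begins. The function `forward_chaining_splits` and its equal-sized fold layout are assumptions of this sketch; library implementations may size the folds differently.

```python
def forward_chaining_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices); training data always precedes test data."""
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = fold * i
        # The last test window absorbs any leftover samples.
        test_end = fold * (i + 1) if i < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))

# Ten chronologically ordered samples, four expanding-window splits.
splits = list(forward_chaining_splits(n_samples=10, n_splits=4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Because the training window only ever grows forward in time, no future observation can leak into a model that is evaluated on earlier data.
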


**Question 42.** Which technique can be used to reduce the dimensionality of a sparse high-dimensional text dataset while preserving class separability?
A) PCA
B) LDA (Linear Discriminant Analysis)
C) Truncated SVD (Latent Semantic Analysis)
D) t-SNE
Answer: C
Explanation: Truncated SVD works directly on sparse matrices (e.g., TF-IDF) and is commonly used for latent semantic analysis in text.
**Question 43.** When encoding an ordinal categorical variable (e.g., “low”, “medium”, “high”), which method is most appropriate?
A) One-hot encoding
B) Label encoding preserving order
C) Binary encoding
D) Frequency encoding
Answer: B
Explanation: Label encoding that respects the intrinsic ordering retains ordinal information without creating unnecessary dimensions.
**Question 44.** In a regression tree, the split criterion that minimizes the sum of squared residuals is called:
A) Gini impurity
B) Entropy
C) Mean Squared Error (MSE) reduction
D) Information gain ratio
Answer: C


Explanation: Regression trees use reduction in MSE (or variance) as the impurity measure for selecting splits.
**Question 45.** Which regularization technique can simultaneously perform feature selection and shrinkage?
A) Ridge (L2)
B) Lasso (L1)
C) Elastic Net
D) Dropout
Answer: C
Explanation: Elastic Net combines L1 and L2 penalties, encouraging sparsity (feature selection) while retaining stability of ridge.
**Question 46.** In a binary classification problem, the decision threshold is lowered from 0.5 to 0.2. What is the expected effect on precision and recall?
A) Precision ↑, Recall ↓
B) Precision ↓, Recall ↑
C) Both precision and recall ↑
D) Both precision and recall ↓
Answer: B
Explanation: Lowering the threshold classifies more instances as positive, increasing true positives (higher recall) but also more false positives (lower precision).
**Question 47.** Which distance metric is most appropriate for K-Means clustering on data that have been standardized to zero mean and unit variance?
A) Manhattan distance
B) Euclidean distance
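
The threshold effect from Question 46 can be reproduced on a small set of scored examples. The labels, scores, and the helper `precision_recall` are invented for this sketch; on other data precision does not have to fall at every step, although recall can never decrease when the threshold is lowered.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented ground truth (1 = positive) and model scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.3, 0.6, 0.25, 0.1, 0.1, 0.05, 0.22]

p_high, r_high = precision_recall(labels, scores, threshold=0.5)
p_low, r_low = precision_recall(labels, scores, threshold=0.2)
print(f"threshold 0.5: precision={p_high:.3f}, recall={r_high:.3f}")
print(f"threshold 0.2: precision={p_low:.3f}, recall={r_low:.3f}")
```
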


B) Bagging reduces bias but not variance.
C) It aggregates predictions by averaging (regression) or voting (classification).
D) It requires the base models to be neural networks.
Answer: C
Explanation: Bagging (Bootstrap Aggregating) combines predictions via averaging for continuous targets or majority voting for categorical targets.
**Question 51.** Which evaluation metric is most appropriate for a regression problem where large errors are particularly undesirable?
A) MAE
B) RMSE
C) R²
D) MAPE
Answer: B
Explanation: RMSE penalizes larger errors more heavily than MAE due to squaring, making it suitable when big deviations are costly.
**Question 52.** When performing stratified k-fold cross-validation on a binary dataset, what is preserved across folds?
A) Exact same number of samples per fold.
B) The proportion of each class.
C) The order of samples.
D) The feature distribution variance.
Answer: B
Explanation: Stratification ensures each fold maintains the class proportion of the whole dataset.
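
The difference between MAE and RMSE in Question 51 shows up when two sets of errors have the same total absolute error but distribute it differently. The error lists below are invented for this sketch.

```python
import math

def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Same total absolute error, spread evenly versus concentrated in one large miss.
even_errors = [2, 2, 2, 2]
spiky_errors = [0, 0, 0, 8]

print("even :", mae(even_errors), rmse(even_errors))
print("spiky:", mae(spiky_errors), rmse(spiky_errors))
```

MAE cannot tell the two cases apart, whereas RMSE doubles for the set containing one large error.
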


**Question 53.** Which of the following is a key advantage of using a validation curve over a learning curve?
A) It shows model performance as a function of training set size.
B) It visualizes performance across different hyperparameter values.
C) It measures computational cost.
D) It evaluates model bias only.
Answer: B
Explanation: Validation curves plot training and validation scores against varying hyperparameter values, helping to detect over-/under-fitting.
**Question 54.** In a pipeline that includes imputation, scaling, and a classifier, why is it important to fit the imputer and scaler only on the training data?
A) To reduce computation time.
B) To avoid data leakage from the test set.
C) Because imputation cannot be performed on unseen data.
D) Scaling does not affect the classifier.
Answer: B
Explanation: Fitting preprocessing steps on the training data only prevents information from the test set influencing model training, which would inflate performance estimates.
**Question 55.** Which of the following is a characteristic of a soft margin SVM?
A) It allows some misclassifications to achieve a larger margin.
B) It requires the data to be linearly separable.
C) It uses only the hinge loss without regularization.
D) It cannot be kernelized.
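
The train-only fitting rule from Question 54 can be sketched with a tiny z-score scaler. `SimpleScaler` is a hypothetical class written for this illustration and is not any library's API; the numbers are invented.

```python
from statistics import mean, pstdev

class SimpleScaler:
    """Hypothetical minimal z-score scaler: learns its statistics in fit() only."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values)  # population standard deviation of the training data
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

train = [10.0, 12.0, 14.0, 16.0, 18.0]
test = [20.0, 30.0]

scaler = SimpleScaler().fit(train)      # statistics come from the training split only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)    # the test split reuses the training statistics

print(train_scaled)
print(test_scaled)
```

The scaled test values need not have mean 0 or standard deviation 1, and that is intended: the test set is treated as unseen data and plays no part in choosing the transformation.
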