




























































































The CMLP exam certifies core competencies in machine learning. It covers supervised and unsupervised learning, model evaluation, feature selection, overfitting prevention, and deployment fundamentals. Candidates demonstrate practical understanding of building, testing, and maintaining machine learning solutions across diverse application domains.
Typology: Exams





























































































Question 1. Which data ingestion method is most appropriate for processing millions of sensor readings per second with minimal latency?
A) Batch ETL scheduled nightly
B) Micro-batch Spark Streaming
C) Real‑time Kafka streaming
D) Manual CSV upload
Answer: C
Explanation: Real‑time Kafka streaming can ingest high‑velocity data continuously with low latency, suitable for millions of sensor events per second, unlike batch or micro‑batch approaches.

Question 2. In a data lake architecture for machine learning, which characteristic best differentiates it from a traditional data warehouse?
A) Strict schema enforcement at write time
B) Storage of raw, unstructured data alongside structured data
C) Use of OLAP cubes for fast analytics
D) Mandatory ACID transactions for all operations
Answer: B
Explanation: Data lakes store raw, unstructured, semi‑structured, and structured data without requiring a predefined schema, whereas warehouses enforce schemas and are optimized for structured queries.

Question 3. When connecting to a semi‑structured data source containing nested JSON objects, which Spark feature is most useful for flattening the hierarchy for model training?
A) DataFrame.persist()
B) spark.read.format("csv")
C) spark.sql.functions.explode()
D) broadcast joins
Answer: C
Explanation: The explode() function expands array or map columns into one row per element, which flattens nested JSON collections for downstream processing.

Question 4. Which ETL design pattern ensures that each transformation step can be independently re‑executed without re‑processing the entire pipeline?
A) Monolithic pipeline
B) Lambda architecture
C) Incremental (delta) processing with checkpoints
D) Full reload on each run
Answer: C
Explanation: Incremental processing with checkpoints records the state after each step, allowing isolated re‑execution of only the affected stages.

Question 5. In data cleaning, which statistical technique is most appropriate for imputing missing values in a normally distributed numeric feature?
A) Median imputation
B) Mode imputation
C) Mean imputation
Explanation: Capturing data version IDs and lineage ensures that the exact inputs can be retrieved, making experiments reproducible.

Question 8. Which storage solution offers the lowest latency for serving feature vectors to an online inference service?
A) Amazon S3 object storage
B) HDFS distributed file system
C) In‑memory key‑value store (e.g., Redis)
D) Cold archival tape storage
Answer: C
Explanation: In‑memory stores like Redis provide sub‑millisecond latency, ideal for real‑time feature retrieval, unlike disk‑based or archival systems.

Question 9. Which encryption method protects data both at rest and in transit for a cloud‑based ML data repository?
A) AES‑256 for storage and TLS 1.2 for network traffic
B) MD5 hashing for files
C) Base64 encoding of data
D) Plaintext storage with VPN
Answer: A
Explanation: AES‑256 encrypts data at rest, while TLS 1.2 secures data in transit; both together provide comprehensive protection.
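The in-transit half of Question 9 can be sketched with Python's standard `ssl` module. This is a minimal illustration, assuming a client that should refuse any protocol older than TLS 1.2. The at-rest AES‑256 half is not shown because it needs a third-party cryptography library, and the helper name `make_tls_context` is hypothetical.

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects protocol versions below TLS 1.2."""
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # Raise the protocol floor explicitly; newer versions such as TLS 1.3 stay allowed.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls_context()
print(context.minimum_version.name)
```

A context like this would typically be passed to whatever client opens the connection to the data repository, so that the version floor is enforced in one place.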
Question 10. In IAM policy design, which principle minimizes the risk of unauthorized data access?
A) Granting admin rights to all users
B) Using role‑based access control with least privilege
C) Sharing credentials across teams
D) Disabling multi‑factor authentication
Answer: B
Explanation: Role‑based access with least‑privilege ensures users receive only the permissions necessary for their tasks, reducing exposure.

Question 11. Which scaling technique transforms features to a range of 0 to 1 and is sensitive to outliers?
A) Standardization (Z‑score)
B) Min‑Max scaling
C) Robust scaling
D) Log transformation
Answer: B
Explanation: Min‑Max scaling maps values linearly to [0,1]; extreme outliers can compress the majority of data into a narrow range.

Question 12. When a feature exhibits a heavy‑tailed distribution, which scaling method is most robust?
A) Min‑Max scaling
Answer: C
Explanation: Target encoding replaces categories with the mean target value, reducing dimensionality while preserving predictive signal for high‑cardinality features.

Question 15. Which encoding method can unintentionally introduce ordinal relationships where none exist?
A) One‑hot encoding
B) Label encoding
C) Frequency encoding
D) Hashing trick
Answer: B
Explanation: Label encoding assigns integer values, implying order, which may mislead models that interpret numeric magnitude as ordinal.

Question 16. When creating polynomial features of degree 3 for a numeric predictor, how many interaction terms are generated for two original features x₁ and x₂?
A) 3
B) 4
C) 5
D) 6
Answer: A
Explanation: A full degree‑3 expansion of x₁ and x₂ contains nine non‑constant terms: x₁, x₂, x₁², x₁x₂, x₂², x₁³, x₁²x₂, x₁x₂², and x₂³. Only three of them are interaction terms, meaning products that involve both features: x₁x₂, x₁²x₂, and x₁x₂². The remaining six are the original features and their pure powers.
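The term count in Question 16 can be checked by brute force. The sketch below enumerates every monomial of total degree 1 to 3 in two features and keeps those that involve both features. `polynomial_terms` is a hypothetical helper written for this illustration, not a library function, and the definition of "interaction term" as a product of more than one distinct feature is an assumption stated in the code.

```python
from itertools import combinations_with_replacement


def polynomial_terms(features, degree):
    """Return every monomial of total degree 1..degree as a tuple of feature names."""
    terms = []
    for d in range(1, degree + 1):
        # Each multiset of d feature names corresponds to one monomial.
        terms.extend(combinations_with_replacement(features, d))
    return terms


terms = polynomial_terms(["x1", "x2"], degree=3)
# Assumption: an interaction term involves more than one distinct feature.
interactions = [t for t in terms if len(set(t)) > 1]

print(len(terms), "non-constant terms")
print(["*".join(t) for t in interactions])
```

Under that definition the enumeration yields nine non-constant terms, three of which mix x1 and x2.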
Question 17. In text preprocessing, which step is essential for reducing dimensionality while preserving semantic meaning in bag‑of‑words models?
A) Stemming or lemmatization
B) Adding HTML tags
C) Converting to uppercase
D) Random word shuffling
Answer: A
Explanation: Stemming/lemmatization reduces word variants to a common root, decreasing the vocabulary size without losing core meaning.

Question 18. Which vectorization technique captures term importance across documents by weighting term frequency with inverse document frequency?
A) CountVectorizer
B) One‑hot encoding
C) TF‑IDF
D) Word2Vec
Answer: C
Explanation: TF‑IDF multiplies term frequency by inverse document frequency, emphasizing words that are frequent in a document but rare across the corpus.

Question 19. Which dimensionality reduction algorithm is non‑linear and preserves local neighborhood structure, often used for visualizing high‑dimensional data?
D) Random forest importance
Answer: C
Explanation: RFE iteratively fits a model, eliminates the weakest features, and repeats until the desired number of features remains.

Question 22. In a correlation matrix, a pair of features with a Pearson coefficient of 0.95 suggests what action?
A) Keep both; they add unique information
B) Remove one to reduce multicollinearity
C) Transform both using log scaling
D) Increase regularization
Answer: B
Explanation: A coefficient of 0.95 indicates strong linear dependence; retaining both can cause multicollinearity, so dropping one is advisable.

Question 23. Which regression algorithm adds an L2 penalty to the loss function to shrink coefficients?
A) Linear Regression
B) Ridge Regression
C) Lasso Regression
D) Elastic Net
Answer: B
Explanation: Ridge regression incorporates an L2 regularization term, penalizing large coefficients and reducing overfitting.

Question 24. Which linear model is capable of performing both L1 and L2 regularization simultaneously?
A) Ridge
B) Lasso
C) Elastic Net
D) Bayesian Regression
Answer: C
Explanation: Elastic Net combines L1 (lasso) and L2 (ridge) penalties, offering a balance between feature selection and coefficient shrinkage.

Question 25. In logistic regression, what does the sigmoid activation function output?
A) Class label directly
B) Probability between 0 and 1
C) Log‑odds
D) Decision tree leaf
Answer: B
Explanation: The sigmoid function maps any real‑valued input to a value in (0,1), interpreted as the probability of the positive class.
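The mapping described in Question 25 can be shown in a few lines of plain Python. This is a minimal sketch, assuming the input is the log-odds produced by the linear part of a logistic regression model. The two-branch form is a common way to avoid floating-point overflow and is a choice made for this example.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds value z to a probability, avoiding overflow for large |z|."""
    if z >= 0:
        # exp(-z) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(z) is below 1, so this form cannot overflow either.
    e = math.exp(z)
    return e / (1.0 + e)


for log_odds in (-4.0, 0.0, 4.0):
    print(log_odds, round(sigmoid(log_odds), 4))
```

A log-odds of zero maps to a probability of exactly 0.5, and the outputs for z and -z always sum to one, which is why the decision boundary of logistic regression sits where the linear score is zero.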
B) Gradient Boosted Trees
C) Bagging
D) Stacking
Answer: B
Explanation: Gradient Boosting adds trees iteratively, each trained on the residual errors of the combined previous trees.

Question 29. Which hyperparameter in XGBoost controls the depth of each tree and thus influences model complexity?
A) learning_rate
B) max_depth
C) n_estimators
D) subsample
Answer: B
Explanation: max_depth sets the maximum depth of each decision tree, directly affecting the capacity and risk of overfitting.

Question 30. In K‑Means clustering, what does the “elbow method” help determine?
A) Optimal number of clusters by locating a point where within‑cluster sum of squares decreases slowly
B) Distance metric selection
C) Initialization seed
D) Data scaling technique
Answer: A
Explanation: The elbow method plots SSE vs. k; the “elbow” point indicates diminishing returns, suggesting a suitable k.

Question 31. Which clustering algorithm can discover arbitrarily shaped clusters and does not require specifying the number of clusters a priori?
A) K‑Means
B) Hierarchical Agglomerative Clustering
C) DBSCAN
D) Gaussian Mixture Models
Answer: C
Explanation: DBSCAN groups points based on density, handling irregular shapes and automatically determining the number of clusters.

Question 32. In anomaly detection for credit‑card fraud, which unsupervised technique models normal behavior and flags deviations?
A) One‑Class SVM
B) Logistic Regression
C) Decision Tree
D) Naïve Bayes
Answer: A
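Returning to the elbow method from Question 30, the SSE-versus-k curve can be reproduced on a toy example. The sketch below is a simplified one-dimensional K-Means with a deterministic initialization, written only for illustration. The helper `kmeans_1d`, the toy data, and the initialization rule are all assumptions, and a real project would normally rely on a library implementation.

```python
def kmeans_1d(points, k, iters=50):
    """Run Lloyd's algorithm on 1-D data and return (centers, SSE)."""
    pts = sorted(points)
    n = len(pts)
    # Assumption: start from k evenly spaced points of the sorted data (deterministic).
    centers = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            # Assign each point to its nearest center.
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min((p - c) ** 2 for c in centers) for p in pts)
    return centers, sse


# Toy data with three well-separated groups.
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1, 8.9, 9.0, 9.1]
sse_by_k = {k: kmeans_1d(data, k)[1] for k in range(1, 5)}

for k, sse in sse_by_k.items():
    print(k, round(sse, 3))
```

On this data the SSE falls steeply up to k = 3 and barely changes afterwards, which is the flattening that the elbow method looks for.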
Question 35. Which activation function is preferred for hidden layers in deep networks because it avoids saturation for positive inputs?
A) Sigmoid
B) Tanh
C) ReLU
D) Softmax
Answer: C
Explanation: ReLU outputs zero for negative inputs and identity for positive, providing sparse activation and reducing saturation.

Question 36. Which optimizer adapts learning rates per parameter based on first and second moments of gradients?
A) Stochastic Gradient Descent (SGD)
B) AdaGrad
C) RMSprop
D) Adam
Answer: D
Explanation: Adam combines momentum (first moment) and RMSprop‑like scaling (second moment) for adaptive learning rates.

Question 37. When performing hyperparameter tuning with Grid Search, what is a major drawback compared to Bayesian Optimization?
A) Requires prior knowledge of the search space
B) Explores fewer configurations
C) Computationally expensive due to exhaustive enumeration
D) Cannot handle categorical parameters
Answer: C
Explanation: Grid Search evaluates every combination, leading to high computational cost, whereas Bayesian Optimization intelligently selects promising points.

Question 38. Which regularization technique randomly disables a proportion of neurons during each training iteration?
A) L1 regularization
B) L2 regularization
C) Dropout
D) Early stopping
Answer: C
Explanation: Dropout sets a random subset of activations to zero, preventing co‑adaptation and reducing overfitting.

Question 39. In K‑Fold cross‑validation, how many distinct models are trained when K=5?
A) 1
B) 3
C) 5
D) 10
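The setup in Question 39 can be sketched in plain Python. The generator below splits sample indices into K folds and yields one train/validation pair per fold. `k_fold_indices` is a hypothetical helper written for this illustration (libraries such as scikit-learn ship their own splitters), and it assumes contiguous, unshuffled folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


splits = list(k_fold_indices(n_samples=10, k=5))
for fold, (train, validation) in enumerate(splits, start=1):
    # One model would be fitted on `train` and scored on `validation` per fold.
    print(fold, validation)
```

Each yielded split corresponds to one fitted model, so the number of splits equals the number of models trained during cross-validation, and every sample is used for validation exactly once.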
Question 42. Which classification metric is most informative when the positive class is rare and false negatives are costly?
A) Accuracy
B) Precision
C) Recall
D) F1‑Score
Answer: C
Explanation: Recall measures the proportion of actual positives correctly identified, crucial when missing positives is expensive.

Question 43. In a binary classifier, a ROC‑AUC of 0.5 indicates what level of performance?
A) Perfect discrimination
B) Good discrimination
C) No discriminative ability (random guessing)
D) Overfitting
Answer: C
Explanation: An AUC of 0.5 corresponds to the diagonal line, equivalent to random chance.

Question 44. Which component of a confusion matrix directly reflects Type II errors?
A) True Positives
B) False Positives
C) False Negatives
D) True Negatives
Answer: C
Explanation: False Negatives are instances where the model missed the positive class, representing Type II errors.

Question 45. When a learning curve shows high training error and similarly high validation error, the model is likely suffering from:
A) High bias (underfitting)
B) High variance (overfitting)
C) Data leakage
D) Imbalanced classes
Answer: A
Explanation: Both errors being high indicates the model cannot capture underlying patterns, a bias problem.

Question 46. Which model‑agnostic technique explains the contribution of each feature to a single prediction by approximating the model locally?
A) SHAP values
B) Global feature importance plot
C) Confusion matrix
D) ROC curve
Answer: A
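The Type II error count from Question 44, and the recall metric from Question 42 that depends on it, can be computed by hand on a small example. The sketch below uses made-up labels for an imbalanced problem with three positives out of ten samples. The function name `confusion_counts` and the data are assumptions for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1  # Type I error
        elif actual == 1 and predicted == 0:
            fn += 1  # Type II error
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical labels: 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)
precision = tp / (tp + fp)

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("accuracy:", accuracy, "recall:", round(recall, 3), "precision:", precision)
```

Here accuracy comes out at 0.7 while recall is only one third, because two of the three positives are missed. That gap is why recall is the more informative metric when the positive class is rare and false negatives are costly.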