A comprehensive set of multiple-choice questions and answers covering various aspects of linear and non-linear regression models in machine learning. It delves into key concepts such as linear regression assumptions, polynomial regression, regularization techniques (Ridge and Lasso), and dimensionality reduction methods (PCA, t-SNE). The questions assess understanding of model selection, feature engineering, and the interpretation of regression results. The educational value lies in its ability to test and reinforce knowledge of fundamental machine learning principles.
Typology: Exams

Q1. In linear regression, the relationship between the dependent and independent variables is assumed to be:
A) Exponential
B) Linear
C) Logarithmic
D) Polynomial
Answer: B) Linear
Explanation: Linear regression assumes a linear relationship between the dependent variable and one or more independent variables.

Q2. Which of the following is a key difference between linear regression and polynomial regression?
A) Polynomial regression can model non-linear relationships
B) Linear regression can handle multiple variables
C) Polynomial regression uses L1 regularization
D) Linear regression is a type of classification algorithm
Answer: A) Polynomial regression can model non-linear relationships
Explanation: Polynomial regression extends linear regression by introducing polynomial terms, allowing it to capture non-linear relationships.

Q3. What is the primary purpose of adding polynomial terms in regression?
A) To reduce overfitting
B) To capture non-linear patterns
C) To simplify the model
D) To increase bias
Answer: B) To capture non-linear patterns
Explanation: Adding polynomial terms allows the regression model to fit non-linear relationships between the independent and dependent variables.

Q4. In linear regression, what method is commonly used to estimate the coefficients?
A) Gradient Boosting
B) Maximum Likelihood Estimation
C) Ordinary Least Squares
D) Support Vector Machines
Answer: C) Ordinary Least Squares
Explanation: Ordinary Least Squares (OLS) is a common method used to estimate the coefficients in linear regression by minimizing the sum of squared residuals.
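As an illustration of the OLS idea in Q4, the following is a minimal sketch for a single predictor, using the textbook closed-form solution that minimizes the sum of squared residuals. The function name ols_fit and the data are hypothetical and chosen only for illustration.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = ols_fit(xs, ys)
print(intercept, slope)  # 1.0 2.0
```

Because the data are perfectly linear, the fitted line reproduces them exactly; with noisy data the same formulas give the line with the smallest total squared residual.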
Q5. Which assumption is NOT required for the classical linear regression model?
A) Linearity
B) Homoscedasticity
C) Independence of errors
D) Non-linearity
Answer: D) Non-linearity
Explanation: The classical linear regression model assumes linearity, homoscedasticity, independence of errors, and normality of error terms, not non-linearity.
Q6. What is the main purpose of regularization in regression models?
A) To increase model complexity
B) To prevent overfitting
C) To improve training speed
D) To handle missing data
Answer: B) To prevent overfitting
Explanation: Regularization adds a penalty to the loss function to constrain the model's complexity, thereby preventing overfitting.

Q7. Which regularization technique adds the squared magnitude of coefficients as a penalty term?
A) Lasso
B) Ridge
C) Elastic Net
D) Dropout
Answer: B) Ridge
Explanation: Ridge regression adds the squared magnitude of coefficients (L2 penalty) to the loss function.

Q8. Lasso regularization can perform which additional function compared to Ridge?
A) Handle multicollinearity
B) Feature selection by setting some coefficients to zero
C) Increase model bias
D) Reduce training time
Answer: B) Feature selection by setting some coefficients to zero
Explanation: Lasso (L1 regularization) can shrink some coefficients to exactly zero, effectively performing feature selection.
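The contrast between the L2 and L1 penalties can be sketched for a single coefficient. This assumes the simplified setting of an orthonormal design, where the Ridge solution rescales the OLS coefficient b to b / (1 + lam) and the Lasso solution soft-thresholds it. The function names and the coefficient values are hypothetical.

```python
def ridge_shrink(b, lam):
    """Minimizer of 0.5*(b - beta)**2 + 0.5*lam*beta**2: proportional shrinkage."""
    return b / (1.0 + lam)


def lasso_shrink(b, lam):
    """Minimizer of 0.5*(b - beta)**2 + lam*abs(beta): soft-thresholding."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small coefficients are set exactly to zero


# Hypothetical OLS coefficients and penalty strength.
ols_coefs = [3.0, -0.4, 0.2, -2.0]
lam = 0.5
ridge = [ridge_shrink(b, lam) for b in ols_coefs]
lasso = [lasso_shrink(b, lam) for b in ols_coefs]
print(lasso)  # [2.5, 0.0, 0.0, -1.5]
```

Ridge makes every coefficient smaller but leaves all of them non-zero, whereas Lasso removes the two weak coefficients entirely, which is the feature-selection behaviour described in Q8.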
Explanation: Dimensionality reduction techniques aim to decrease the number of variables under consideration, retaining the most important information.

Q13. Principal Component Analysis (PCA) is primarily used for:
A) Feature selection
B) Feature extraction
C) Classification
D) Clustering
Answer: B) Feature extraction
Explanation: PCA is a dimensionality reduction technique that transforms the original features into a new set of uncorrelated features called principal components.

Q14. Which of the following techniques is supervised dimensionality reduction?
A) PCA
B) t-SNE
C) Linear Discriminant Analysis
D) Autoencoders
Answer: C) Linear Discriminant Analysis
Explanation: Linear Discriminant Analysis (LDA) is supervised as it takes class labels into account while reducing dimensionality.

Q15. What is a potential downside of using PCA for dimensionality reduction?
A) It increases computational complexity
B) It may not preserve class separability
C) It cannot handle numerical data
D) It always reduces accuracy
Answer: B) It may not preserve class separability
Explanation: PCA focuses on maximizing variance and may not preserve the discriminative information necessary for class separability.
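To make the variance-maximizing view of PCA in Q13 and Q15 concrete, here is a small sketch restricted to two features, where the eigen-decomposition of the 2x2 covariance matrix has a closed form. The function pca_2d and the sample points are hypothetical, and the sketch assumes the points are not all identical.

```python
import math


def pca_2d(points):
    """Return (first principal direction, explained variance ratio) for 2-D data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mid = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    lam1, lam2 = mid + radius, mid - radius
    # Eigenvector belonging to the larger eigenvalue.
    if b != 0.0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)


# Hypothetical points scattered closely around the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, explained = pca_2d(points)
```

Here the first component points roughly along the diagonal and carries almost all of the variance, so projecting onto it would reduce two features to one. Note that no class labels enter the computation, which is why PCA may not preserve class separability.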
Q16. Which of the following is a non-linear regression model?
A) Linear Regression
B) Support Vector Regression
C) Logistic Regression
D) Naive Bayes
Answer: B) Support Vector Regression
Explanation: Support Vector Regression (SVR) can model non-linear relationships using kernel functions.

Q17. Decision Trees can model non-linear relationships because:
A) They use linear decision boundaries
B) They partition the feature space into regions
C) They require polynomial features
D) They use kernel functions
Answer: B) They partition the feature space into regions
Explanation: Decision Trees split the feature space into distinct regions, allowing them to model complex, non-linear relationships.

Q18. In Support Vector Regression, what is the role of the kernel function?
A) To reduce overfitting
B) To transform data into a higher-dimensional space
C) To normalize the data
D) To select features
Answer: B) To transform data into a higher-dimensional space
Explanation: The kernel function in SVR maps input features into a higher-dimensional space to handle non-linear relationships.

Q19. Which non-linear regression model is based on ensemble learning?
A) Random Forest
B) Linear Regression
C) K-Nearest Neighbors
D) PCA
Answer: A) Random Forest
Explanation: Random Forest is an ensemble learning method that builds multiple decision trees to model non-linear relationships.

Q20. What is a key advantage of using tree-based regression models?
A) They require feature scaling
B) They cannot handle categorical variables
C) They automatically capture feature interactions
D) They are sensitive to outliers
Answer: C) They automatically capture feature interactions
A) Filter methods are supervised; wrapper methods are unsupervised
B) Filter methods evaluate features independently of the model; wrapper methods use the model's performance
C) Wrapper methods are faster than filter methods
D) Wrapper methods select features based on correlation; filter methods do not
Answer: B) Filter methods evaluate features independently of the model; wrapper methods use the model's performance
Explanation: Filter methods assess the relevance of features based on statistical measures without involving a specific model, whereas wrapper methods select features based on the performance of a specific model.

Q25. Which technique combines both feature selection and dimensionality reduction?
A) Elastic Net
B) PCA
C) Random Forest
D) None of the above
Answer: D) None of the above
Explanation: Typically, feature selection and dimensionality reduction are distinct processes. Techniques like Elastic Net combine regularization methods but do not inherently perform both feature selection and dimensionality reduction.
Q26. In the context of regularized linear models, what is the "bias-variance tradeoff"?
A) Balancing model complexity and training speed
B) Balancing the tradeoff between underfitting and overfitting
C) Balancing the number of features and samples
D) Balancing computational resources and accuracy
Answer: B) Balancing the tradeoff between underfitting and overfitting
Explanation: The bias-variance tradeoff involves finding a balance between a model's ability to generalize (low variance) and its ability to accurately capture the underlying relationship (low bias).

Q27. Which of the following statements is true about Ridge and Lasso regression?
A) Ridge can set coefficients to zero, Lasso cannot
B) Lasso can set coefficients to zero, Ridge cannot
C) Both Ridge and Lasso can set coefficients to zero
D) Neither Ridge nor Lasso can set coefficients to zero
Answer: B) Lasso can set coefficients to zero, Ridge cannot
Explanation: Lasso (L1) regularization can shrink some coefficients to exactly zero, performing feature selection, whereas Ridge (L2) only shrinks coefficients without setting them to zero.

Q28. What is the effect of using a very high degree in polynomial regression?
A) Underfitting
B) Overfitting
C) No effect on the model
D) Improved generalization
Answer: B) Overfitting
Explanation: Using a very high degree in polynomial regression can lead to overfitting, where the model captures noise in the training data instead of the underlying pattern.

Q29. Which metric is commonly used to evaluate the performance of regression models?
A) Accuracy
B) Precision
C) Mean Squared Error
D) F1 Score
Answer: C) Mean Squared Error
Explanation: Mean Squared Error (MSE) is a common metric for evaluating regression models, measuring the average squared difference between predicted and actual values.

Q30. In regularized regression, what does the term "shrinkage" refer to?
A) Reducing the number of features
B) Decreasing the magnitude of coefficients
C) Increasing the model complexity
D) Removing outliers
Answer: B) Decreasing the magnitude of coefficients
Explanation: Shrinkage refers to reducing the size of the regression coefficients, which helps in preventing overfitting.
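The MSE metric from Q29 is short enough to write out directly. The sketch below is one plain-Python way to compute it; the function name and the sample values are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values: squared errors are 1, 0 and 4.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
mse = mean_squared_error(y_true, y_pred)  # (1 + 0 + 4) / 3
```

Because the errors are squared, the single miss of 2 contributes four times as much as the miss of 1, so MSE penalizes large errors more heavily than small ones.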
Q31. Which cross-validation technique is most suitable for time-series data?
A) K-Fold
B) Leave-One-Out
C) It is only used for non-linear models
D) It does not consider the number of features
Answer: B) It accounts for the number of predictors in the model
Explanation: Adjusted R² adjusts for the number of predictors, providing a more accurate measure when comparing models with different numbers of features.
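The adjustment described in this explanation can be sketched with the standard formula, adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The function name and the numbers are hypothetical.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Same raw R^2 of 0.90 on 50 observations, with 2 versus 20 predictors.
few = adjusted_r2(0.90, n_samples=50, n_predictors=2)
many = adjusted_r2(0.90, n_samples=50, n_predictors=20)
```

With identical raw R², the model that uses 20 predictors receives a visibly lower adjusted score than the one that uses 2, which is what makes the measure useful for comparing models of different sizes.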
Q36. In Lasso regression, as the regularization parameter λ increases, what happens to the number of non-zero coefficients?
A) It increases
B) It decreases
C) It remains the same
D) It first increases then decreases
Answer: B) It decreases
Explanation: As λ increases in Lasso regression, more coefficients are shrunk to zero, reducing the number of non-zero coefficients.

Q37. Which of the following is a common method to select the optimal regularization parameter?
A) Grid Search with Cross-Validation
B) Principal Component Analysis
C) Recursive Feature Elimination
D) Gradient Descent
Answer: A) Grid Search with Cross-Validation
Explanation: Grid Search combined with Cross-Validation is commonly used to find the optimal value of the regularization parameter by evaluating model performance across a range of λ values.

Q38. What does the Elastic Net regularization combine?
A) L1 and L2 penalties
B) L2 and L3 penalties
C) L1 and L3 penalties
D) Only L1 penalties
Answer: A) L1 and L2 penalties
Explanation: Elastic Net combines both L1 (Lasso) and L2 (Ridge) regularization penalties.
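The grid search with cross-validation from Q37 can be sketched on a deliberately small problem. The example assumes a one-feature Ridge model without intercept, whose slope has the closed form sum(x*y) / (sum(x*x) + lam), and assigns sample i to fold i % k instead of shuffling. All names, the data and the grid are hypothetical.

```python
def ridge_fit_1d(xs, ys, lam):
    """Ridge slope for a no-intercept, single-feature model."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k=3):
    """Average validation MSE over k folds (sample i belongs to fold i % k)."""
    fold_errors = []
    for fold in range(k):
        pairs = list(enumerate(zip(xs, ys)))
        train = [xy for i, xy in pairs if i % k != fold]
        valid = [xy for i, xy in pairs if i % k == fold]
        beta = ridge_fit_1d([x for x, _ in train], [y for _, y in train], lam)
        fold_errors.append(sum((y - beta * x) ** 2 for x, y in valid) / len(valid))
    return sum(fold_errors) / k


# Hypothetical data close to y = 2x, and a hypothetical grid of lambda values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(xs, ys, lam) for lam in grid}
best_lam = min(scores, key=scores.get)  # lambda with the lowest validation error
```

Each candidate λ is scored only on data held out from fitting, so the chosen value reflects generalization and not training fit. On this clean data the largest penalty shrinks the slope far below 2 and scores much worse than the smallest one.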
Q39. Which regression technique is most suitable when there are more predictors than observations?
A) Ordinary Least Squares
B) Ridge Regression
C) Decision Trees
D) K-Nearest Neighbors
Answer: B) Ridge Regression
Explanation: Ridge Regression can handle situations where the number of predictors exceeds the number of observations by imposing regularization to prevent overfitting.

Q40. What is the main advantage of using Lasso over Ridge regression?
A) It is computationally faster
B) It can perform feature selection
C) It always provides a better fit
D) It does not require feature scaling
Answer: B) It can perform feature selection
Explanation: Lasso can set some coefficients to zero, effectively performing feature selection, unlike Ridge regression which only shrinks coefficients.
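One way to see why Ridge still works when predictors outnumber observations (Q39) is to look at the normal equations. With one observation and two predictors, X^T X is singular and OLS has no unique solution, while X^T X + λI is invertible. The sketch below uses hypothetical helper names and a hand-written 2x2 solver, with no intercept term.

```python
def solve_2x2(a, b):
    """Solve a @ beta = b for a 2x2 matrix a; raise if a is singular."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ZeroDivisionError("matrix is singular")
    return [
        (a[1][1] * b[0] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - a[1][0] * b[0]) / det,
    ]


def normal_equations(rows, ys, lam):
    """Return (X^T X + lam * I, X^T y) for rows with two features."""
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(2)] for i in range(2)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(2)]
    return xtx, xty


rows, ys = [[1.0, 2.0]], [3.0]  # hypothetical: one observation, two predictors
try:
    ols = solve_2x2(*normal_equations(rows, ys, lam=0.0))
except ZeroDivisionError:
    ols = None  # OLS normal equations cannot be solved uniquely
ridge = solve_2x2(*normal_equations(rows, ys, lam=1.0))
print(ols, ridge)  # None [0.5, 1.0]
```

Adding λ to the diagonal is what turns the singular system into a solvable one; the price is that the coefficients are shrunk and no longer reproduce the single observation exactly.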
Q41. What problem arises when independent variables in a regression model are highly correlated?
A) Heteroscedasticity
B) Multicollinearity
C) Autocorrelation
D) Homoscedasticity
Answer: B) Multicollinearity
Explanation: Multicollinearity refers to high correlations among independent variables, which can cause instability in coefficient estimates.

Q42. Which technique can be used to address multicollinearity?
A) Increasing the number of predictors
B) Removing one of the correlated predictors
C) Using a linear regression without regularization
D) Applying K-Means clustering
Answer: B) Removing one of the correlated predictors
Answer: D) All of the above
Explanation: SVR can use various kernels, including linear, polynomial, and RBF, to model different types of relationships.

Q47. In decision trees, what criterion is commonly used to decide where to split the data?
A) Information Gain
B) Gini Impurity
C) Mean Squared Error
D) All of the above
Answer: D) All of the above
Explanation: Depending on the type of decision tree (classification or regression), criteria like Information Gain, Gini Impurity, and Mean Squared Error are used to determine the best splits.

Q48. What is the main difference between regression trees and classification trees?
A) Regression trees predict continuous values; classification trees predict discrete classes
B) Regression trees use entropy; classification trees use variance
C) Regression trees cannot handle categorical data
D) Classification trees require regularization
Answer: A) Regression trees predict continuous values; classification trees predict discrete classes
Explanation: Regression trees are used for predicting continuous outcomes, while classification trees are used for predicting categorical classes.

Q49. Which ensemble method typically improves the performance of decision trees by reducing variance?
A) Bagging
B) Boosting
C) Stacking
D) All of the above
Answer: A) Bagging
Explanation: Bagging (Bootstrap Aggregating) reduces variance by averaging multiple decision trees trained on different subsets of the data.

Q50. Gradient Boosting primarily focuses on:
A) Reducing bias
B) Reducing variance
C) Increasing model complexity
D) Feature selection
Answer: A) Reducing bias
Explanation: Gradient Boosting builds models sequentially, each correcting the errors of the previous one, thereby reducing bias and improving predictive accuracy.
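The sequential error correction described for Gradient Boosting can be sketched with one-split regression stumps on a single feature. This is a simplified illustration under squared-error loss, where each stump is fitted to the current residuals; the function names, the learning rate of 0.5 and the data are hypothetical, not a reference implementation.

```python
def fit_stump(xs, residuals):
    """Best single-split stump under squared error: (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds, learning_rate=0.5):
    """Return the training MSE recorded after each boosting round."""
    preds = [sum(ys) / len(ys)] * len(ys)  # start from the mean prediction
    history = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return history


# Hypothetical non-linear data: y = x squared.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
history = boost(xs, ys, n_rounds=10)
```

A single stump is a high-bias model, yet the training error in history falls round after round as each new stump targets what the ensemble so far gets wrong. In practice the number of rounds is chosen on held-out data, since training error alone keeps decreasing.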
Q51. What is the effect of increasing the degree of a polynomial in polynomial regression?
A) It makes the model simpler
B) It increases the model's flexibility
C) It decreases the model's ability to fit the training data
D) It has no effect on the model
Answer: B) It increases the model's flexibility
Explanation: Higher-degree polynomials allow the model to fit more complex patterns in the data, increasing flexibility but also the risk of overfitting.

Q52. Which of the following is a sign of overfitting in a regression model?
A) High training error and low test error
B) Low training error and high test error
C) Both training and test errors are high
D) Both training and test errors are low
Answer: B) Low training error and high test error
Explanation: Overfitting occurs when a model performs well on training data but poorly on unseen test data.

Q53. Which technique can help improve a model's generalization performance?
A) Adding more features
B) Reducing the size of the training data
C) Using regularization
D) Increasing the number of epochs without validation
Answer: C) Using regularization
Explanation: Regularization techniques constrain the model's complexity, helping to improve generalization to new data.

Q54. What is the purpose of using a validation curve in model evaluation?
Q58. Which regularization technique can be interpreted as placing a Gaussian prior on the coefficients in Bayesian regression?
A) Ridge
B) Lasso
C) Elastic Net
D) None of the above
Answer: A) Ridge
Explanation: Ridge regression can be interpreted as imposing a Gaussian prior on the coefficients in a Bayesian framework. (Lasso corresponds analogously to a Laplace prior.)

Q59. In the context of optimization, what is the purpose of the learning rate?
A) To determine the size of the steps taken towards the minimum
B) To decide how many features to select
C) To set the regularization strength
D) To normalize the data
Answer: A) To determine the size of the steps taken towards the minimum
Explanation: The learning rate controls how much the model parameters are updated during each iteration of optimization.

Q60. What problem can occur if the learning rate is set too high in Gradient Descent?
A) Slow convergence
B) Getting stuck in local minima
C) Overshooting the minimum
D) No impact on the optimization
Answer: C) Overshooting the minimum
Explanation: A learning rate that is too high can cause the algorithm to overshoot the minimum, preventing convergence.
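The overshooting behaviour from Q60 is easy to reproduce on the one-dimensional function f(x) = x², whose gradient is 2x. In this hypothetical sketch each step multiplies x by (1 - 2 * learning_rate), so the iterates shrink when that factor is below 1 in absolute value and grow otherwise.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(x) = x**2 (gradient 2x) and return the final x."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2.0 * x
    return x


small = gradient_descent(0.1)  # each step multiplies x by 0.8: converges to 0
large = gradient_descent(1.1)  # each step multiplies x by -1.2: diverges
```

With the small rate the iterate ends close to the minimum at 0. With the large rate every update jumps past the minimum to the other side and lands farther away than it started, so the sequence oscillates with growing magnitude.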
Q61. Which of the following is NOT a typical application of regression models?
A) Predicting house prices
B) Classifying emails as spam
C) Estimating a company's future revenue
D) Forecasting stock prices
Answer: B) Classifying emails as spam
Explanation: Classifying emails as spam is a classification task, not a regression task.

Q62. When dealing with high-dimensional data, which technique is commonly used before applying regression?
A) Increasing the number of predictors
B) Dimensionality reduction or feature selection
C) Ignoring the problem
D) Using a simple linear regression model
Answer: B) Dimensionality reduction or feature selection
Explanation: High-dimensional data can lead to overfitting and computational challenges, so dimensionality reduction or feature selection is often performed first.

Q63. What is heteroscedasticity in regression analysis?
A) When errors have constant variance
B) When errors have non-constant variance
C) When predictors are highly correlated
D) When the model is linear
Answer: B) When errors have non-constant variance
Explanation: Heteroscedasticity refers to the situation where the variance of errors varies across observations, violating one of the OLS assumptions.

Q64. How can heteroscedasticity be detected in a regression model?
A) By plotting residuals vs. fitted values
B) By checking the R² value
C) By calculating the mean of the predictors
D) By performing feature scaling
Answer: A) By plotting residuals vs. fitted values
Explanation: Plotting residuals against fitted values can reveal patterns indicating heteroscedasticity.

Q65. Which transformation can be applied to stabilize variance in the presence of heteroscedasticity?
A) Log transformation
B) Polynomial transformation
C) One-hot encoding
D) Feature scaling
Answer: A) Log transformation
C) The model has low variance
D) The predictors are uncorrelated
Answer: B) There is autocorrelation
Explanation: Violating the independence of errors often implies the presence of autocorrelation, especially in time-series data.

Q70. What is the purpose of using confidence intervals in regression analysis?
A) To predict new values
B) To measure the uncertainty around the estimated coefficients
C) To perform feature selection
D) To scale the features
Answer: B) To measure the uncertainty around the estimated coefficients
Explanation: Confidence intervals provide a range within which the true value of the regression coefficients is expected to lie with a certain level of confidence.
Q71. What is the primary objective of the K-Means clustering algorithm?
A) To maximize the distance between clusters
B) To minimize the within-cluster variance
C) To build a hierarchical tree of clusters
D) To model the data distribution using Gaussian distributions
Answer: B) To minimize the within-cluster variance
Explanation: K-Means clustering aims to partition the data into K clusters by minimizing the sum of squared distances between data points and their respective cluster centroids, thereby minimizing within-cluster variance.

Q72. In K-Means clustering, how is the number of clusters (K) typically determined?
A) It is always set to 2
B) Using the Elbow Method
C) By maximizing the silhouette score only
D) K is determined automatically by the algorithm
Answer: B) Using the Elbow Method
Explanation: The Elbow Method involves plotting the within-cluster sum of squares against different K values and selecting the K at the "elbow" point where the rate of decrease sharply slows.
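The K-Means objective and the Elbow Method described above can be sketched in one dimension. The example runs the usual assign-then-update iteration (Lloyd's algorithm) and records the within-cluster sum of squares (WCSS) for several values of K. The deterministic choice of initial centroids, the function name and the data are assumptions made to keep the sketch reproducible.

```python
def kmeans_1d(points, centroids, n_iter=20):
    """Lloyd's algorithm in one dimension; returns (centroids, wcss)."""
    centroids = list(centroids)
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(range(len(centroids)), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    wcss = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, wcss


# Hypothetical data with three well-separated groups near 1, 5 and 9.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
wcss_by_k = {}
for k in range(1, 5):
    # Evenly spaced data points as initial centroids (an assumption, not a standard rule).
    initial = [points[i * len(points) // k] for i in range(k)]
    _, wcss_by_k[k] = kmeans_1d(points, initial)
```

WCSS drops steeply from K=1 to K=3 and barely changes from K=3 to K=4, so the elbow sits at K=3, matching the three groups in the data. Real implementations usually repeat the procedure from several random initializations, because the result depends on the starting centroids.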
Q73. Which distance metric is commonly used in K-Means clustering?
A) Manhattan distance
B) Euclidean distance
C) Cosine similarity
D) Hamming distance
Answer: B) Euclidean distance
Explanation: K-Means typically uses Euclidean distance to measure the similarity between data points and cluster centroids.

Q74. What is a key difference between K-Means and Hierarchical Clustering?
A) K-Means builds a tree of clusters, Hierarchical does not
B) Hierarchical clustering requires the number of clusters to be specified upfront
C) K-Means partitions data into flat clusters, while Hierarchical builds a hierarchy
D) K-Means can only handle numerical data, Hierarchical can handle categorical data
Answer: C) K-Means partitions data into flat clusters, while Hierarchical builds a hierarchy
Explanation: K-Means creates a flat partition of the data into K clusters, whereas Hierarchical Clustering builds a tree-like structure of nested clusters without requiring a predefined number of clusters.

Q75. In Hierarchical Agglomerative Clustering (HAC), what is the initial step?
A) Each data point starts as its own cluster
B) All data points start in a single cluster
C) Randomly assign data points to clusters
D) Use K-Means to initialize clusters
Answer: A) Each data point starts as its own cluster
Explanation: In HAC, the algorithm begins with each data point as an individual cluster and iteratively merges the closest pairs of clusters until the desired number of clusters is reached.

Q76. Which linkage criterion in Hierarchical Clustering considers the maximum distance between points in two clusters?
A) Single linkage
B) Complete linkage
C) Average linkage
D) Ward’s linkage
Answer: B) Complete linkage
Explanation: Complete linkage considers the maximum distance between any two points in the clusters being merged, leading to more compact clusters.
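The linkage criteria in Q76 differ only in how the pairwise distances between two clusters are summarized. The following sketch compares single, complete and average linkage on one-dimensional clusters; the function names and the two clusters are hypothetical.

```python
from itertools import product


def pairwise(cluster_a, cluster_b):
    """All distances between a point of cluster_a and a point of cluster_b."""
    return [abs(a - b) for a, b in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    return min(pairwise(cluster_a, cluster_b))  # closest pair


def complete_linkage(cluster_a, cluster_b):
    return max(pairwise(cluster_a, cluster_b))  # farthest pair


def average_linkage(cluster_a, cluster_b):
    distances = pairwise(cluster_a, cluster_b)
    return sum(distances) / len(distances)  # mean over all pairs


left, right = [1.0, 2.0], [4.0, 7.0]  # pairwise distances: 3, 6, 2, 5
print(single_linkage(left, right))    # 2.0
print(complete_linkage(left, right))  # 6.0
print(average_linkage(left, right))   # 4.0
```

Agglomerative clustering merges the pair of clusters with the smallest linkage value at each step, so the choice of criterion changes which clusters are considered close.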