A set of multiple-choice questions and answers related to data science and machine learning concepts. It covers topics such as data types, data structures, data preprocessing, linear regression, regularization, logistic regression, k-nearest neighbors (knn), support vector machines (svm), decision trees, clustering, and dimensionality reduction techniques like pca and t-sne. The questions are designed to test understanding of key concepts and their applications in data analysis and model building, making it a valuable resource for students and practitioners in the field.
Typology: Exams
Question 1. Which of the following is an example of a nominal data type?
A) Temperature in Celsius
B) Gender
C) Height in centimeters
D) Income
Answer: B
Explanation: Nominal data is categorical with no inherent order. Gender is a classic example.

Question 2. What distinguishes ordinal data from nominal data?
A) Ordinal data has an inherent order
B) Ordinal data is always numeric
C) Nominal data can be ranked
D) Nominal data has units
Answer: A
Explanation: Ordinal data are categorical with a logical order, unlike nominal data.

Question 3. Which variable is continuous?
A) Eye color
B) Number of children
C) Weight
D) Brand of car
Answer: C

A) Matrix
B) Vector
C) Data frame
D) Tensor
Answer: B
Explanation: A vector (or one-dimensional array) is the natural structure for a single column of values.

Question 6. What is a key difference between a matrix and a data frame?
A) Matrices store only numeric data, data frames can store mixed types
B) Data frames are always larger
C) Matrices can only be two-dimensional
D) Data frames are only available in R
Answer: A
Explanation: A matrix holds elements of a single type (typically numeric), whereas a data frame can hold a different data type in each column.

Question 7. Which function in pandas is used to read a CSV file?
A) read_table()
B) read_csv()
C) load_csv()
D) import_csv()
Answer: B
Explanation: read_csv() is the standard pandas function for CSV files.

Question 8. What does the pandas .info() method display?

Explanation: MCAR stands for Missing Completely At Random.

Question 10. What is the first step in handling missing data?
A) Imputation
B) Identification
C) Deletion
D) Scaling
Answer: B
Explanation: You must identify missing values before handling them.

Question 11. Which imputation method is best for categorical data?
A) Mean
B) Median
C) Mode
D) Regression
Answer: C
Explanation: Mode imputation is appropriate for categorical data, because the mean and median are not defined for unordered categories.

Question 12. What is KNN imputation?
A) Replacing missing values with the mean
B) Predicting missing values using similar data points
C) Dropping all rows with missing data
D) Using the last observed value
Answer: B
Explanation: KNN imputation finds similar records to estimate missing values.
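
As a rough illustration of Questions 11 and 12, the sketch below implements mode imputation and a simple KNN imputation in plain Python. The function names `mode_impute` and `knn_impute`, the use of `None` as the missing-value marker, and the Euclidean distance over shared columns are assumptions made for this example, not part of the original material. Library implementations differ in detail.

```python
from collections import Counter
import math


def mode_impute(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest rows.

    Distance is Euclidean over the columns both rows have observed.
    Assumes every missing cell has at least one donor row.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows where column j is observed.
            donors = [r for m, r in enumerate(rows)
                      if m != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled


colors = ["red", "blue", None, "red"]
print(mode_impute(colors))

data = [[1.0, 10.0], [1.1, None], [5.0, 50.0], [0.9, 12.0]]
print(knn_impute(data, k=2))
```

In the second call, the missing value is estimated from the two rows whose first column is closest to 1.1, which is the "similar data points" idea in the answer to Question 12.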

B) Z-score
C) Min-Max scaling
D) PCA
Answer: B
Explanation: A Z-score measures how many standard deviations a value lies from the mean.

Question 15. What does a box plot help visualize?
A) Data skewness
B) Outliers and quartiles
C) Correlations
D) Data types
Answer: B
Explanation: Box plots show the median, quartiles, and outliers.
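
The two ideas above, Z-scores and the quartiles behind a box plot, can be sketched with Python's standard `statistics` module. The helper names `z_scores` and `iqr_outliers`, the sample values, and the 1.5 × IQR fence rule (a common box-plot convention) are assumptions chosen for illustration.

```python
import statistics


def z_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]


def iqr_outliers(values):
    """Return values outside the 1.5 * IQR fences used by box plots."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


values = [10, 12, 11, 13, 12, 11, 40]
print([round(z, 2) for z in z_scores(values)])
print(iqr_outliers(values))
```

Both approaches single out the value 40 here, although on small or heavily skewed samples the two rules can disagree.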

Question 16. What is capping/winsorization?
A) Removing missing values
B) Trimming data extremes
C) Replacing outliers with boundary values
D) Scaling values
Answer: C
Explanation: Winsorization replaces extreme values with the values at specified percentiles.

Question 17. What is normalization in data preprocessing?
A) Removing outliers
B) Scaling values to [0,1] range

Question 19. Why use log transformation on data?
A) To normalize data
B) To reduce skewness
C) To handle categorical variables
D) To encode missing values
Answer: B
Explanation: Log transformation helps reduce skewness in highly skewed data.

Question 20. Which is NOT an assumption of linear regression?
A) Linearity
B) Homoscedasticity
C) Independence
D) Non-linearity
Answer: D
Explanation: Linear regression assumes a linear relationship between the predictors and the response, so non-linearity is not one of its assumptions.

Question 21. What does the R² value in linear regression represent?
A) The slope
B) The intercept
C) Variance explained by the model
D) The error
Answer: C
Explanation: R² indicates the proportion of variance in the response that is explained by the model.
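
To make the R² definition in Question 21 concrete, here is a minimal sketch that fits a one-variable least-squares line and computes R² as 1 minus the ratio of residual to total sum of squares. The function names `fit_line` and `r_squared` and the sample data are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, preds):
    """Proportion of variance in ys explained by preds."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Illustrative data that is close to, but not exactly, linear.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]
print(slope, intercept, r_squared(ys, preds))
```

A value near 1 means the fitted line accounts for almost all of the variation in `ys`. A value near 0 means it does little better than predicting the mean.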

C) Improve data quality
D) Reduce variance
Answer: B
Explanation: Regularization discourages overly complex models.

Question 24. Polynomial features are used in regression to:
A) Encode categorical data
B) Model non-linear relationships
C) Standardize data
D) Detect outliers
Answer: B
Explanation: Polynomial features enable linear models to capture non-linear patterns.
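
The point of Question 24 can be shown with a small sketch: a straight line fitted on x cannot capture y = x² + 1, but the same linear fit on the derived feature x² recovers it exactly. The helper `fit_line` and the data are assumptions for this example. A real workflow would normally use a library routine that handles several features at once.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 + 1 for x in xs]  # a purely quadratic relationship

# Fit on the raw feature: the best line is flat and misses the curve.
raw_slope, raw_intercept = fit_line(xs, ys)

# Fit on the polynomial feature x**2: the model is still linear in its input.
squared = [x ** 2 for x in xs]
poly_slope, poly_intercept = fit_line(squared, ys)

print(raw_slope, raw_intercept)
print(poly_slope, poly_intercept)
```

The model stays linear in its coefficients; only the inputs are transformed, which is why this still counts as linear regression.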

Question 25. What is the purpose of the sigmoid function in logistic regression?
A) Normalize data
B) Map outputs to probabilities
C) Detect outliers
D) Encode labels
Answer: B
Explanation: The sigmoid function maps any real value into the open interval (0, 1), so the output can be interpreted as a probability.

Question 26. In K-Nearest Neighbors (KNN), what does ‘K’ represent?
A) Number of features

Explanation: Cosine similarity is less common for KNN classification.

Question 28. What is a kernel trick in SVM?
A) Data normalization
B) Transforming data to higher dimensions
C) Scaling features
D) Regularization
Answer: B
Explanation: The kernel trick lets SVMs find non-linear boundaries by working implicitly in a higher-dimensional feature space, without computing the transformation explicitly.

Question 29. Which splitting criterion is used in decision trees for classification?
A) Mean squared error
B) Gini impurity
C) R² score
D) Ridge penalty
Answer: B
Explanation: Gini impurity is commonly used to measure node purity in classification trees.

Question 30. What does MSE stand for in regression metrics?
A) Mean Standard Error
B) Mean Squared Error
C) Median Squared Error
D) Maximum Squared Error
Answer: B
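
The two metrics named in Questions 29 and 30 are short enough to compute by hand. The sketch below uses the standard definitions: Gini impurity is 1 minus the sum of squared class proportions, and MSE is the average squared difference between true and predicted values. The function names and sample inputs are chosen only for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum of squared class proportions; 0 means a pure node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


print(gini_impurity(["a", "a", "b", "b"]))   # evenly mixed two-class node
print(gini_impurity(["a", "a", "a", "a"]))   # pure node
print(mean_squared_error([2, 4], [1, 6]))
```

A classification tree chooses splits that lower the weighted Gini impurity of the child nodes, while a regression tree typically minimizes MSE instead.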