CS 7643 Master Exam Bank | Quiz 1, 2 & 3 (Questions & Answers) | 2025/2026 Update | 100% Correct – Georgia Tech
CS 7643 Quiz 1 | Questions and Answers
1. Which of the following is false about parametric models? A. The softmax regression model is one of them B. The number of parameters is tied to the number of data points, not the dimension of the data features C. They try to model a function D. They can return a probability score per class, with labels obtained via the argmax function Correct Answer: B
2. Which of the following is a non-parametric model? A. Neural Networks B. Naïve Bayes C. Logistic Regression D. K-NN Correct Answer: D
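
To illustrate why K-NN counts as non-parametric, here is a minimal sketch in plain Python. The function name `knn_predict` and the toy data are assumptions made for illustration; they do not come from the course material. Note that nothing is "trained": the stored training set itself is the model.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # No weights are learned; every prediction scans the stored training data.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two small clusters labeled "a" and "b".
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, (0.15, 0.15), k=3)
```

Because each call compares the query against every stored point, memory and prediction cost grow with the size of the training set.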
3. Which of the following is a key drawback of non-parametric models such as K-NN? A. They assume a fixed number of parameters B. They cannot handle nonlinear boundaries C. They require storing large amounts of training data for prediction D. They are less flexible in modeling complex functions Correct Answer: C
4. In Logistic Regression, the decision boundary is defined by: A. A nonlinear function of the input features B. A linear combination of input features passed through a sigmoid C. A similarity measure based on Euclidean distance D. A hierarchical tree-based partitioning of data Correct Answer: B
5. The main difference between Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) is: A. MLE incorporates priors, MAP does not B. MAP incorporates priors, MLE does not C. Both incorporate priors but in different ways D. Neither uses priors, they are identical Correct Answer: B
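
The MLE/MAP distinction can be made concrete with a coin-flip sketch. This example assumes a Bernoulli likelihood with a Beta(alpha, beta) prior, which is a standard textbook setting and not something the question itself specifies. The function names are illustrative.

```python
def bernoulli_mle(heads, flips):
    """MLE of the heads probability: uses the observed data only."""
    return heads / flips


def bernoulli_map(heads, flips, alpha=2.0, beta=2.0):
    """MAP estimate under an assumed Beta(alpha, beta) prior.

    This is the mode of the Beta(heads + alpha, tails + beta) posterior,
    which is valid when both posterior parameters exceed 1.
    """
    return (heads + alpha - 1.0) / (flips + alpha + beta - 2.0)


# Three heads in three flips: MLE claims the coin always lands heads,
# while the prior pulls the MAP estimate back toward 0.5.
mle_estimate = bernoulli_mle(3, 3)
map_estimate = bernoulli_map(3, 3)
```

With a uniform prior (`alpha=1.0, beta=1.0`) the two estimates coincide, which is one way to see that the prior is the only difference between them.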
6. Which of the following is not a property of Naïve Bayes?
D. Absolute Error Loss Correct Answer: C
10. In supervised learning, labels are: A. Generated from unsupervised clusters B. Predicted from input features C. Not required for training D. Always binary Correct Answer: B
11. The bias-variance tradeoff describes: A. The choice between training time and test time B. Balancing model complexity against interpretability C. How underfitting and overfitting relate to model error D. The optimization speed of gradient descent Correct Answer: C
12. Which algorithm finds the hyperplane that maximizes margin between two classes? A. Decision Trees B. Support Vector Machines (SVM) C. Naïve Bayes D. K-Means Correct Answer: B
13. L1 regularization (Lasso) encourages: A. Large weights B. Sparse weights (many zeros) C. Smooth decision boundaries D. Nonlinear interactions Correct Answer: B
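
One way to see where the sparsity comes from is the soft-thresholding step, which is the proximal operator of the L1 penalty and is used by some Lasso solvers. The sketch below is illustrative only: the weights and the threshold value are made up, and the question does not say which solver is meant.

```python
def soft_threshold(weight, threshold):
    """Shrink a weight toward zero and set it exactly to zero if it is small."""
    if weight > threshold:
        return weight - threshold
    if weight < -threshold:
        return weight + threshold
    return 0.0


# Hypothetical weights; the threshold plays the role of (step size x L1 strength).
weights = [0.8, -0.05, 0.02, -1.3, 0.1]
shrunk = [soft_threshold(w, 0.1) for w in weights]
num_zeros = sum(1 for w in shrunk if w == 0.0)
```

Weights whose magnitude is at or below the threshold become exactly zero, while larger weights are only shrunk. An L2 penalty, by contrast, scales weights down without zeroing them.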
14. In K-means clustering, K represents: A. Number of features B. Number of clusters C. Number of nearest neighbors D. Number of iterations Correct Answer: B
15. Which of the following is a convex loss function? A. Sigmoid activation B. Mean Squared Error loss C. ReLU activation D. Cosine similarity Correct Answer: B
16. Which optimizer adapts learning rates per parameter using first and second moment estimates? A. SGD B. Adam
20. Which of the following metrics is most appropriate for imbalanced datasets? A. Accuracy B. Precision, Recall, and F1-score C. Mean Squared Error D. Log-Likelihood Correct Answer: B
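
A small worked example shows why accuracy can mislead on imbalanced data. The counts below describe a hypothetical test set (990 negatives, 10 positives) and a classifier that predicts "negative" for everything. Neither the data nor the helper name comes from the source.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    Undefined ratios (zero denominators) are reported as 0.0 by convention here.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical all-negative classifier on 990 negatives and 10 positives.
tp, fp, fn, tn = 0, 0, 10, 990
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

Accuracy comes out at 99% even though every positive example is missed, whereas recall and F1 are both zero and expose the failure.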
21. Softmax outputs are constrained to: A. Negative real numbers B. Binary outputs only C. Probabilities that sum to 1 D. Sparse vectors with zeros Correct Answer: C
22. A decision tree with unlimited depth is most likely to: A. Underfit B. Overfit C. Be optimal D. Be equivalent to linear regression Correct Answer: B
23. Which of the following is true about Principal Component Analysis (PCA)? A. It increases the number of features B. It is supervised C. It reduces dimensionality by projecting onto directions of maximum variance D. It maximizes classification accuracy Correct Answer: C
24. In a confusion matrix, false negatives correspond to: A. Predicted positive, actual negative B. Predicted negative, actual positive C. Predicted positive, actual positive D. Predicted negative, actual negative Correct Answer: B
25. Which machine learning paradigm is used in reinforcement learning? A. Labeled data B. Reward-based learning via interaction with environment C. Clustering of unlabeled data D. Linear classification boundaries Correct Answer: B
26. Which of the following best explains the universal approximation theorem for neural networks?
3. The vanishing gradient problem occurs primarily with: A. ReLU activations B. Tanh and Sigmoid activations C. Linear transformations D. Max pooling layers Correct Answer: B
4. Which of the following best describes batch normalization? A. Adds noise to gradients during backpropagation B. Normalizes inputs within a mini-batch to stabilize training C. Increases the learning rate dynamically D. Removes neurons to prevent overfitting Correct Answer: B
5. Why does stochastic gradient descent (SGD) often converge better than full batch gradient descent? A. It guarantees global minima B. It uses second-order derivatives C. The noise helps escape saddle points and local minima D. It requires fewer epochs Correct Answer: C
6. What is the role of momentum in gradient descent? A. To reduce variance in gradients B. To accumulate past gradients to speed up convergence and escape local minima C. To slow down convergence for stability D. To normalize the learning rate Correct Answer: B
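
The "accumulate past gradients" idea in the momentum question can be sketched as follows. This uses one common formulation of SGD with momentum; frameworks differ in the exact form, and the function name, learning rate, momentum coefficient, and toy objective f(w) = w^2 are all assumptions for illustration.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9):
    """One update: the velocity is a decaying sum of past gradients."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Minimize the toy objective f(w) = w^2, whose gradient is 2w.
params, velocity = [5.0], [0.0]
for _ in range(200):
    grads = [2.0 * params[0]]
    params, velocity = sgd_momentum_step(params, grads, velocity)
```

Setting `momentum=0.0` recovers plain gradient descent, since the velocity then depends only on the current gradient.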
7. Which optimizer combines momentum with adaptive learning rates per parameter? A. RMSProp B. AdaGrad C. Adam D. SGD Correct Answer: C
8. Exploding gradients are most commonly addressed by: A. Weight initialization B. Gradient clipping C. Dropout D. Adding regularization Correct Answer: B
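
Gradient clipping by global norm, one common variant, can be sketched in a few lines. The function name and the example values are illustrative assumptions; deep learning frameworks ship their own utilities for this.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm does not exceed max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)  # small gradients pass through unchanged
    scale = max_norm / total_norm
    return [g * scale for g in grads]


# A hypothetical exploding gradient with norm 50, clipped to norm 5.
clipped = clip_by_global_norm([30.0, -40.0], max_norm=5.0)
clipped_norm = math.sqrt(sum(g * g for g in clipped))
```

Clipping rescales the whole vector, so the update direction is preserved and only its magnitude is bounded.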
9. Xavier/Glorot initialization is designed to: A. Ensure weights are all positive B. Keep the variance of activations consistent across layers C. Reduce training time by skipping normalization
13. Which learning rate schedule gradually decreases the learning rate during training? A. Step decay B. Exponential decay C. Cosine annealing D. All of the above Correct Answer: D
14. A flat loss surface near the optimum generally indicates: A. Poor generalization B. Better generalization C. Higher overfitting D. Lower variance Correct Answer: B
15. Gradient noise scale increases when: A. Mini-batch size decreases B. Learning rate decreases C. Number of layers decreases D. Regularization increases Correct Answer: A
16. Which of the following is true about early stopping? A. It prevents underfitting B. It monitors validation error and stops when performance worsens C. It ensures training loss always decreases D. It requires large learning rates Correct Answer: B
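
The early stopping rule described above can be sketched as a loop over validation losses. The `patience` parameter (how many non-improving epochs to tolerate) is a common convention but an assumption here, as are the function name and the example loss values.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: train to the end


# Hypothetical validation curve: improves for three epochs, then worsens.
stop_epoch = early_stopping_epoch([0.90, 0.70, 0.60, 0.62, 0.65, 0.70], patience=2)
```

In practice the weights from the best validation epoch are usually restored after stopping. That step is omitted from this sketch.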
17. What is the primary purpose of weight decay (L2 regularization)? A. Reduce gradient variance B. Penalize large weights to avoid overfitting C. Increase convergence speed D. Eliminate vanishing gradients Correct Answer: B
18. Which of the following problems do ReLU activations help mitigate? A. Vanishing gradient B. Exploding gradient C. Overfitting D. Saddle points Correct Answer: A
19. The Hessian matrix is useful in optimization because it: A. Measures gradient variance B. Provides second-order curvature information C. Guarantees global minima D. Eliminates saddle points Correct Answer: B
23. Which technique injects noise during training to improve generalization? A. Batch Normalization B. Weight Decay C. Dropout D. Xavier Initialization Correct Answer: C
24. Vanishing gradients are most problematic when: A. The network is shallow B. The network is very deep with sigmoid/tanh activations C. The learning rate is too high D. Dropout is used Correct Answer: B
25. Which learning strategy uses a small initial learning rate that increases for a few epochs before decaying? A. Step decay B. Warm restarts C. Learning rate warm-up D. Cyclical learning rate Correct Answer: C
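
A warm-up schedule of the kind described in the question above might look like the sketch below. The question does not say how the rate decays after warm-up, so the linear decay, the function name, and all default values here are assumptions.

```python
def warmup_then_linear_decay(step, base_lr=0.1, warmup_steps=5, total_steps=20):
    """Learning rate at `step`: linear ramp up, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)


schedule = [warmup_then_linear_decay(step) for step in range(20)]
```

The rate starts small, reaches `base_lr` at the end of warm-up, and then shrinks for the rest of training.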
26. Which of the following best explains gradient clipping? A. Scaling gradients to avoid vanishing values B. Bounding gradient magnitude to avoid exploding values C. Forcing gradients to zero at saddle points D. Normalizing gradients across layers Correct Answer: B
CS 7643 Quiz 3 | Questions and Answers | 2025 Update
1. The source of error where the neural network may not generalize to the testing set (e.g., overfitting due to finite data) is known as: A. Estimation Error B. Optimization Error C. Modeling Error D. Testing Error Correct Answer: A. Estimation Error
2. Select the error trends that occur as a neural network grows in complexity: A. Modeling error decreases B. Optimization error decreases C. Modeling error increases D. Estimation error increases Correct Answer: A, D
6. Which error component represents underfitting due to limited model capacity? A. Estimation Error B. Optimization Error C. Modeling Error D. Approximation Error Correct Answer: C
7. If a model achieves low training error but high test error, it is suffering from: A. Underfitting B. Overfitting C. High bias D. Poor optimization Correct Answer: B
8. As the size of the training set increases, estimation error generally: A. Increases B. Decreases C. Remains constant D. Becomes zero Correct Answer: B
9. Which of the following is most closely related to variance in the bias-variance tradeoff? A. Estimation Error B. Optimization Error C. Modeling Error D. Approximation Error Correct Answer: A
10. Training error typically: A. Is always larger than test error B. Decreases with model complexity C. Increases with model complexity D. Remains constant regardless of model Correct Answer: B
11. Which of the following is true about test error? A. It always decreases with model complexity B. It decreases initially, then increases due to overfitting C. It equals training error in all cases D. It depends only on optimization error Correct Answer: B
12. The approximation error of a model corresponds to: A. Estimation Error B. Error due to limited model expressiveness C. Random sampling error