CS 7643 Quiz 2 | Actual Questions and Answers Latest Updated 2025/2026 (Graded A+) Georgia Institute of Technology

Typology: Exams

2025/2026

Available from 09/02/2025

Favorgrades
1. Which of the following are common issues while optimizing
the weights of a deep neural network? (Select all that apply)
A. Existence of local minima
B. Ill-conditioned loss surface
C. Noisy gradient estimates
D. Saddle points
Correct Answer: B, C, D
2. Which of the following is the advantage of Leaky ReLU
compared to ReLU?
A. There’s no saturation on the positive end
B. Its output is always positive
C. It is cheap to compute
D. There’s no “dead” neuron when computing gradients
Correct Answer: D
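The "dead neuron" point can be checked numerically. The following is a minimal sketch, not taken from the course material: the function names and the negative slope `alpha=0.01` are assumptions chosen for illustration. It compares the derivative of each activation at a few pre-activation values.

```python
def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, alpha=0.01):
    """Derivative of Leaky ReLU: 1 for positive inputs, alpha otherwise.

    alpha is an assumed, commonly used negative slope.
    """
    return 1.0 if x > 0 else alpha


pre_activations = [-3.0, -0.5, 2.0]
relu_grads = [relu_grad(x) for x in pre_activations]
leaky_grads = [leaky_relu_grad(x) for x in pre_activations]

print(relu_grads)   # [0.0, 0.0, 1.0]
print(leaky_grads)  # [0.01, 0.01, 1.0]
```

With ReLU, a unit whose pre-activation stays negative receives a zero gradient and cannot recover; with Leaky ReLU the gradient is small but nonzero.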
3. The vanishing gradient problem occurs primarily with:
A. ReLU activations
B. Tanh and Sigmoid activations
C. Linear transformations
D. Max pooling layers
Correct Answer: B
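A rough way to see why saturating activations cause vanishing gradients: backpropagation multiplies one activation derivative per layer, and the sigmoid derivative never exceeds 0.25. The sketch below is a simplified illustration under stated assumptions (all weights fixed at 1, the same pre-activation at every layer, a hypothetical helper `backprop_factor`), not a full network.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def backprop_factor(grad_fn, pre_activation, depth):
    """Product of `depth` identical activation derivatives (weights assumed to be 1)."""
    return grad_fn(pre_activation) ** depth


sigmoid_factor = backprop_factor(sigmoid_grad, 0.0, 20)  # best case for sigmoid
relu_factor = backprop_factor(relu_grad, 1.0, 20)        # active ReLU units

print(sigmoid_factor)
print(relu_factor)
```

Even in the best case for the sigmoid, 20 layers shrink the gradient by a factor of 0.25 to the power 20, while active ReLU units pass it through unchanged.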

4. Which of the following best describes batch normalization?
A. Adds noise to gradients during backpropagation
B. Normalizes inputs within a mini-batch to stabilize training
C. Increases the learning rate dynamically
D. Removes neurons to prevent overfitting
Correct Answer: B
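As a minimal sketch of the normalization step described in the correct option, the code below normalizes one feature across a mini-batch using the batch mean and variance, then applies a scale and shift. The names `gamma`, `beta`, and `eps` and their defaults are assumptions for illustration; running statistics for inference are deliberately left out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a list of scalar activations over the mini-batch (training mode only)."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)  # biased batch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((x - norm_mean) ** 2 for x in normalized) / len(normalized)
print(norm_mean, norm_var)
```

The normalized values have a mean close to 0 and a variance close to 1, regardless of the scale of the incoming activations.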

5. Why does stochastic gradient descent (SGD) often converge better than full batch gradient descent?
A. It guarantees global minima
B. It uses second-order derivatives
C. The noise helps escape saddle points and local minima
D. It requires fewer epochs
Correct Answer: C
6. What is the role of momentum in gradient descent?

[Text missing from the preview. The options below belong to a later question whose text is not shown.]

A. Ensure weights are all positive
B. Keep the variance of activations consistent across layers
C. Reduce training time by skipping normalization
D. Initialize all weights at zero
Correct Answer: B

10. Why are residual connections (ResNets) effective in deep architectures?
A. They prevent underfitting
B. They allow gradients to flow directly, mitigating vanishing gradients
C. They reduce the number of parameters
D. They remove the need for backpropagation
Correct Answer: B
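The "gradients flow directly" claim follows from the chain rule: for a residual block y = x + f(x), the local derivative is 1 + f'(x) instead of f'(x). The sketch below is a scalar illustration under assumed values (ten blocks, each with a small local derivative of 0.01); the helper names are hypothetical.

```python
def plain_gradient(local_grads):
    """Gradient through stacked layers y = f(x): product of the f' terms."""
    g = 1.0
    for d in local_grads:
        g *= d
    return g


def residual_gradient(local_grads):
    """Gradient through stacked blocks y = x + f(x): product of (1 + f') terms."""
    g = 1.0
    for d in local_grads:
        g *= 1.0 + d
    return g


local_grads = [0.01] * 10  # assumed small derivative of f in every block
plain = plain_gradient(local_grads)
residual = residual_gradient(local_grads)

print(plain)
print(residual)
```

The plain stack shrinks the gradient to roughly 1e-20, while the identity path keeps the residual stack's gradient close to 1.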

11. Which of the following is a drawback of using very deep networks?
A. Higher capacity for feature learning
B. More prone to vanishing/exploding gradients
C. Faster training time
D. Less risk of overfitting
Correct Answer: B

12. In dropout regularization, neurons are:
A. Permanently removed from the network
B. Randomly deactivated during training to prevent co-adaptation
C. Replaced with noise during forward propagation
D. Normalized across mini-batches
Correct Answer: B
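A minimal sketch of the random deactivation described above, using the "inverted dropout" variant (an assumption; the quiz does not say which variant is meant). Kept activations are rescaled by 1 / (1 - p_drop) during training so that nothing needs to change at evaluation time. The function name and parameters are illustrative.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations."""
    if not training:
        return list(activations)  # evaluation: all units active, no rescaling
    keep = 1.0 - p_drop
    # Each unit is kept with probability `keep` and scaled up, otherwise zeroed.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded generator so the run is repeatable
hidden = [1.0] * 8
train_out = dropout(hidden, p_drop=0.5, rng=rng)
eval_out = dropout(hidden, training=False)

print(train_out)
print(eval_out)
```

Units are only masked for the current forward pass; a different random subset is dropped on the next one, and no unit is permanently removed.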

13. Which learning rate schedule gradually decreases the learning rate during training?
A. Step decay
B. Exponential decay
C. Cosine annealing
D. All of the above
Correct Answer: D
14. A flat loss surface near the optimum generally indicates:
A. Poor generalization
B. Better generalization
C. Higher overfitting
D. Lower variance
Correct Answer: B
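The three schedules named in the learning rate question above can be sketched as simple functions of the epoch. The formulas are the standard textbook forms; the constants (`drop`, `every`, `k`, `lr_min`, an initial rate of 0.1, 30 epochs) are assumptions for illustration, and the cosine schedule is shown without warm restarts.

```python
import math


def step_decay(lr0, epoch, drop=0.5, every=10):
    """Multiply the rate by `drop` once every `every` epochs."""
    return lr0 * drop ** (epoch // every)


def exponential_decay(lr0, epoch, k=0.05):
    """Smooth decay lr0 * exp(-k * epoch)."""
    return lr0 * math.exp(-k * epoch)


def cosine_annealing(lr0, epoch, total_epochs, lr_min=0.0):
    """Half-cosine from lr0 down to lr_min over total_epochs."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr0 - lr_min) * cos_term


epochs = [0, 10, 20, 30]
step_lrs = [step_decay(0.1, e) for e in epochs]
exp_lrs = [exponential_decay(0.1, e) for e in epochs]
cos_lrs = [cosine_annealing(0.1, e, total_epochs=30) for e in epochs]

for name, lrs in [("step", step_lrs), ("exp", exp_lrs), ("cosine", cos_lrs)]:
    print(name, lrs)
```

All three sequences are non-increasing over the epochs shown, which is why "All of the above" is the keyed answer.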

18. Which of the following problems do ReLU activations help mitigate?
A. Vanishing gradient
B. Exploding gradient
C. Overfitting
D. Saddle points
Correct Answer: A

19. The Hessian matrix is useful in optimization because it:
A. Measures gradient variance
B. Provides second-order curvature information
C. Guarantees global minima
D. Eliminates saddle points
Correct Answer: B
20. Why is Adam optimizer widely preferred in practice?
A. It always finds global optima
B. It requires no hyperparameters
C. It adapts learning rates individually per parameter using moment estimates
D. It is slower but more stable
Correct Answer: C
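To illustrate the per-parameter adaptation mentioned in the Adam question above, here is a minimal sketch of a single Adam update for one scalar parameter, following the standard update rule with the commonly quoted default hyperparameters. The function name `adam_step` and the example gradients are assumptions for illustration.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment (running mean of squares)
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t >= 1
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Two parameters whose gradients differ in magnitude by a factor of 1000.
params = [1.0, 1.0]
grads = [100.0, 0.1]
updated = [adam_step(p, g, m=0.0, v=0.0, t=1)[0] for p, g in zip(params, grads)]
steps = [p - u for p, u in zip(params, updated)]

print(steps)
```

Both parameters move by roughly the learning rate on the first step, because each gradient is divided by its own second-moment estimate. Plain SGD would instead take steps that differ by the same factor of 1000 as the gradients.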

21. Which initialization is typically best for ReLU activations?
A. Xavier (Glorot)
B. He initialization
C. Zero initialization
D. Random small constants
Correct Answer: B
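The difference between the first two options comes down to the weight variance. A minimal sketch, assuming the normal-distribution forms of both schemes (He: variance 2 / fan_in; Xavier: variance 2 / (fan_in + fan_out)); the helper names and layer sizes are illustrative.

```python
import math
import random


def he_std(fan_in):
    """Standard deviation for He (Kaiming) normal initialization."""
    return math.sqrt(2.0 / fan_in)


def xavier_std(fan_in, fan_out):
    """Standard deviation for Xavier (Glorot) normal initialization."""
    return math.sqrt(2.0 / (fan_in + fan_out))


def init_weights(fan_in, fan_out, std, rng):
    """Sample a fan_out x fan_in weight matrix from N(0, std^2)."""
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


rng = random.Random(0)
fan_in, fan_out = 512, 64
weights = init_weights(fan_in, fan_out, he_std(fan_in), rng)

flat = [w for row in weights for w in row]
mean = sum(flat) / len(flat)
sample_std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
print(he_std(fan_in), xavier_std(fan_in, fan_in), sample_std)
```

For a square layer, the He standard deviation is larger than the Xavier one by a factor of sqrt(2), which compensates for ReLU zeroing roughly half of the activations.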

22. Which of the following issues are caused by saddle points in deep networks?
A. Training stops prematurely
B. Extremely slow convergence
C. Oscillations in weight updates
D. Poor initialization
Correct Answer: B
23. Which technique injects noise during training to improve generalization?
A. Batch Normalization
B. Weight Decay
C. Dropout
D. Xavier Initialization
Correct Answer: C
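The saddle point and Hessian questions above can be tied together with a toy function. The sketch below uses f(x, y) = x**2 - y**2 (an assumed example, not from the quiz), whose Hessian has one positive and one negative eigenvalue at the origin, and counts how many plain gradient descent steps are needed to move away along the escaping direction. The learning rate, threshold, and starting points are illustrative.

```python
def grad(x, y):
    """Gradient of f(x, y) = x**2 - y**2, which has a saddle point at (0, 0)."""
    return 2.0 * x, -2.0 * y


# The Hessian of f is constant, [[2, 0], [0, -2]]: mixed-sign eigenvalues mark a saddle.
hessian_eigenvalues = (2.0, -2.0)


def steps_to_escape(y0, lr=0.01, threshold=1.0, max_steps=10_000):
    """Count gradient descent steps until |y| exceeds `threshold`, starting at (1, y0)."""
    x, y = 1.0, y0
    for step in range(1, max_steps + 1):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        if abs(y) > threshold:
            return step
    return max_steps


near_saddle = steps_to_escape(1e-8)  # starts almost exactly on the saddle's ridge
off_saddle = steps_to_escape(1e-2)   # starts further along the escaping direction

print(near_saddle, off_saddle)
```

The gradient near the saddle is tiny, so the run that starts closer to the ridge needs several times more steps to leave the region. Training does not stop there, but it slows down considerably.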