CS-7643 QUIZ 4 DEEP LEARNING OPTIMIZATION REGULARIZATION FINAL STUDY GUIDE 2026 SOLVED QUE, Exams of Advanced Algorithms

CS-7643 QUIZ 4 DEEP LEARNING OPTIMIZATION REGULARIZATION FINAL STUDY GUIDE 2026 SOLVED QUESTIONS FULLY CORRECT

Typology: Exams

2025/2026

Available from 01/27/2026

alcorbgeneralstore

⫸ RL: Evaluative Feedback. Answer:
  • Pick an action, receive a reward
  • No supervision for what the correct action is or would have been (unlike supervised learning)

⫸ RL: Sequential Decisions. Answer:
  • Plan and execute actions over a sequence of states
  • Reward may be delayed, requiring optimization of future rewards (long-term planning)

⫸ Signature Challenges in RL. Answer:
  • Evaluative feedback: need trial and error to find the right action
  • Delayed feedback: actions may not lead to immediate reward
  • Non-stationarity: data distribution of visited states changes when the policy changes
  • Fleeting nature of online data (may only see data once)

⫸ MDP. Answer: Framework underlying RL
  • S: Set of states
  • A: Set of actions
  • R: Distribution of rewards
  • T: Transition probability
  • $\gamma$: Discount factor
  • Markov property: the current state completely characterizes the state of the environment

⫸ RL: Equations relating optimal quantities. Answer:
  1. $V^*(s) = \max_a Q^*(s, a)$
  2. $\pi^*(s) = \arg\max_a Q^*(s, a)$

⫸ V(s). Answer: $V^*(s) = \max_a \sum_{s'} p(s' \mid s, a) \left[ r(s, a) + \gamma V^*(s') \right]$

⫸ Q(s, a). Answer: $Q^*(s, a) = \sum_{s'} p(s' \mid s, a) \left[ r(s, a) + \gamma \max_{a'} Q^*(s', a') \right]$

⫸ Value Iteration. Answer: $V_{i+1}(s) = \max_a \sum_{s'} p(s' \mid s, a) \left[ r(s, a) + \gamma V_i(s') \right]$
  • Repeat until convergence
  • Time complexity per iteration: $O(|S|^2 |A|)$
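The value iteration update above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the nested-list layout for `P` and `R`, the helper names, the stopping tolerance, and the two-state MDP are all assumptions made for the example.

```python
def q_value(P, R, V, s, a, gamma):
    """Expected return of taking action a in state s, then following values V."""
    return sum(p * (R[s][a] + gamma * V[s_next]) for p, s_next in P[s][a])


def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iters=10_000):
    """P[s][a] is a list of (probability, next_state); R[s][a] is r(s, a)."""
    V = [0.0] * len(P)
    for _ in range(max_iters):
        # One sweep: Bellman optimality backup for every state.
        V_new = [
            max(q_value(P, R, V, s, a, gamma) for a in range(len(P[s])))
            for s in range(len(P))
        ]
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta < tol:  # "repeat until convergence"
            break
    return V


def greedy_policy(P, R, V, gamma=0.9):
    """pi(s) = argmax_a Q(s, a), with Q computed from V."""
    return [
        max(range(len(P[s])), key=lambda a: q_value(P, R, V, s, a, gamma))
        for s in range(len(P))
    ]


# Hypothetical two-state, two-action MDP with deterministic transitions.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0.0, 1.0],  # r(0, 0), r(0, 1)
    [2.0, 0.0],  # r(1, 0), r(1, 1)
]
V = value_iteration(P, R, gamma=0.5)
policy = greedy_policy(P, R, V, gamma=0.5)
print(V, policy)
```

With a dense transition table, each sweep visits every (state, action, next state) triple, which is where the $O(|S|^2 |A|)$ cost per iteration comes from.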

  • Experience replay addresses this:
    --> Store (s, a, s', r) transitions in a replay memory and update it continually (older samples are discarded)
    --> Train the Q-network on random mini-batches of transitions from the replay memory instead of consecutive examples
    --> The larger the buffer, the lower the correlation

⫸ Experience Replay. Answer:
  • Store (s, a, s', r) transitions in a replay memory and update it continually (older samples are discarded)
  • Train the Q-network on random mini-batches of transitions from the replay memory instead of consecutive examples
  • The larger the buffer, the lower the correlation

⫸ Value-Based RL Methods Learn, Model-Based Methods Learn. Answer:
  • Value-based: Q-functions
  • Model-based: transition and reward functions

⫸ Parameterized Policy. Answer: $\theta^* = \arg\max_\theta \mathbb{E}\left[ \sum_t R(s_t, a_t) \right]$

⫸ REINFORCE Algorithm. Answer:
  1. Gather data using the current policy
    • Sample trajectories $\tau$ by acting according to $\pi_\theta$
  2. Compute the gradient update
    • $\left( \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right) \left( \sum_t R(s_t, a_t) \right)$
  3. Update the policy parameters
    • $\theta = \theta + \alpha \cdot (\text{policy gradient})$

⫸ Drawbacks of Policy Gradients. Answer:
  • The gradient is scaled up or down based on the reward of the entire sequence of actions
    --> Can't determine which action was good or bad
    --> Credit assignment problem
    --> Suffers from high variance and unstable learning
  • How to reduce variance?
    --> Subtracting an action-independent baseline from the reward preserves the mean of the gradient expectation while possibly reducing the variance
  • Different choices of baseline result in different variants of the policy gradient algorithm

⫸ Actor-Critic. Answer:
  • Replaces rewards with $Q^{\pi_\theta}(s, a)$
  • $\mathbb{E}\left[ \nabla_\theta \log \pi_\theta(a \mid s) \, Q^{\pi_\theta}(s, a) \right]$
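The claim in the policy gradient cards above, that an action-independent baseline leaves the expected gradient unchanged, can be checked numerically. The sketch below assumes a one-step (bandit-style) problem with a softmax policy over three actions, so the expectation can be computed exactly; the function names, logits, and rewards are hypothetical values chosen for illustration.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(logits, action):
    """Gradient of log pi(action) w.r.t. the logits: one_hot(action) - pi."""
    pi = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(pi)]


def expected_policy_gradient(logits, rewards, baseline=0.0):
    """Exact E_a[ grad log pi(a) * (R(a) - baseline) ] for a one-step problem."""
    pi = softmax(logits)
    grad = [0.0] * len(logits)
    for a, p_a in enumerate(pi):
        g = grad_log_pi(logits, a)
        for i in range(len(grad)):
            grad[i] += p_a * g[i] * (rewards[a] - baseline)
    return grad


def expected_reward(logits, rewards):
    return sum(p * r for p, r in zip(softmax(logits), rewards))


def ascent_step(logits, grad, alpha=0.1):
    """theta = theta + alpha * (policy gradient)."""
    return [t + alpha * g for t, g in zip(logits, grad)]


logits = [0.1, -0.3, 0.5]   # hypothetical policy parameters
rewards = [1.0, 0.0, 2.0]   # hypothetical reward for each action
g_plain = expected_policy_gradient(logits, rewards)
g_baseline = expected_policy_gradient(logits, rewards, baseline=1.3)
new_logits = ascent_step(logits, g_plain)
print(g_plain, g_baseline)
```

The two gradients agree because the baseline term multiplies $\mathbb{E}_a[\nabla_\theta \log \pi_\theta(a)]$, which is zero. The variance reduction only shows up when the gradient is estimated from sampled actions rather than computed exactly as done here.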

⫸ Pseudo-labeling for Unlabeled Data (Semi-Supervised Learning). Answer:
  • Learn a model on labeled training data
  • Feed unlabeled examples through the model
  • Take the most confident predictions and convert them to pseudo-labels
  • Put these pseudo-labels into the dataset and retrain --> these will be noisier than human labels

⫸ Cross-View/Augmentation & Consistency. Answer:
  • Take an unlabeled example and make weakly and strongly augmented versions of it
  • Weakly augment the image and get a pseudo-label
  • Strongly augment the image and make a prediction
  • Train the strongly augmented prediction against the pseudo-label from the weakly augmented version
  Idea:
  • Weak augmentation isn't so severe that the pseudo-labels are bad
  • Strong augmentation makes the NN learn better feature representations

⫸ Meta-Learning (Few-Shot Learning). Answer:
  • Learning to learn
  • Learn a NN initialization such that a few SGD steps on a small amount of labeled data yield an effective model

⫸ Surrogate Tasks (Self-Supervised Learning). Answer:
  • Identify loss functions for tasks we don't care about, but that allow us to learn good feature representations
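The pseudo-labeling and weak/strong consistency cards above can be combined into one unlabeled-data loss. The sketch below follows the general shape of a FixMatch-style objective, but it is only an illustration: the function names, the 0.95 threshold, and the hand-written probability vectors (standing in for model outputs on weakly and strongly augmented views) are assumptions.

```python
import math


def pseudo_label(probs, threshold=0.95):
    """Return the most likely class if the model is confident enough, else None."""
    confidence = max(probs)
    if confidence < threshold:
        return None
    return probs.index(confidence)


def consistency_loss(weak_probs_batch, strong_probs_batch, threshold=0.95):
    """Cross-entropy of strong-view predictions against weak-view pseudo-labels.

    Examples whose weak-view confidence is below the threshold contribute zero.
    The sum is divided by the full batch size, not by the number of kept examples.
    """
    total = 0.0
    for weak, strong in zip(weak_probs_batch, strong_probs_batch):
        label = pseudo_label(weak, threshold)
        if label is not None:
            total += -math.log(max(strong[label], 1e-12))  # clamp to avoid log(0)
    return total / len(weak_probs_batch)


# Hypothetical class probabilities for two unlabeled images (three classes).
weak_probs = [[0.97, 0.02, 0.01], [0.50, 0.30, 0.20]]
strong_probs = [[0.60, 0.30, 0.10], [0.10, 0.80, 0.10]]
loss = consistency_loss(weak_probs, strong_probs)
print(loss)
```

Only the first image passes the threshold here, so the second contributes nothing to the loss. Raising the threshold keeps fewer but cleaner pseudo-labels, and lowering it admits more but noisier ones.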

⫸ Multi-View Pseudo-Labeling Key Details for Success. Answer:
  • Pseudo-labeling without augmentation isn't very effective --> need good data augmentation algorithms
  • Doing this in multiple stages isn't as good as end to end
  • Large unlabeled batch sizes are necessary for the labels to be good
  • Confidence threshold is very important --> too high = not getting many labels, too low = many noisy examples
  • Cosine learning rate schedules
  • Inference with exponential moving average of weights --> stabilizes training and improves performance

⫸ Other Methods for Semi-Supervised Learning. Answer:
  MixMatch/ReMixMatch: More complex variations prior to FixMatch
  • Temperature scaling and entropy minimization to stabilize training
  • Multiple augmentations and ensembling to improve pseudo-labels
  Virtual Adversarial Training: Augmentation through adversarial examples
  Mean Teacher: Student/teacher distillation consistency method with exponential moving average
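Both the "inference with exponential moving average of weights" bullet and the Mean Teacher entry above rely on the same averaging rule. A minimal sketch follows, treating the weights as a flat list of floats; the function name, the decay values, and the oscillating toy weight are assumptions for illustration.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """Return decay * ema + (1 - decay) * current, elementwise."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Toy demonstration: a single "student" weight that jumps between +1 and -1
# on every step, while its moving average stays close to 0.
# Starting the average at 0 is a toy choice; in practice it would usually
# start as a copy of the current weights.
ema = [0.0]
history = []
for step in range(50):
    student = [1.0 if step % 2 == 0 else -1.0]
    ema = ema_update(ema, student, decay=0.9)
    history.append(ema[0])
print(history[-1])
```

A decay closer to 1 makes the averaged weights move more slowly, which is the stabilizing effect the cards refer to.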

  • Set up a set of smaller tasks that better mirror what will happen during testing
  • This amounts to having a bunch of N-way tests
  • Can also optionally pre-train features on held-out base classes

⫸ Approaches to Meta-Training. Answer:
  1. MatchingNet
    • Cosine distance of features between support and query set
  2. ProtoNet
    • Extract features from support and query set
    • Take the mean of the features of the support set
    • Compare each query to the mean of the features (Euclidean distance)
  3. RelationNet
    • Same as ProtoNet, but using a different distance function
    • Relation module learns how to relate in a more complicated manner than cosine similarity or Euclidean distance

⫸ Meta-Learning Approaches. Answer:
  1. Take inspiration from a known learning algorithm
    • kNN/kernel machine: Matching Networks
    • Gaussian classifier: Prototypical Networks
    • Gradient descent: Meta-Learner LSTM, Model-Agnostic Meta-Learning (MAML)
  2. Derive it from a black-box NN
    • MANN
    • SNAIL

⫸ Meta-Learning to Learn Gradient Descent. Answer:
  Strategy 1: Output:
    1. Parameter initialization
    2. Meta-learner that decides how to update parameters
  Strategy 2: Learn just an initialization and use normal gradient descent
    Output: Just parameter initialization

⫸ Meta-Learner LSTM. Answer:
  • Gradient descent update rule is similar to the LSTM cell state update rule
  • GD: $\theta_t = \theta_{t-1} - \alpha \frac{\partial \mathcal{L}}{\partial \theta_{t-1}}$
  • LSTM: $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
  • $c_t$: equivalent to the parameters we want to learn ($c_0$ is the learned initialization)
  • $\tilde{c}_t$: equivalent to the negative gradient
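The correspondence in the Meta-Learner LSTM card can be made concrete: if the forget gate is fixed at 1, the input gate at the learning rate, and the candidate cell state at the negative gradient, the LSTM cell update reproduces one gradient descent step. The sketch below checks this on a toy quadratic loss; the loss, its target values, and the function names are hypothetical, and a real meta-learner would learn the gate values instead of fixing them.

```python
def lstm_cell_update(c_prev, f_t, i_t, c_tilde):
    """c_t = f_t * c_prev + i_t * c_tilde, elementwise."""
    return [f * c + i * ct for c, f, i, ct in zip(c_prev, f_t, i_t, c_tilde)]


def gd_update(theta, grad, alpha):
    """theta_t = theta_prev - alpha * grad."""
    return [t - alpha * g for t, g in zip(theta, grad)]


def loss_grad(theta):
    """Gradient of a hypothetical toy loss sum_i (theta_i - target_i)^2."""
    target = [1.0, -2.0]
    return [2.0 * (t - y) for t, y in zip(theta, target)]


theta = [0.0, 0.0]
alpha = 0.1
grad = loss_grad(theta)

gd_result = gd_update(theta, grad, alpha)
lstm_result = lstm_cell_update(
    c_prev=theta,                    # cell state plays the role of the parameters
    f_t=[1.0] * len(theta),          # forget gate fixed at 1: keep old parameters
    i_t=[alpha] * len(theta),        # input gate fixed at the learning rate
    c_tilde=[-g for g in grad],      # candidate state is the negative gradient
)
print(gd_result, lstm_result)
```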
  1. Trivial Parameterizations

⫸ Surrogate Tasks. Answer:
  • Tasks where we can get the label for free and that enable us to learn a good feature representation

⫸ Evaluation Surrogate Tasks. Answer:
  • Extract the ConvNet (encoder part)
  • Transfer to the actual task
    --> Use it to initialize the model of another supervised learning task
    --> Use it to extract features for learning a separate classifier
    --> Often the classifier is limited to a linear layer and the features are frozen --> we are interested in how good the feature representations are (no need to add another confounding effect)

⫸ Contrastive Loss. Answer: Dot product between the features of augmentation 1 and those of the positive and negative examples

⫸ Instance Discrimination. Answer:
  • Take an image and perform two augmentations
  • Take a set of negative examples and perform augmentations
  • Minimize the contrastive loss so that the features of the positive examples are more similar to each other than to the features of the negative examples

⫸ Memory Bank Approach. Answer:
  • Makes sampling negative examples easy (without needing to do feature extraction again)
  • Have a queue that stores features of negative examples
  • Features may be stale (since encoder weights have been updated)

⫸ Momentum Encoder. Answer:
  • Uses a moving average of parameters for the key encoder
  • Want the key encoder to move slowly so that features put in the memory bank at different times won't differ as much
  • Gradient only flows to the query encoder

⫸ Space and time complexity for MDP algs. Answer: Value Iteration: $O(|S|^2 |A|)$
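The contrastive loss and instance discrimination cards above describe the loss only in words. One common way to turn those dot products into a loss is a softmax cross-entropy over the positive and the negatives, often called InfoNCE. The sketch below assumes that form, plus a temperature parameter; the function names and the two-dimensional feature vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(query, positive, negatives, temperature=0.1):
    """-log( exp(q.k+ / T) / (exp(q.k+ / T) + sum_j exp(q.k_j / T)) )."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


# Hypothetical unit-length features: the query matches its positive exactly.
query = [1.0, 0.0]
positive = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

aligned = contrastive_loss(query, positive, negatives, temperature=1.0)
# Swap the positive with a negative: the loss should go up.
misaligned = contrastive_loss(query, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]], temperature=1.0)
print(aligned, misaligned)
```

In a memory bank or momentum encoder setup, `negatives` would be read from the stored queue of features, and only the query side would receive gradients.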