Chamberlain College of Nursing Advanced Algorithms

CS-7643 QUIZ 4 DEEP LEARNING OPTIMIZATION REGULARIZATION FINAL STUDY GUIDE 2026 SOLVED QUESTIONS FULLY CORRECT

Typology: Exams

2025/2026

Available from 01/27/2026

alcorbgeneralstore 🇺🇸

⫸ RL: Evaluative Feedback. Answer: - Pick an action, receive a reward
- No supervision for what the correct action is or would have been (unlike supervised learning)

⫸ RL: Sequential Decisions. Answer: - Plan and execute actions over a sequence of states
- Reward may be delayed, requiring optimization of future rewards (long-term planning)

⫸ Signature Challenges in RL. Answer: - Evaluative Feedback: Need trial and error to find the right action
- Delayed Feedback: Actions may not lead to immediate reward
- Non-stationarity: Data distribution of visited states changes when the policy changes
- Fleeting Nature: Online data may only be seen once


⫸ MDP. Answer: Framework underlying RL
- S: Set of states
- A: Set of actions
- R: Distribution of rewards
- T: Transition probability
- γ: Discount factor
- Markov Property: Current state completely characterizes the state of the environment

⫸ RL: Equations relating optimal quantities. Answer: 1. $V^*(s) = \max_a Q^*(s, a)$
2. $\pi^*(s) = \arg\max_a Q^*(s, a)$

⫸ V*(s). Answer: $V^*(s) = \max_a \sum_{s'} p(s' \mid s, a) \left[ r(s, a) + \gamma V^*(s') \right]$

⫸ Q*(s, a). Answer: $Q^*(s, a) = \sum_{s'} p(s' \mid s, a) \left[ r(s, a) + \gamma \max_{a'} Q^*(s', a') \right]$

⫸ Value Iteration. Answer: $V_{i+1}(s) = \max_a \sum_{s'} p(s' \mid s, a) \left[ r(s, a) + \gamma V_i(s') \right]$
- Repeat until convergence
- Time complexity per iteration: $O(|S|^2 |A|)$
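
The value iteration update can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's reference code: the two-state MDP, the nested-list layout of `P` and `R`, and the names `value_iteration`, `tol`, and `max_iters` are all assumptions made for the example.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration.

    P[s][a][s2] is the transition probability p(s2 | s, a) (assumed layout).
    R[s][a] is the expected reward r(s, a) (assumed layout).
    """
    n_states = len(P)
    n_actions = len(P[0])
    V = [0.0] * n_states
    for _ in range(max_iters):
        # Q(s, a) = sum_{s'} p(s'|s,a) [r(s,a) + gamma * V(s')]
        Q = [[sum(p * (R[s][a] + gamma * V[s2]) for s2, p in enumerate(P[s][a]))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q_s) for q_s in Q]
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta < tol:  # "repeat until convergence"
            break
    # Greedy policy: pi(s) = argmax_a Q(s, a)
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return V, policy


# Hypothetical MDP: action 0 = stay, action 1 = move to the other state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
V, policy = value_iteration(P, R, gamma=0.9)
```

Each sweep touches every (s, a, s') triple once, which is where the $O(|S|^2 |A|)$ cost per iteration comes from.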

Experience Replay addresses this:
--> Store (s, a, s', r) transitions in a replay memory and keep updating it as episodes are played (older samples are discarded)
--> Train the Q-network on random mini-batches of transitions from the replay memory instead of consecutive examples
--> The larger the buffer, the lower the correlation

⫸ Experience Replay. Answer: - Store (s, a, s', r) transitions in a replay memory and keep updating it as episodes are played (older samples are discarded)
- Train the Q-network on random mini-batches of transitions from the replay memory instead of consecutive examples
- The larger the buffer, the lower the correlation

⫸ Value Based RL Methods Learn, Model Based Methods Learn. Answer: - Value-based: Q-functions
- Model-based: Transition and reward functions

⫸ Parameterized Policy. Answer: $\theta^* = \arg\max_\theta \mathbb{E}\left[ \sum_t R(s_t, a_t) \right]$

⫸ REINFORCE Algorithm. Answer: 1. Gather data using the current policy
- Sample trajectories $\tau$ by acting according to $\pi_\theta$
2. Compute the gradient update
- $\nabla_\theta J(\theta) \approx \left( \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right) \left( \sum_t R(s_t, a_t) \right)$, averaged over the sampled trajectories
3. Update policy parameters
- $\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta)$

⫸ Drawbacks of Policy Gradients. Answer: - Increases or decreases the likelihood of an entire sequence of actions based on its total reward
--> Can't determine which action was good or bad
--> Credit assignment problem
--> Suffers from high variance and unstable learning
- How to reduce variance?
--> Subtracting an action-independent baseline from the reward preserves the mean of the gradient estimate while possibly reducing its variance
- Different choices of baseline result in different variants of the policy gradient algorithm

⫸ Actor-Critic. Answer: - Replaces rewards with $Q^{\pi_\theta}(s, a)$
- $\nabla_\theta J(\theta) = \mathbb{E}\left[ \nabla_\theta \log \pi_\theta(a \mid s) \, Q^{\pi_\theta}(s, a) \right]$
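
The REINFORCE steps and the baseline trick from the cards above can be sketched on a toy problem. This is an illustrative sketch under simplifying assumptions that are not in the notes: episodes are one step long (a multi-armed bandit), the policy is a softmax over per-action parameters, and the baseline is a running mean of past rewards. The names `softmax` and `reinforce_bandit` are hypothetical.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of action preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_bandit(rewards, steps=2000, alpha=0.1, seed=0):
    """REINFORCE with a running-mean baseline on a deterministic bandit."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    baseline = 0.0
    for t in range(1, steps + 1):
        # 1. Gather data using the current policy
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rewards[action]
        # 2. Gradient of log softmax: one_hot(action) - probs,
        #    scaled by (reward - baseline), which is action independent
        advantage = reward - baseline
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            # 3. Gradient ascent on the expected reward
            theta[k] += alpha * advantage * grad_log_pi
        baseline += (reward - baseline) / t  # running mean of rewards
    return softmax(theta)


# Arm 1 always pays 1, arm 0 always pays 0 (hypothetical rewards).
final_probs = reinforce_bandit([0.0, 1.0])
```

With multi-step episodes the same update sums $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$ over the trajectory, which is where the credit assignment problem described above appears.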

⫸ Pseudo-labeling for Unlabeled Data (Semi-Supervised Learning). Answer: - Learn a model on labeled training data
- Feed unlabeled examples through the model
- Take the most confident predictions and convert them to pseudo-labels
- Put these pseudo-labels into the dataset and retrain --> these will be noisier than human labels

⫸ Cross-View/Augmentation & Consistency. Answer: - Take an unlabeled example and make weakly and strongly augmented versions
- Weakly augment the image and get a pseudo-label
- Strongly augment the image and make a prediction
- Train the strongly augmented prediction against the pseudo-label from the weakly augmented view
Idea:
- Weak augmentation isn't so severe that the pseudo-labels are bad
- Strong augmentation makes the NN learn better feature representations

⫸ Meta-Learning (Few-Shot Learning). Answer: - Learning to learn
- Learn an NN initialization that becomes an effective model after a few SGD steps on a small amount of labeled data

⫸ Surrogate Tasks (Self-Supervised Learning). Answer: - Identify loss functions for tasks we don't care about, but that allow us to learn good feature representations
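
The "take the most confident predictions" step of pseudo-labeling can be sketched as a simple filter over predicted class probabilities. The function name `select_pseudo_labels`, the 0.95 threshold, and the example probabilities are assumptions for illustration only.

```python
def select_pseudo_labels(probabilities, threshold=0.95):
    """Return (example_index, pseudo_label) pairs for confident predictions.

    probabilities: one list of class probabilities per unlabeled example.
    Examples whose top probability is below the threshold are skipped.
    """
    selected = []
    for index, class_probs in enumerate(probabilities):
        confidence = max(class_probs)
        if confidence >= threshold:
            label = class_probs.index(confidence)  # argmax class
            selected.append((index, label))
    return selected


# Hypothetical model outputs for three unlabeled examples.
unlabeled_predictions = [
    [0.97, 0.02, 0.01],  # confident: class 0
    [0.40, 0.35, 0.25],  # not confident: skipped
    [0.03, 0.01, 0.96],  # confident: class 2
]
pseudo_labels = select_pseudo_labels(unlabeled_predictions, threshold=0.95)
```

Raising the threshold keeps fewer but cleaner pseudo-labels; lowering it keeps more examples at the cost of noisier labels.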

⫸ Multi-View Pseudo-Labeling Key Details for Success. Answer: - Pseudo-labeling without augmentation isn't very effective --> need good data augmentation algorithms
- Doing this in multiple stages isn't as good as end to end
- Large unlabeled batch sizes are necessary for the labels to be good
- Confidence threshold is very important --> too high = not getting many labels, too low = many noisy examples
- Cosine learning rate schedules
- Inference with exponential moving average of weights --> stabilizes training and improves performance

⫸ Other Methods for Semi-Supervised Learning. Answer: - MixMatch/ReMixMatch: More complex variations prior to FixMatch
--> Temperature scaling and entropy minimization to stabilize training
--> Multiple augmentations and ensembling to improve pseudo-labels
- Virtual Adversarial Training: Augmentation through adversarial examples
- Mean Teacher: Student/teacher distillation consistency method with exponential moving average
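
The exponential moving average of weights mentioned above (for inference, and for the teacher in Mean Teacher) can be sketched as follows. Weights are represented as a flat list of floats, and the name `ema_update` and the decay value are assumptions for the example; real implementations apply the same rule per parameter tensor.

```python
def ema_update(ema_weights, model_weights, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * current weights."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]


# Hypothetical training loop: the model weights stay at [1.0, 2.0]
# for three steps while the EMA copy slowly follows them.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.9)
```

After n identical steps the EMA has closed a fraction 1 - decay^n of the gap to the model weights, so a decay close to 1 gives a slowly moving, smoother copy.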

- Set up a set of smaller tasks that better mirror what will happen during testing
- This amounts to having a bunch of N-way tests
- Can also optionally pre-train features on held-out base classes

⫸ Approaches to Meta-Training. Answer: 1. MatchingNet
- Cosine distance of features between support and query set
2. ProtoNet
- Extract features from support and query set
- Take the mean of the features of the support set for each class
- Compare each query to the class means (Euclidean distance)
3. RelationNet
- Same as ProtoNet, but using a different distance function
- Relation module learns how to relate in a more complicated manner than cosine similarity or Euclidean distance

⫸ Meta-Learning Approaches. Answer: 1. Take inspiration from a known learning algorithm
- KNN/kernel machine: Matching Networks
- Gaussian classifier: Prototypical Networks
- Gradient descent: Meta-Learner LSTM, Model-Agnostic Meta-Learning (MAML)
2. Derive it from a black-box NN
- MANN
- SNAIL

⫸ Meta-Learning to Learn Gradient Descent. Answer: Strategy 1: Output:
- Parameter initialization
- Meta-learner that decides how to update parameters
Strategy 2: Learn just an initialization and use normal gradient descent. Output:
- Just parameter initialization

⫸ Meta-Learner LSTM. Answer: - Gradient descent update rule is similar to the LSTM cell state update rule
- GD: $\theta_t = \theta_{t-1} - \alpha \frac{\partial L}{\partial \theta_{t-1}}$
- LSTM: $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
- $c_t$: equivalent to the parameters we want to learn ($c_0$ is the learned initialization)
- $\tilde{c}_t$: equivalent to the negative gradient
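
The correspondence between the two update rules can be checked numerically: with forget gate 1, input gate equal to the learning rate, and candidate state equal to the negative gradient, the LSTM cell update reproduces a gradient descent step. The function names and the numbers below are assumptions for illustration; in a Meta-Learner LSTM the gates would be learned rather than fixed.

```python
def gd_step(theta, grad, alpha):
    """Plain gradient descent: theta_t = theta_{t-1} - alpha * grad."""
    return [t - alpha * g for t, g in zip(theta, grad)]


def lstm_cell_update(c_prev, f_gate, i_gate, c_candidate):
    """Elementwise LSTM cell update: c_t = f_t * c_{t-1} + i_t * c_tilde_t."""
    return [f * c + i * cand
            for c, f, i, cand in zip(c_prev, f_gate, i_gate, c_candidate)]


theta = [1.0, -2.0]   # current parameters (the cell state c_{t-1})
grad = [0.5, -1.0]    # hypothetical loss gradient at theta
alpha = 0.1

via_gd = gd_step(theta, grad, alpha)
via_lstm = lstm_cell_update(
    theta,
    f_gate=[1.0, 1.0],                # keep all of the previous parameters
    i_gate=[alpha, alpha],            # input gate plays the role of the learning rate
    c_candidate=[-g for g in grad],   # candidate state is the negative gradient
)
```

Letting the meta-learner choose `f_gate` and `i_gate` per parameter and per step is what turns this fixed rule into a learned optimizer.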

Trivial Parameterizations

⫸ Surrogate Tasks. Answer: - Tasks where we can get the label for free and that enable us to learn a good feature representation

⫸ Evaluation Surrogate Tasks. Answer: - Extract the ConvNet (encoder part)
- Transfer to the actual task
--> Use it to initialize the model of another supervised learning task
--> Use it to extract features for learning a separate classifier
--> Often the classifier is limited to a linear layer and the features are frozen --> interested in how good the feature representations are (no need to add another confounding effect)

⫸ Contrastive Loss. Answer: Dot product between augmentation 1 and positive & negative examples

⫸ Instance Discrimination. Answer: - Take an image and perform two augmentations
- Take a set of negative examples and perform augmentations
- Try to minimize the contrastive loss so that the features of the two augmentations (the positive pair) are more similar to each other than to the features of the negative examples

⫸ Memory Bank Approach. Answer: - Makes sampling negative examples easy (without needing to do feature extraction again)
- Have a queue that stores features of negative examples
- Features may be stale (since encoder weights have been updated)

⫸ Momentum Encoder. Answer: - Uses a moving average of parameters for the key encoder
- Want the key encoder to move slowly so that features put in the memory bank won't differ as much from newer ones
- Gradient only flows to the query encoder

⫸ Space and time complexity for MDP algs. Answer: Value Iteration: $O(|S|^2 |A|)$
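
The Contrastive Loss and Instance Discrimination cards above can be illustrated with a small sketch. The notes only state that the loss uses dot products between one augmentation and the positive and negative examples; the softmax cross-entropy form with a temperature used here is an assumption, as are the function names and the toy feature vectors.

```python
import math


def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(query, positive, negatives, temperature=1.0):
    """Cross-entropy over dot-product similarities, positive at index 0."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, negative) / temperature for negative in negatives]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]  # -log softmax(logits)[0]


# Hypothetical 2-D features: the query matches the positive exactly.
aligned = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# Same query, but now the "positive" is the dissimilar vector.
misaligned = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

The loss is small when the query is closer to its positive than to every negative, and grows when a negative is the better match.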