CS-7643 QUIZ 4 DEEP LEARNING OPTIMIZATION REGULARIZATION PRACTICE SCRIPT UPDATED 2026 TEST, Exams of Advanced Algorithms

Chamberlain College of Nursing Advanced Algorithms

CS-7643 QUIZ 4 DEEP LEARNING OPTIMIZATION REGULARIZATION PRACTICE SCRIPT UPDATED 2026 TESTED SOLUTIONS

Typology: Exams

2025/2026

Available from 01/27/2026

alcorbgeneralstore 🇺🇸

29K documents


CS-7643 QUIZ 4 DEEP LEARNING

OPTIMIZATION REGULARIZATION

PRACTICE SCRIPT UPDATED 2026 TESTED

SOLUTIONS

⫸ Graph Embedding Answer: Optimize the objective that connected nodes have more similar embeddings than unconnected nodes.

Task: convert nodes to vectors

- effectively unsupervised learning where nearest neighbors are similar
- these learned vectors are useful for downstream tasks
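The objective above can be illustrated with a small margin (hinge) loss. This is only a sketch under assumptions: the dot-product score, the `margin` value, the function names, and the toy embeddings are all made up for illustration and are not taken from the notes.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def margin_loss(emb, edge, fake_edge, margin=1.0):
    """Hinge loss: zero once the real edge outscores the fake edge by `margin`."""
    s, d = edge
    s_neg, d_neg = fake_edge
    pos = dot(emb[s], emb[d])          # score of the connected pair
    neg = dot(emb[s_neg], emb[d_neg])  # score of the unconnected pair
    return max(0.0, margin - pos + neg)

# Hypothetical 2-d embeddings: "a" and "b" are connected, "a" and "c" are not.
emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [-1.0, 0.2]}

loss_good = margin_loss(emb, ("a", "b"), ("a", "c"))  # neighbors already closer
loss_bad = margin_loss(emb, ("a", "c"), ("a", "b"))   # roles swapped: large loss
print(loss_good, loss_bad)
```

Minimizing this loss over many (real edge, fake edge) pairs pushes connected nodes toward higher similarity than unconnected ones.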

⫸ Multi-layer Perceptron (MLP) pain points for NLP Answer:

- Cannot easily support variable-sized sequences as inputs or outputs
- No inherent temporal structure
- No practical way of holding state
- The size of the network grows with the maximum allowed size of the input or output sequences

⫸ Truncated Backpropagation through time Answer:

- Only backpropagate an RNN through T time steps
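A rough sketch of what truncation means, assuming a scalar RNN h(t) = tanh(w * h(t-1) + u * x(t)) with a squared-error loss on the last hidden state. The model, the loss, and every name here (`forward`, `grad_w`, `steps`) are illustrative assumptions; `steps` plays the role of T in the card.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Return [h(0), ..., h(T)] for h(t) = tanh(w * h(t-1) + u * x(t))."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, target):
    """Squared error on the final hidden state."""
    return 0.5 * (forward(w, u, xs)[-1] - target) ** 2

def grad_w(w, u, xs, target, steps):
    """dLoss/dw, backpropagating through at most `steps` of the latest time steps."""
    hs = forward(w, u, xs)
    T = len(xs)
    dh = hs[T] - target                   # dLoss/dh(T)
    total = 0.0
    for t in range(T, max(T - steps, 0), -1):
        dpre = dh * (1.0 - hs[t] ** 2)    # gradient through tanh at step t
        total += dpre * hs[t - 1]         # contribution of w used at step t
        dh = dpre * w                     # gradient passed back to h(t-1)
    return total

xs = [0.5, -0.3, 0.8, 0.1, -0.6, 0.4]
w, u, target = 0.7, 1.2, 0.2
full = grad_w(w, u, xs, target, steps=len(xs))  # ordinary BPTT
truncated = grad_w(w, u, xs, target, steps=2)   # only the last 2 steps
print(full, truncated)
```

The truncated gradient drops the contributions of earlier time steps, trading some accuracy for a cost that no longer grows with sequence length.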


⫸ Recurrent Neural Networks (RNN) Answer:

h(t) = activation(U * input(t) + V * h(t-1) + bias)
y(t) = activation(W * h(t) + bias)

- activation is typically the logistic function or tanh
- outputs can also simply be h(t)
- family of NN architectures for modeling sequences

⫸ Training Vanilla RNNs: Difficulties Answer:

- Vanishing and exploding gradients
- For a scalar linear recurrence h(t) = w * h(t-1), dh(t)/dh(0) = w^t
- if |w| > 1: exploding gradients
- if |w| < 1: vanishing gradients

⫸ Long Short-Term Memory Network Gates and States Answer:

- f(t) = forget gate
- i(t) = input gate
- u(t) = candidate update
- o(t) = output gate
- c(t) = cell state
- c(t) = f(t) * c(t - 1) + i(t) * u(t)
- h(t) = hidden state
- h(t) = o(t) * tanh(c(t))
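The state updates above can be traced with a scalar sketch of one LSTM step. The notes give only the c(t) and h(t) equations; the gate pre-activations below (a weighted sum of x(t) and h(t-1) plus a bias, passed through a sigmoid or tanh) follow the common textbook form and are an assumption, as are the parameter layout and the example weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step; params maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))    # forget gate
    i = sigmoid(pre("i"))    # input gate
    u = math.tanh(pre("u"))  # candidate update
    o = sigmoid(pre("o"))    # output gate
    c = f * c_prev + i * u   # c(t) = f(t) * c(t-1) + i(t) * u(t)
    h = o * math.tanh(c)     # h(t) = o(t) * tanh(c(t))
    return h, c

# Hypothetical weights, chosen only so the example runs.
params = {"f": (0.5, 0.1, 1.0), "i": (0.4, -0.2, 0.0),
          "u": (1.0, 0.3, 0.0), "o": (0.6, 0.2, 0.0)}

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```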

L(dist) = CE b/w student and teacher predictions
L(student) = CE b/w predicted output and actual labels
L = alpha * L(dist) + beta * L(student)

Advantages:

- may work well b/c of the soft predictions of the teacher model
- if we don't have enough labeled text, we can still train the student model to align its predictions with the teacher's

⫸ Collobert and Weston Vector Idea Answer: a word and its context is a positive training sample; a random word in that same context gives a negative training sample

⫸ Word2vec Overview Answer: Word2vec - a framework for learning word vectors

Idea:

- we have a large corpus of text
- every word in a fixed vocabulary is represented by a vector
- go through each position t in the text, which has a center word c and context words o
- use the similarity of the word vectors for c and o to calculate the probability of o given c (or vice versa)
- keep adjusting the word vectors to maximize this probability

⫸ Word2vec Variants Answer:

- Skip-Gram: Predict context words given center word
- Continuous Bag of Words: Predict center word from (bag of) context words

⫸ Word2vec Objective Function Answer:

- L(theta) = product over all positions t = 1..T (every possible center word) of the product over all words j in the context window (j != 0) of P( w(t+j) | w(t); theta )
- J(theta) = - 1 / T * log( L(theta) )

⫸ Word2vec P( w(t+j) | w(t) ) Answer:

- Two sets of vectors for each word w in the vocabulary
- u(w) for when w is the center word
- v(w) for when w is a context word
- P( w(t+j) | w(t) ) = softmax( u(w(t)) * v(w(t+j)) ), where * is a dot product and the softmax normalizes over all words in the vocabulary

⫸ Word2vec Expensive to Compute Solutions Answer:

1. Hierarchical Softmax
2. Negative Sampling
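As a sketch of the second option, the commonly used negative-sampling loss replaces the softmax over the whole vocabulary with one logistic term for the true context word and one per sampled negative word. The exact formula is not given in these notes, so treat the form below, the function names, and the toy vectors as assumptions; u is the center-word vector and v the context-word vector, following the notation above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def negative_sampling_loss(u_center, v_context, v_negatives):
    """-log sigmoid(u.v_o) - sum_k log sigmoid(-u.v_k); cost scales with k, not |V|."""
    loss = -math.log(sigmoid(dot(u_center, v_context)))
    for v_neg in v_negatives:
        loss -= math.log(sigmoid(-dot(u_center, v_neg)))
    return loss

# Hypothetical 2-d vectors for one (center, context) pair and two negatives.
u_center = [1.0, 0.5]
v_context = [0.8, 0.4]
v_negatives = [[-0.7, 0.1], [0.0, -0.9]]
print(negative_sampling_loss(u_center, v_context, v_negatives))
```

The loss falls as the true context vector aligns with the center vector and the sampled negatives point away from it.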

Nearest Neighbors are semantically meaningful

⫸ Graph Embeddings Loss Function Answer:

- Margin loss between the score of an edge f(e) and a negative sampled edge f(e')
- Negative sampled edges are constructed by taking a real edge and replacing either the source or destination vertex with a random node
- the score of an edge f(e) is a similarity (e.g. dot product or cosine) between the source embedding and a transformed version of the destination embedding
- f(e) = cos( theta(s) , theta(d) + theta(r) )

⫸ Graph Embedding is Slow: Reason and Solution Answer:

- Reason: training time is dominated by computing scores for "fake edges"
- Solution: corrupt a sub-batch of edges with the same set of random nodes

⫸ Debiasing word2vec Answer:

- identify gender subspace with gendered words
- project all words onto this subspace
- subtract those projections from the original word vectors

Problem: Not that effective, and bias pervades the word embedding space

⫸ t-SNE things to remember Answer:

1. Run until it stabilizes

2. Set perplexity b/w 2 and N
   - perplexity loosely measures # neighbors
   - balances b/w local and global aspects of the data
3. Re-run t-SNE multiple times to ensure we get the same shape

⫸ t-SNE general concept Answer:

- Maps inputs from high dimensional space to lower dimensions for visualization

- iteratively moves similar points closer and distant points further apart
- expands dense clusters and contracts sparse clusters

⫸ Teacher Forcing Answer:

- the next input to the model is not its predicted value, but the actual value from the training data
- allows the model to train effectively even if it made a mistake at an earlier step
- if used instead of hidden-to-hidden recurrence, it can allow for parallelization, but the model becomes less powerful
- emerges from MLE
- issues may arise if the network is later going to be used in "closed-loop" mode, where its output is fed back as input

⫸ Skip-Gram Model: Loss/Objective Function Answer:

- for each position t, we try to predict the context words within a fixed window size given the center word
- multiply these probabilities to get a likelihood
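A toy version of this likelihood, written as the average negative log-likelihood J = -(1/T) * log(L) with a full softmax over the vocabulary. The corpus, vector values, and function names are invented for illustration; U holds center-word vectors and V holds context-word vectors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax_prob(center, context, U, V):
    """P(context | center) = exp(u_c . v_o) / sum over w of exp(u_c . v_w)."""
    scores = {w: dot(U[center], V[w]) for w in V}
    top = max(scores.values())  # subtract the max for numerical stability
    norm = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[context] - top) / norm

def skipgram_objective(tokens, U, V, window):
    """J = -(1/T) * sum over t and window offsets j != 0 of log P(w(t+j) | w(t))."""
    log_likelihood = 0.0
    for t, center in enumerate(tokens):
        for j in range(-window, window + 1):
            if j == 0 or not 0 <= t + j < len(tokens):
                continue
            log_likelihood += math.log(softmax_prob(center, tokens[t + j], U, V))
    return -log_likelihood / len(tokens)

# Hypothetical 3-word vocabulary with 2-d vectors.
U = {"a": [0.2, 0.1], "b": [-0.3, 0.4], "c": [0.5, -0.2]}
V = {"a": [0.1, 0.3], "b": [0.4, -0.1], "c": [-0.2, 0.2]}
tokens = ["a", "b", "c", "a"]
print(skipgram_objective(tokens, U, V, window=1))
```

Summing log-probabilities instead of multiplying raw probabilities gives the same optimum and avoids numerical underflow on long texts.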

choose a sampling distribution under which less frequent words are sampled more often

⫸ Word Embeddings as a graph Answer:

- each word is a node with edge connections to context words

⫸ PyTorch-BigGraph: Idea Answer:

- Start with a multi-relation graph (different edge types that encode different relations)
- minimize margin loss b/w an edge score and a negative sampled edge
- a negative sample edge is found by taking a real edge and replacing either the source or destination node
- negative sampling is a bottleneck since there are many more negative edges than real edges

⫸ PyTorch-BigGraph: Edge Scores Answer:

- f(e) = cos(theta_s, theta_d + theta_r)
- theta_s = source vertex embedding
- theta_d = destination vertex embedding
- theta_r = relation vector

⫸ Structured Representation: State, Neighborhood, Propagation of Info Answer:

State - compactly represents all data we've seen (nodes)

Neighborhood - What other elements to incorporate/how two nodes are connected (edges)

Propagation - how to combine structured data to get new state/vector representations

⫸ Non-Local Neural Network Answer:

- Allows the network to learn its own connectivity pattern
- Does so in a data-dependent way
- it's called non-local because you don't have a specific local receptive field
- y_i = 1/C(x) * sum over j of f(x_i, x_j) * g(x_j)
- f is a similarity function
- g computes a feature representation of the input at position j

⫸ Skip Gram Model: What is conditioned on what? Answer:

- Probability of a context word given a center word

⫸ RNNs and LSTMs, how their update rules differ, and what problems each have or solve (vanishing and/or exploding gradients) Answer:

- Solving Vanishing Gradients: The LSTM does so by creating an internal memory state which is simply added to the processed input, which greatly reduces the multiplicative effect of small gradients. The time dependence and effects of previous inputs are controlled by the forget gate, which determines which states are remembered or forgotten.
- Input gate determines the extent to which the current time step's input should be used
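A back-of-the-envelope comparison of the two gradient paths, under a strong simplification: each time step multiplies the gradient by one constant factor. For a scalar linear RNN that factor is the recurrent weight w; along the LSTM cell state c(t) = f(t) * c(t-1) + i(t) * u(t) it is the forget gate value f, if the gates are treated as constants. The specific numbers are made up for illustration.

```python
def recurrent_gradient(factor, steps):
    """Gradient scale after `steps` time steps with a constant per-step factor."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor
    return grad

steps = 50
rnn_vanishing = recurrent_gradient(0.5, steps)   # |w| < 1: shrinks toward 0
rnn_exploding = recurrent_gradient(1.5, steps)   # |w| > 1: blows up
lstm_cell_path = recurrent_gradient(0.99, steps) # forget gate near 1: preserved

print(rnn_vanishing, rnn_exploding, lstm_cell_path)
```

In a real LSTM the gates vary per step and also depend on h(t-1), so this only shows why a forget gate close to 1 keeps the cell-state gradient from vanishing.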
