CS 7643 Quiz 4 – Concepts|Actual 2026 Update with complete solutions|Georgia Institute Of Technology CS 7643 Quiz 4 – Concepts|Notes. Used this to pass in quiz 4

Typology: Exams

2025/2026

Available from 02/23/2026

StudyPlug

Covers Structured Representations (Lesson 11), Language Models (Lesson 12), and Embeddings (Lesson 13).

Conceptual Questions:

  • RNNs and LSTMs: how their update rules differ, and what problems they each have or solve

RNN (Recurrent Neural Network): designed to model sequences. Common input/output setups:

  • Many to many: input a sequence, output a sequence; also called sequence transduction. Other setups of this kind are known as encoder-decoder models. Example: OCR, where, given an image of text, we split it into individual characters and try to recognize each one.
  • Many to one: a sequence as input, a single output. Example: sentiment analysis, where, given a piece of text, we classify whether the author was feeling positive or negative when writing it.
  • One to many: a single input, a sequence as output (e.g. an image captioning model).
  • One to one: no sequence involved; typical regression problems.

RNNs solve the problems an MLP (Multilayer Perceptron) has when used to model sequences: an MLP expects a fixed-size input and learns separate weights for each input position, whereas an RNN processes a sequence of any length one step at a time, reusing the same weights at every step.
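To make the RNN-versus-MLP point concrete, here is a minimal Python sketch of a vanilla RNN used many to one. It assumes a scalar hidden state and hypothetical weights (`w_xh`, `w_hh`, `b`, `w_hy` are illustration values, not from the course); a real RNN uses weight matrices and vector-valued states. The point is that one update rule, h_t = tanh(w_hh·h_{t-1} + w_xh·x_t + b), is reused at every step, so inputs of any length share the same parameters.

```python
import math
from typing import List


def rnn_many_to_one(xs: List[float], w_xh: float = 0.5, w_hh: float = 0.8,
                    b: float = 0.0, w_hy: float = 2.0) -> float:
    """Run a scalar vanilla RNN over `xs` and return one output from the last hidden state.

    The weights are hypothetical illustration values, not trained parameters.
    """
    h = 0.0  # initial hidden state h_0
    for x in xs:
        # Same weights at every time step: h_t = tanh(w_hh * h_{t-1} + w_xh * x_t + b)
        h = math.tanh(w_hh * h + w_xh * x + b)
    # Many to one: a single output computed from the final hidden state
    return w_hy * h


# Sequences of different lengths go through the same parameters.
print(rnn_many_to_one([1.0]))
print(rnn_many_to_one([1.0, -0.5, 0.25, 2.0]))
```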

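Problem of the vanilla RNN: vanishing gradients. Backpropagating through many time steps multiplies many per-step derivatives together, so the gradient reaching early steps can shrink toward zero and long-range dependencies become hard to learn. Solution: the LSTM architecture. LSTM (Long Short-Term Memory) introduces the concept of gates: sigmoid-activated vectors (entries between 0 and 1) computed from the cell's inputs and multiplied element-wise with other signals to control how much information passes through.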
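A small numeric sketch of why vanilla RNN gradients vanish while the LSTM cell state does not, under simplifying assumptions: scalar states, hypothetical weights, and, for the LSTM, only the direct path through the additive update c_t = f_t·c_{t-1} + i_t·g_t (the gates' own dependence on earlier states is ignored). Along that path each step multiplies the gradient by the forget gate f_t, while the vanilla RNN multiplies it by w_hh·tanh'(·) at every step.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rnn_gradient_through_time(xs: List[float], w_hh: float = 0.9, w_xh: float = 1.0) -> float:
    """d h_T / d h_0 for a scalar vanilla RNN h_t = tanh(w_hh * h_{t-1} + w_xh * x_t)."""
    h, grad = 0.0, 1.0
    for x in xs:
        h = math.tanh(w_hh * h + w_xh * x)
        # Chain rule: each step multiplies by w_hh * tanh'(.) = w_hh * (1 - h_t^2)
        grad *= w_hh * (1.0 - h * h)
    return grad


def lstm_cell_gradient_through_time(xs: List[float], forget_bias: float = 4.0) -> float:
    """d c_T / d c_0 along the additive cell-state path of a scalar LSTM.

    Along that path c_t = f_t * c_{t-1} + i_t * g_t, so each step multiplies the
    gradient by the forget gate f_t only. Gate weights here are hypothetical.
    """
    grad = 1.0
    for x in xs:
        f_t = sigmoid(0.5 * x + forget_bias)  # forget gate, close to 1 with a large bias
        grad *= f_t
    return grad


xs = [1.0] * 50
print(rnn_gradient_through_time(xs))        # collapses toward 0
print(lstm_cell_gradient_through_time(xs))  # stays far larger
```

With a forget gate close to 1 (here produced by a large hypothetical bias), the LSTM product decays slowly, while the RNN product becomes vanishingly small after a few dozen steps.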

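Update rule for the LSTM: the vanilla RNN overwrites its hidden state at every step, h_t = tanh(W_hh h_{t-1} + W_xh x_t + b), while the LSTM's cell state c_t is updated with an additive element, c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t (f_t: forget gate, i_t: input gate, g_t: candidate values, ⊙: element-wise product). This additive path takes care of the vanishing gradients problem: the gradient flowing back through c_t is scaled by the forget gate instead of being repeatedly multiplied by a weight matrix and the derivative of a squashing nonlinearity.

  • Conditional language models and how to train them (teacher/student forcing), language metrics (how to calculate them), how knowledge distillation works

Conditional Language Models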

Per-word cross-entropy is the average of the cross-entropy over all words in the sequence. Perplexity: the geometric mean of the inverse probability of a sequence of words; equivalently, the exponential of the per-word cross-entropy (computed with the natural log). As an evaluation metric, the lower the perplexity, the better the model. The perplexity of a discrete uniform distribution over K events is K (a fair coin toss has a perplexity of 2, a fair die toss has a perplexity of 6).

Training: feed the words in one by one. After each step, project the hidden state into a high-dimensional space (one score per vocabulary word), turn the scores into a probability distribution with a softmax, and calculate the loss against the next word using cross-entropy. Once the whole sentence has been fed, compute the overall loss as the average of the per-word losses, and backpropagate.
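The metric and loss definitions above can be sketched in plain Python. The `LanguageModel` interface (a function from the words seen so far to a next-word distribution) is a hypothetical stand-in for a trained network; the ground-truth prefix is fed at every step, which is what teacher forcing does during training.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface (an assumption of this sketch): given the words
# seen so far, return a probability for every word in the vocabulary.
LanguageModel = Callable[[Sequence[str]], Dict[str, float]]


def per_word_cross_entropy(model: LanguageModel, sentence: List[str]) -> float:
    """Average per-word cross-entropy (natural log) of `sentence` under `model`."""
    losses = []
    for t, target in enumerate(sentence):
        history = sentence[:t]                   # ground-truth prefix, as with teacher forcing
        probs = model(history)                   # predicted distribution over the next word
        losses.append(-math.log(probs[target]))  # cross-entropy against the one-hot target
    return sum(losses) / len(losses)             # overall loss: mean of the per-word losses


def perplexity(model: LanguageModel, sentence: List[str]) -> float:
    """exp(per-word cross-entropy), i.e. the geometric mean of 1 / p(word | history)."""
    return math.exp(per_word_cross_entropy(model, sentence))


def uniform_die(history: Sequence[str]) -> Dict[str, float]:
    """A 'model' that ignores the history: a fair six-sided die."""
    return {face: 1 / 6 for face in "123456"}


print(perplexity(uniform_die, ["3", "6", "1", "1"]))  # approximately 6 (K = 6 outcomes)
```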

Teacher forcing: at each following time step, we input the actual word from the training data, not the model's previous prediction. This lets the model keep learning effectively even if it made a mistake at an earlier step.

Knowledge distillation: the teacher model works well but is too slow or expensive to run, so we train a cheaper student model to imitate it. The student and the teacher both make a prediction. We still use the target to compute the student loss, and we also encourage the student model to align its (soft) predictions with those of the teacher (distillation loss). The teacher's soft predictions carry information the one-hot target lacks: for example, they tell us that we also want to rank wolf and fox highly, not just the target word.
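A sketch of a distillation objective for a single next-word prediction. The notes do not say how the student loss and the distillation loss are combined, so the weighted sum with `alpha`, the softmax `temperature`, and all logits below are assumptions for illustration; some formulations also multiply the soft term by the temperature squared.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax with a temperature; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(target: List[float], predicted: List[float]) -> float:
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def distillation_objective(student_logits: List[float], teacher_logits: List[float],
                           target_index: int, alpha: float = 0.5,
                           temperature: float = 2.0) -> float:
    """alpha * student loss + (1 - alpha) * distillation loss (weighting is an assumption)."""
    # Student loss: ordinary cross-entropy against the ground-truth word.
    student_loss = -math.log(softmax(student_logits)[target_index])
    # Distillation loss: match the teacher's softened distribution.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    distill_loss = soft_cross_entropy(teacher_soft, student_soft)
    return alpha * student_loss + (1 - alpha) * distill_loss


# Toy vocabulary and hypothetical logits; the target word is "dog".
vocab = ["dog", "wolf", "fox", "car"]
teacher = [4.0, 3.0, 2.5, -2.0]       # teacher also ranks wolf and fox highly
student_a = [4.0, 3.0, 2.5, -2.0]     # agrees with the teacher
student_b = [4.0, -2.0, -2.0, -2.0]   # ignores wolf and fox
print(distillation_objective(student_a, teacher, target_index=0))
print(distillation_objective(student_b, teacher, target_index=0))
```

Pushing down the student's scores for wolf and fox can lower its ordinary student loss, but it raises the distillation loss, because the teacher's softened distribution puts real probability on those words.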

  • Word2vec: the probability equation; intrinsic vs. extrinsic evaluation
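The notes name the word2vec probability equation without reproducing it. Assuming the skip-gram form, the probability of a context word o given a center word c is p(o | c) = exp(u_o · v_c) / Σ_w exp(u_w · v_c), where v are center-word vectors and u are context-word vectors. A minimal sketch with hypothetical 2-D vectors follows; real implementations avoid the full softmax over the vocabulary with negative sampling or hierarchical softmax.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def skip_gram_probs(center: str, center_vecs: Dict[str, Vector],
                    context_vecs: Dict[str, Vector]) -> Dict[str, float]:
    """p(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c) for every word o in the vocabulary."""
    v_c = center_vecs[center]                              # center-word vector of c
    scores = {w: dot(u_w, v_c) for w, u_w in context_vecs.items()}
    m = max(scores.values())                               # subtract the max for stability
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


# Hypothetical 2-D vectors for a three-word vocabulary.
center_vecs = {"cat": [1.0, 0.5], "dog": [0.9, 0.6], "car": [-1.0, 0.2]}
context_vecs = {"cat": [0.8, 0.4], "dog": [1.0, 0.5], "car": [-0.7, 0.1]}
print(skip_gram_probs("cat", center_vecs, context_vecs))
```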

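  • What t-SNE is and how it conceptually works

t-SNE: a widely used visualization method that itself constructs an embedding in a low-dimensional space (typically 2-D). Conceptually, it converts pairwise distances into probabilities that two points are neighbors, using a Gaussian kernel in the original space and a heavy-tailed Student-t kernel in the low-dimensional space, then moves the low-dimensional points to minimize the KL divergence between the two distributions, so that points that are close in the original space stay close in the plot. Example: in a t-SNE plot of graph embeddings, the graph embeddings knew nothing about misinformation, yet the pages marked green (misinformation) ended up close together.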
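A from-scratch sketch of the idea behind t-SNE, with simplifications stated up front: a single fixed Gaussian bandwidth `sigma` (real t-SNE picks a per-point bandwidth from a user-chosen perplexity), plain gradient descent without momentum or early exaggeration, and hypothetical toy data. All function names are illustrative; for real use, a library implementation such as scikit-learn's `sklearn.manifold.TSNE` is the practical choice.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def high_dim_affinities(xs: List[List[float]], sigma: float = 1.0) -> Matrix:
    """Symmetric p_ij from a Gaussian kernel; one fixed sigma is a simplification."""
    n = len(xs)
    cond = []  # cond[i][j] = p_{j|i}, normalised over j for each i
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(xs[i], xs[j]) / (2 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        cond.append([v / total for v in w])
    # Symmetrise so that all p_ij sum to 1: p_ij = (p_{j|i} + p_{i|j}) / (2n)
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def low_dim_affinities(ys: List[List[float]]) -> Tuple[Matrix, Matrix]:
    """q_ij from a heavy-tailed Student-t kernel (1 + ||y_i - y_j||^2)^-1."""
    n = len(ys)
    kernel = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(ys[i], ys[j])) for j in range(n)]
              for i in range(n)]
    total = sum(map(sum, kernel))
    return [[k / total for k in row] for row in kernel], kernel


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """KL(P || Q), the cost that t-SNE minimises."""
    n = len(p)
    return sum(p[i][j] * math.log(p[i][j] / q[i][j])
               for i in range(n) for j in range(n) if i != j and p[i][j] > 0)


def tsne(xs: List[List[float]], dims: int = 2, steps: int = 500,
         learning_rate: float = 0.2, seed: int = 0) -> Tuple[Matrix, List[float]]:
    """Plain gradient descent on KL(P || Q); returns the embedding and the cost per step."""
    rng = random.Random(seed)
    p = high_dim_affinities(xs)
    ys = [[rng.gauss(0.0, 1e-4) for _ in range(dims)] for _ in xs]  # tiny random start
    history = []
    for _ in range(steps):
        q, kernel = low_dim_affinities(ys)
        history.append(kl_divergence(p, q))
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) * (1 + ||y_i - y_j||^2)^-1
        grads = []
        for i, y_i in enumerate(ys):
            g = [0.0] * dims
            for j, y_j in enumerate(ys):
                if i != j:
                    coeff = 4.0 * (p[i][j] - q[i][j]) * kernel[i][j]
                    g = [g_d + coeff * (a - b) for g_d, a, b in zip(g, y_i, y_j)]
            grads.append(g)
        ys = [[a - learning_rate * g_d for a, g_d in zip(y, g)] for y, g in zip(ys, grads)]
    q, _ = low_dim_affinities(ys)
    history.append(kl_divergence(p, q))
    return ys, history


# Hypothetical data: two well-separated clusters of three points in 3-D.
points = [
    [0.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.1, 0.3, 0.2],
    [5.0, 5.0, 5.0], [5.2, 4.9, 5.1], [4.8, 5.1, 5.0],
]
embedding, kl_history = tsne(points)
print(kl_history[0], kl_history[-1])
```

The recorded KL divergence falls as the two clusters separate in the 2-D embedding; the method is never told which cluster a point belongs to, yet nearby points end up close together.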