This is the first homework, Exams of Computer Science

Typology: Exams

2025/2026

Uploaded on 01/22/2026

reuben-addison
CS7643: Deep Learning
Assignment 1
Instructor: Zsolt Kira
Deadline: Feb 2nd 2026, 8:00am ET
• This assignment is due on the date/time posted on Canvas. We will
have a 48-hour grace period for this assignment. However, no questions
about the assignment are answered during the grace period in any way.
• Discussion is encouraged, but each student must write his/her own
answers and explicitly mention any collaborators.
• Each student is expected to respect and follow the GT Honor Code.
We will apply anti-cheating software to check for plagiarism.
Anyone who is flagged by the software will automatically receive 0 for
homework and be reported to OSI.
• Please do not change the filenames and function definitions in
the skeleton code provided, as this will cause the test scripts to fail and
you will not receive points in those failed tests. You may also NOT
change the import modules in each file or import additional modules.
• It is your responsibility to ensure that all code and other deliverables
are in the correct format and that your submission compiles and runs.
We will not manually check your code (this is not feasible given the
class size). Thus, non-runnable code in our test environment
will directly lead to a score of 0. Also, your entire programming
parts will NOT be graded and given a 0 score if your code prints out
anything that is not asked in each question.
Theory Problem Set
1. In problem set 0, we derived the gradient of the log-sum-exp function
(Q10). Now we will consider a similar function - the softmax function

s(z), which takes a vector input z and outputs a vector whose i-th entry $s_i$ is

$$s_i = \frac{e^{z_i}}{\sum_k e^{z_k}} \tag{1}$$

The input vector z to s(·) is sometimes called the “logits”, which just means the unscaled output of previous layers. Derive the gradient of s with respect to the logits, i.e. derive $\frac{\partial s}{\partial z}$. Consider re-using your work from PS0.
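One way to sanity-check a hand-derived Jacobian is to compare it against finite differences. The sketch below is not part of the assignment skeleton: the names `softmax` and `numerical_jacobian` are illustrative, NumPy is assumed to be available, and the step size `eps` is an arbitrary choice.

```python
import numpy as np


def softmax(z):
    """Softmax of a 1-D vector z, following Eq. (1)."""
    # Subtracting max(z) leaves the result unchanged but avoids overflow in exp.
    shifted = z - np.max(z)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)


def numerical_jacobian(f, z, eps=1e-6):
    """Central-difference estimate of J[i, j] = d f_i / d z_j."""
    z = np.asarray(z, dtype=float)
    n_out = f(z).shape[0]
    jac = np.zeros((n_out, z.shape[0]))
    for j in range(z.shape[0]):
        step = np.zeros_like(z)
        step[j] = eps
        jac[:, j] = (f(z + step) - f(z - step)) / (2.0 * eps)
    return jac


z = np.array([1.0, 2.0, 3.0])
s = softmax(z)
J = numerical_jacobian(softmax, z)
```

Because the entries of s always sum to one, each column of `J` should sum to approximately zero; beyond that, `J` can be compared entry by entry with an analytic result.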

  2. Implement AND and OR for pairs of binary inputs using a single linear threshold neuron with weights $w \in \mathbb{R}^2$, bias $b \in \mathbb{R}$, and $x \in \{0, 1\}^2$:

$$f(x) = \begin{cases} 1 & \text{if } w^T x + b \geq 0 \\ 0 & \text{if } w^T x + b < 0 \end{cases}$$

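That is, find $w_{AND}$ and $b_{AND}$ such that

| x1 | x2 | fAND(x) |
|----|----|---------|
| 0  | 0  | 0       |
| 0  | 1  | 0       |
| 1  | 0  | 0       |
| 1  | 1  | 1       |

Also find $w_{OR}$ and $b_{OR}$ such that

| x1 | x2 | fOR(x) |
|----|----|--------|
| 0  | 0  | 0      |
| 0  | 1  | 1      |
| 1  | 0  | 1      |
| 1  | 1  | 1      |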
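A candidate (w, b) can be checked mechanically against a truth table. The following is a minimal sketch with hypothetical helper names (`threshold_neuron`, `matches_truth_table`). It is demonstrated on NAND, which is not one of the requested gates, so the weights shown are purely illustrative.

```python
def threshold_neuron(w, b, x):
    """Return 1 if w^T x + b >= 0, else 0."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation >= 0 else 0


def matches_truth_table(w, b, truth_table):
    """Check a candidate (w, b) against a dict mapping inputs to targets."""
    return all(threshold_neuron(w, b, x) == target
               for x, target in truth_table.items())


# NAND is used only as an illustration; it is not asked for in the problem.
nand_table = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}
w_nand, b_nand = (-1.0, -1.0), 1.5

nand_ok = matches_truth_table(w_nand, b_nand, nand_table)
```

The same helper works for any two-input gate: replace the dictionary with the table of interest and pass in the weights and bias to be tested.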

  3. Consider the XOR function

| x1 | x2 | fXOR(x) |
|----|----|---------|
| 0  | 0  | 0       |
| 0  | 1  | 1       |
| 1  | 0  | 1       |
| 1  | 1  | 0       |

Questions for this paper:

  • The traditional view of optimization in deep learning (and often in general) is that we are searching the space of weights to find the best ones. In other words, learning is a search problem. How would you view the above paper from the perspective of search?
  • One of the key aspects of deep learning is that given a parameterized function, we can find weights to represent any function if it has sufficient depth and complexity. What does this paper say about the representational power of architectures given a fixed method for determining weights? Does the method for determining the weights matter? Do you think these two have equal representational power? Why or why not?

Paper Choice 2:

The second paper is again one that questions conventional wisdom and shows that large neural networks can actually fit random labels, that is, labels that remain fixed but, for example, have no relationship to what is actually in the image. The paper title is “Understanding deep learning requires rethinking generalization” and can be found here.

Questions for this paper:

  • If neural networks can “memorize” the data, which is the only thing they can do for random label assignments that don’t correlate with patterns in the data, why do you think neural networks learn more meaningful, generalizable representations when there are meaningful patterns in the data?
  • How does this finding align or not align with your understanding of machine learning and generalization?

Coding: Implement and train a network on MNIST

Overview

Deep Neural Networks are becoming more and more popular and widely applied to many ML-related domains. In this assignment, you will complete a simple pipeline of training neural networks to recognize MNIST Handwritten Digits: http://yann.lecun.com/exdb/mnist/. You will implement two neural network architectures along with the code to load data, train and optimize these networks. You will also run different experiments on your model to complete a short report. Be sure to use the report template that we give you and fill in your information on the first page. The file main.py contains the main logic of this assignment. You can execute it by invoking the following command, where the yaml file contains all the hyper-parameters.

$ python main.py --config configs/<name_of_config_file>.yaml

There are three pre-defined config files under ./configs. Two of them are default hyperparameters for models that you will implement in the assignment (Softmax Regression and 2-layer MLP). The correctness of your implementation is partially judged by the model performance on these default hyper-parameters; therefore, do NOT modify values in these config files. The third config file, config_exp.yaml, is used for your hyper-parameter tuning experiments (details in Section 5) and you are free to modify values of the hyper-parameters in this file.

The script trains a model for the number of epochs specified in the config file. At the end of each epoch, the script evaluates the model on the validation set. After training completes, the script evaluates the best model on the test data.
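The control flow described above can be sketched roughly as follows. This is not the actual main.py: the function `run_training` and its callable arguments are hypothetical, at least one epoch is assumed, and "best" is taken here to mean highest validation accuracy, which is an assumption.

```python
import copy


def run_training(model, epochs, train_one_epoch, evaluate,
                 train_data, val_data, test_data):
    """Train for `epochs` epochs, keep the best model by validation accuracy
    (an assumed criterion), then evaluate that model once on the test data."""
    best_val_acc = float("-inf")
    best_model = None
    for _ in range(epochs):
        train_one_epoch(model, train_data)
        val_acc = evaluate(model, val_data)
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            # Snapshot the model so later epochs cannot overwrite it.
            best_model = copy.deepcopy(model)
    test_acc = evaluate(best_model, test_data)
    return best_model, best_val_acc, test_acc


# Toy stand-ins so the sketch runs on its own: the "model" is a dict holding
# an epoch counter, and the scripted accuracy peaks at the second epoch.
scripted_acc = {1: 0.80, 2: 0.90, 3: 0.85}


def toy_train(model, data):
    model["epochs_seen"] += 1


def toy_eval(model, data):
    return scripted_acc[model["epochs_seen"]]


best, best_val, test_acc = run_training(
    {"epochs_seen": 0}, 3, toy_train, toy_eval, None, None, None)
```

Keeping a copy of the best model, rather than the final one, is what makes the test evaluation independent of any decline in validation accuracy in later epochs.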

Python and dependencies

In this assignment, we will work with Python 3. If you do not have a Python distribution installed yet, we recommend installing Anaconda: https://www.anaconda.com/ (or miniconda) with Python 3. We provide environment.yaml, which contains a list of libraries needed to set up the environment for this assignment. You can use it to create a copy of the conda environment. Refer to the user guide: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html for more details.

$ conda env create -f environment.yaml

If you already have your own Python development environment, please refer to this file to find the necessary libraries; it is used to set up the same coding/grading environment.

1.1 Data Preparation

To keep the choice of hyper-parameters from overfitting the training data, it is common practice to split the training dataset into actual training data and validation data, and to tune hyper-parameters based on results on the validation data. Additionally, in deep learning, training data is often fed to models in batches for faster training and noise reduction.

In our pipeline, we first load the entire MNIST data into the system, followed by a training/validation split on the training set. We simply use the first 80% of the training set as our training data and use the rest of the training set as our validation data. We also want to organize our data (training, validation, and test) in batches and use different combinations of batches in different epochs for training data. Therefore, your tasks are as follows:

(a) Follow the instructions in the code to complete load_mnist_trainval in ./utils.py for the training/validation split.

(b) Follow the instructions in the code to complete generate_batched_data in ./utils.py to organize data in batches.

You can test your data loading code by running:

$ python -m unittest tests.test_loading
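The split and batching logic described in this section can be illustrated with a small standalone sketch. The helper names `split_train_val` and `make_batches` and their signatures are hypothetical; they do not reproduce the skeleton's load_mnist_trainval or generate_batched_data, whose exact interfaces are defined in ./utils.py.

```python
import random


def split_train_val(data, labels, train_fraction=0.8):
    """Use the first `train_fraction` of the samples for training and the
    remainder for validation (no shuffling before the split)."""
    n_train = int(len(data) * train_fraction)
    return (data[:n_train], labels[:n_train],
            data[n_train:], labels[n_train:])


def make_batches(data, labels, batch_size, shuffle=False, seed=None):
    """Group samples into batches; the last batch may be smaller."""
    indices = list(range(len(data)))
    if shuffle:
        # Reshuffling each epoch yields different batch compositions.
        random.Random(seed).shuffle(indices)
    batches = []
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        batches.append(([data[i] for i in chunk], [labels[i] for i in chunk]))
    return batches


# Tiny synthetic dataset standing in for MNIST.
data = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]
train_x, train_y, val_x, val_y = split_train_val(data, labels)
train_batches = make_batches(train_x, train_y, batch_size=3,
                             shuffle=True, seed=0)
```

Shuffling indices rather than the data itself keeps each sample paired with its label, which is the property most worth checking in any batching code.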

2 Model Implementation

You will now implement two networks from scratch: a simple softmax regression and a two-layer multi-layer perceptron (MLP). Definitions of these classes can be found in ./models.

Weights of each model will be randomly initialized upon construction and stored in a weight dictionary. Meanwhile, a corresponding gradient dictionary is also created and initialized to zeros. Each model only has one public method called forward, which takes input of batched data and corresponding labels and returns the loss and accuracy of the batch. Meanwhile, it computes gradients of all weights of the model (even though the method is called forward!) based on the training batch.

2.1 Utility Function

There are a few useful methods defined in ./models/_base_network.py that can be shared by both models. Your first task is to implement them based on instructions in _base_network.py:

(a) Activation Functions. There are two activation functions needed for this assignment: ReLU and Sigmoid. Implement both functions as well as their derivatives in ./models/_base_network.py (i.e., sigmoid, sigmoid_dev, ReLU, and ReLU_dev). Test your methods with:

$ python -m unittest tests.test_activation
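For reference, the textbook definitions of these activations and their derivatives can be written with NumPy as below. This is a standalone sketch: in the skeleton these are methods of the base network class, so the exact signatures there may differ, and the value of the ReLU derivative at exactly zero is a convention chosen here.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_dev(x):
    """Derivative of the sigmoid, expressed through the sigmoid itself."""
    s = sigmoid(x)
    return s * (1.0 - s)


def ReLU(x):
    """Element-wise max(0, x)."""
    return np.maximum(0.0, x)


def ReLU_dev(x):
    """Derivative of ReLU: 1 where x > 0, else 0 (0 at x == 0 by convention)."""
    return (np.asarray(x) > 0).astype(float)


x = np.array([-2.0, 0.0, 3.0])
relu_out = ReLU(x)
relu_grad = ReLU_dev(x)
sig_out = sigmoid(x)
sig_grad = sigmoid_dev(x)
```

Both derivatives take the pre-activation input `x`, not the activation output; mixing the two up is a common source of gradient bugs.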

(b) Loss Functions. The loss function used in this assignment is Cross Entropy Loss. You will need to implement both the Softmax function and the computation of Cross Entropy Loss in ./models/_base_network.py. Test your methods with:

$ python -m unittest tests.test_loss
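A batched softmax followed by cross-entropy might look like the sketch below. The function names, the (batch, classes) layout, integer class labels, averaging over the batch, and the small epsilon inside the logarithm are all assumptions made for illustration; the skeleton's docstrings are the authority on the expected shapes and reduction.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise softmax for a (batch, classes) array of scores."""
    # Subtract the row maximum for numerical stability.
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)


def cross_entropy_loss(probs, labels):
    """Mean negative log-probability of the true class.

    probs:  (batch, classes) array of probabilities.
    labels: (batch,) array of integer class indices.
    """
    batch_size = probs.shape[0]
    true_class_probs = probs[np.arange(batch_size), labels]
    # A small epsilon guards against log(0); its value is an arbitrary choice.
    return -np.mean(np.log(true_class_probs + 1e-12))


scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
probs = softmax_rows(scores)
loss = cross_entropy_loss(probs, labels)
```

The second row has identical scores, so its probabilities are uniform and its contribution to the loss is log(3) before averaging, which makes a convenient hand check.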

(c) Accuracy. We are also interested in knowing how our model is doing on a given batch of data. Therefore, you may want to implement the compute_accuracy method in ./models/_base_network.py to compute the accuracy of a given batch.
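Batch accuracy is usually the fraction of samples whose highest-probability class matches the label. A minimal standalone sketch, assuming a (batch, classes) probability array and integer labels (the skeleton's method signature may differ):

```python
import numpy as np


def compute_accuracy(probs, labels):
    """Fraction of rows whose arg-max class equals the label."""
    predictions = np.argmax(probs, axis=1)
    return float(np.mean(predictions == labels))


probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7]])
labels = np.array([1, 0, 0])
accuracy = compute_accuracy(probs, labels)
```

In this toy batch the first two predictions are correct and the third is not.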

2.2 Model Implementation

You will implement the training processes of a simple Softmax Regression and a two-layer MLP in this section. The Softmax Regression is composed of a fully-connected layer followed by a ReLU activation. The two-layer MLP is composed of two fully-connected layers with a Sigmoid activation in between. Note that the Softmax Regression model has no bias terms, while the two-layer MLP model does use biases. Also, don’t forget the softmax function before computing your loss!

(a) Implement the forward method in softmax_regression.py as well as two_layer_nn.py. If the mode argument is train, compute gradients of weights and store the gradients in the gradient dictionary. Otherwise, simply return the loss and accuracy. Test:

$ python -m unittest tests.test_network
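To make the data flow concrete, here is one possible sketch of the bias-free pipeline described above (fully connected layer, ReLU, softmax, cross-entropy) together with its weight gradient. It is a standalone function rather than the skeleton's class: the name `softmax_regression_forward`, the returned gradient (the skeleton stores gradients in a dictionary instead), the averaging over the batch, and the epsilon in the logarithm are all assumptions.

```python
import numpy as np


def softmax_regression_forward(X, y, W, mode="train"):
    """Loss, accuracy and (in train mode) dL/dW for
    fully connected (no bias) -> ReLU -> softmax -> cross-entropy."""
    n = X.shape[0]
    z = X @ W                                   # (n, classes) pre-activations
    a = np.maximum(0.0, z)                      # ReLU
    shifted = a - a.max(axis=1, keepdims=True)  # stabilise exp
    p = np.exp(shifted)
    p /= p.sum(axis=1, keepdims=True)           # softmax probabilities
    loss = -np.mean(np.log(p[np.arange(n), y] + 1e-12))
    accuracy = float(np.mean(np.argmax(p, axis=1) == y))
    if mode != "train":
        return loss, accuracy, None

    # Backward pass via the chain rule.
    d_a = p.copy()
    d_a[np.arange(n), y] -= 1.0                 # p - one_hot(y)
    d_a /= n                                    # mean over the batch
    d_z = d_a * (z > 0)                         # ReLU passes gradient where z > 0
    grad_W = X.T @ d_z                          # dL/dW, same shape as W
    return loss, accuracy, grad_W


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
y = np.array([0, 2, 1, 2])
W = 0.1 * rng.normal(size=(5, 3))
loss, acc, grad_W = softmax_regression_forward(X, y, W, mode="train")
```

A finite-difference comparison of `grad_W` against perturbed losses is a quick way to catch mistakes in the backward pass before running the unit tests.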

3 Optimizer

We will use an optimizer to update weights of models. An optimizer is initialized with a specific learning rate and a regularization coefficient. Before updating model weights, the optimizer applies L2 regularization on the

Figure 1: Example plot of learning curves

and report your observations by answering questions in the report template. We provide a default config file config_exp.yaml in ./configs. When tuning a specific hyper-parameter (e.g., the learning rate), please leave all other hyper-parameters as-is in the default config file.

(a) You will try out different values of learning rates and report your observations in the report file.

(b) You will try out different values of regularization coefficients and report your observations in the report file.

(c) You will try your best to tune the hyper-parameters for best accuracy.

(d) When tuning for best accuracy, tuning just epochs is not interesting. Tune at least 3 hyper-parameters (not including epochs). You may increase or decrease epochs, but it does not count as one of the 3.

(e) When tuning for best accuracy, the best model should have a marked improvement compared to the default hyper-parameters.

(f) When reporting observations, be aware of applying good scientific methods.
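Varying one hyper-parameter while holding the rest at their defaults can be organised with a small helper like the one sketched here. The helper `one_at_a_time_configs` and the keys and values in `default_config` are hypothetical; the real keys and defaults live in config_exp.yaml.

```python
import copy


def one_at_a_time_configs(default_config, name, values):
    """Return one config per value, changing only `name` and keeping every
    other hyper-parameter at its default."""
    if name not in default_config:
        raise KeyError(f"unknown hyper-parameter: {name}")
    configs = []
    for value in values:
        config = copy.deepcopy(default_config)
        config[name] = value
        configs.append(config)
    return configs


# Hypothetical defaults, used only to make the sketch runnable.
default_config = {"learning_rate": 0.1, "reg": 0.001, "batch_size": 64}
lr_sweep = one_at_a_time_configs(default_config, "learning_rate",
                                 [1.0, 0.1, 0.01, 0.001])
```

Generating the configurations up front makes it easy to confirm that only the intended hyper-parameter differs between runs, which supports a controlled comparison in the report.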

6 Deliverables

6.1 Coding

To submit your code to Gradescope, you will need to submit a zip file containing all your code in the expected structure. For your convenience, we provide a handy script for you.

Simply run

$ bash collect_submission.sh

or if running Microsoft Windows 10

C:\assignmentfolder>collect_submission.bat

then upload assignment_1_submission.zip to Gradescope.

6.2 Writeup

You will also need to submit a report summarizing your experimental results and findings as specified in Section 5. Again, we provide a starting template for you and your task is just to answer each question in the template. For any question asking for plots, please include plots from all your experiments.

Note: Explanations should offer some intuition on why certain results might have been observed using your knowledge of Machine Learning. When tuning hyperparameters, explain the reasoning behind the choices. If you need more than one slide for a question, you are free to create new slides right after the given one.

You will need to export your report in pdf format and submit it to Gradescope. You should combine your answers to the theory questions, paper review, and report into one pdf and submit it to the “Assignment 1 Writeup” assignment in Gradescope. When submitting to Gradescope, make sure you select ALL corresponding slides for each question. Failing to do so will result in -1 point for each incorrectly tagged question, with future assignments having a more severe penalty.