Data and Machine Learning Practice Exam, Exams of Machine Learning

MATH5836 Data and Machine Learning Practice Exam from University of New South Wales 2025

Typology: Exams

2025/2026

Uploaded on 12/08/2025

aakash-12

Assessment 4: Practice Exam
Introduction
Sample Final Examination
There are three parts to this examination:
Part A (Quiz): 10 marks
Part B (Short answers): 14 marks
Part C (Programming): 26 marks (Note that you have the option to do either Q2 or Q3)
Instructions
All answers must be submitted online, following the instructions given in each question.
Answer all the questions in Parts A and B; in Part C, answer Q1 and either Q2 or Q3.
Questions may be answered in any order.
Ensure you submit all answers.
Parts B and C need to be submitted as a single PDF document in Moodle (Special Exam). For Part C, do not include any code in this document; code is submitted in Ed.
Do not use email or any other software to communicate during the exam.
Do not use ChatGPT or other AI tools during the exam.
Please email me directly if you have any issues (rohitash.chandra@unsw.edu.au).
You are free to use any software installed on the lab computers, including Jupyter notebooks. Note that the entire course is run on Edstem and some libraries may not work on the desktop computers, so you should use Edstem rather than the desktop for coding.
Ensure that you create a PDF using OpenOffice and upload it to Moodle. You can simply save as a .doc file and print to PDF.
The exam will be held for 2 hours in the lab, with restricted internet access to the following sites and resources: https://edstem.org/au/courses/19116/lessons/60882/slides/413118
Note that the above weights for sections can change in your final exam. For example, Part A could be worth 15 marks with 17 multiple-choice questions (best 15 counted). If this happens, the allocation and number of questions for Part B may be reduced.

You can reuse code from the lessons in the course and from exercise solutions.

Part A: Online quiz (10 marks)

You need to answer 10 multiple-choice questions in this section. Each question is worth 1 mark. All the questions are compulsory.

Question 1
Which activation function in the output layer of a neural network would be most suited for a multiclass classification problem?
A. Softmax
B. ReLU
C. Hyperbolic tangent
D. Linear
E. None of the above

Question 2
Which of the following statements is correct?
A. Adam is generally faster than SGD.
B. Achieving excellent performance on the training dataset implies that you have an excellent model.
C. It is best to randomly assign the number of hidden neurons irrespective of the dataset.
D. Keras employs scikit-learn in its core framework.
E. None of the above
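As background for Question 1, a minimal plain-Python sketch of the softmax function: it maps a vector of raw output-layer scores (logits) to probabilities that are all positive and sum to 1, one per class, which is why it fits multiclass outputs. The input scores are made-up values for illustration.

```python
import math

def softmax(logits):
    """Map raw output-layer scores to class probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for 3 classes
print(probs)  # the largest score receives the largest probability
```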

Question 3
What would be a major difference between the role of a data scientist and a data engineer?
A. They do not have any differences in roles at major companies.
B. Data scientists typically use machine learning models to develop solutions and compile reports, while data engineers work with databases/datasets to organise, process and visualise data.
C. Data engineers are database managers and data scientists are programmers.
D. They both do similar work, but data scientists mostly present while data engineers develop models.
E. None of the above

Question 4
What would be the best model for a highly non-linear and chaotic time series prediction problem?
A. Linear regression model
B. Logistic regression model
C. Neural network model with sigmoid activation function in the output layer
D. Neural network model with linear activation function in the output layer
E. Either linear or sigmoid activation can be used in the output layer of a neural network for this problem.

Question 5
Given the ROC curve and AUC (0.7) in the figure below, which of the following statements is true?
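For Question 5, it can help to recall what an AUC value means: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count as one half), so 0.5 corresponds to random ranking and 1.0 to perfect ranking. The sketch below computes AUC directly from that pairwise definition; `roc_auc` is a hypothetical helper, and scikit-learn's `roc_auc_score` computes the same quantity for binary labels.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75: three of the four positive/negative pairs are ranked correctly
```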

Question 7
Which one of the following statements is true?
A. In bagging, models are trained sequentially, and the aim is to reduce errors at every subsequent step.
B. In boosting, models are trained in parallel independently of each other and the outcomes are combined.
C. In stacking, models are trained in parallel independently of each other and the outcomes are combined.
D. In bagging, models are trained independently of each other and the outcomes are combined.
E. None of the above

Question 8
Suppose you want to cluster the following data set into two clusters. Which one of the following algorithms is the most suitable for your task?
A. K-Means Algorithm
B. DBSCAN Algorithm
C. Agglomerative Clustering Algorithm
D. Random Forest Algorithm
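To make the bagging idea in Question 7 concrete, a short sketch: each decision tree is trained on its own bootstrap sample (rows drawn with replacement), independently of the other trees, and the predictions are combined by majority vote. It assumes scikit-learn and NumPy; `bagging_fit` and `bagging_predict` are hypothetical helpers, and the synthetic `make_moons` data is only for illustration. scikit-learn's `BaggingClassifier` packages the same idea.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=25, seed=0):
    """Train each tree on its own bootstrap sample, independently of the others."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Combine the independently trained models by majority vote (binary 0/1 labels)."""
    votes = np.stack([m.predict(X) for m in models])  # shape (n_estimators, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)  # odd n_estimators, so no ties

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
models = bagging_fit(X, y)
preds = bagging_predict(models, X)
print("training accuracy of the bagged ensemble:", (preds == y).mean())
```

Boosting differs precisely in the loop: each new model would depend on the errors of the previous ones, so the models could not be trained in parallel.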

Question 9
Which one of the following sentences is correct?
A. Model-based collaborative filtering uses descriptions of items for recommendations, and is similar to Amazon-style recommender systems.
B. Collaborative filtering works well even with very limited past recommendations.
C. Memory-based collaborative filtering uses descriptions of items for recommendations, and is similar to Amazon-style recommender systems.
D. Model-based collaborative filtering uses well-understood techniques from information retrieval.
E. None of the above

Question 10
Which one of the following statements is not true about Principal Component Analysis (PCA)?
A. PCA is an unsupervised method.
B. PCA searches for the directions along which the data have the smallest variance.
C. The maximum number of principal components is less than or equal to the number of features.
D. All principal components are orthogonal to each other.
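The PCA properties in Question 10 can be checked numerically. The NumPy sketch below computes principal components as eigenvectors of the covariance matrix of centred data (no labels are used), orders them by decreasing variance, and confirms that they are orthonormal and no more numerous than the features. The `pca` helper and the synthetic data are assumptions for illustration.

```python
import numpy as np

def pca(X):
    """Return principal directions (as rows) and their variances, largest variance first."""
    Xc = X - X.mean(axis=0)  # centre the data; no labels are involved (unsupervised)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]  # re-sort so the largest-variance direction comes first
    return eigvecs[:, order].T, eigvals[order]

rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing  # synthetic 3-feature data with unequal spread
components, variances = pca(X)
print(variances)  # non-increasing
print(np.round(components @ components.T, 6))  # identity matrix: components are orthogonal
```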

Pandas_Cheat_Sheet.pdf, Scikit_Learn_Cheat_Sheet_Python.pdf, numpy-user.pdf, matplotlib.pdf, Machine Learning Modelling in R.pdf, data-transformation.pdf

  1. Calculator: https://www.desmos.com/scientific
  2. Machine Learning in Python (scikit-learn): https://scikit-learn.org/stable/
  3. R interface to Keras: https://keras.rstudio.com/
  4. Introduction to Keras: https://keras.io/
  5. The caret Package: https://topepo.github.io/caret/
  6. Python documentation: https://docs.python.org/3/
  7. Rdrr.io: https://rdrr.io/r/
  8. Edstem: https://edstem.org/
  9. pandas: https://pandas.pydata.org/
  10. Matplotlib: https://matplotlib.org/
  11. Moodle: https://moodle.telt.unsw.edu.au/

You need to upload a PDF document of your response in Moodle (Final Exam), depending on your session. Note that only one document needs to be uploaded, and it will include Parts B and C. Moodle submission link: Upload to Moodle (Section B and C Answers): https://moodle.telt.unsw.edu.au/mod/turnitintooltwo/view.php?id=

Part A : Solutions

1: A

2: A

3: B

4: E

5: A

6: A

7: D

8: B

9: E

10: B

Part B: Q1 (2 marks)

If a Decision Tree is overfitting the training set, is it a good idea to try decreasing max_depth? Briefly explain your answer. Type your response in the Challenge workspace (in the file answer.txt) and then click on the Submit button at the bottom right of the screen.
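A quick experiment can support a written answer here. The sketch below fits decision trees with decreasing `max_depth` on a noisy synthetic dataset (`make_moons`, an assumed example; any dataset would do) and compares training and test accuracy. An unrestricted tree keeps splitting until its leaves are pure, so it fits the training set perfectly; limiting the depth regularises the tree, which lowers training accuracy and often narrows the train/test gap. The exact numbers depend on the data and random seeds.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.35, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in [None, 10, 5, 3]:  # None = grow until all leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={results[depth][0]:.3f}, test={results[depth][1]:.3f}")
```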

Part B: Q2 (2 marks)

Briefly explain the most important difference between the AdaBoost and Gradient Boosting methods. Type your response in the Challenge workspace (in the file answer.txt) and then click on the Submit button at the bottom right of the screen.
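One common way to state the difference: AdaBoost re-weights the training samples after each round so that the next weak learner concentrates on the misclassified ones, whereas gradient boosting fits each new learner to the negative gradient of the loss, which for squared error is simply the residual. The plain-Python sketch below shows one update step of each; it uses the classic binary AdaBoost update with labels in {-1, +1}, and both helper names are hypothetical. It illustrates the mechanism, not how any particular library implements it.

```python
import math

def adaboost_reweight(weights, y_true, y_pred):
    """One binary AdaBoost round (labels in {-1, +1}); assumes 0 < weighted error < 1."""
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    alpha = 0.5 * math.log((1 - err) / err)  # this learner's weight in the final vote
    # misclassified samples (t * p = -1) are scaled up, correct ones scaled down
    new_w = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(new_w)
    return [w / norm for w in new_w], alpha

def gradient_boosting_targets(y_true, current_pred):
    """For squared-error loss the negative gradient is the residual y - F(x);
    the next learner in gradient boosting is fit to these values."""
    return [t - p for t, p in zip(y_true, current_pred)]

w, alpha = adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
print(w)  # the single misclassified sample now carries half of the total weight
residuals = gradient_boosting_targets([3.0, 1.0], [2.5, 1.5])
print(residuals)  # [0.5, -0.5]
```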

Part B: Q4 (2 marks)

In a multi-layer perceptron, does increasing the number of hidden layers improve performance? Explain your answer with reference to any dataset example from the lessons or assignments. Type your response in the Challenge workspace (in the file answer.txt) and then click on the Submit button at the bottom right of the screen.
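One way to back such an answer with evidence is to compare hidden-layer configurations by cross-validated accuracy rather than by training accuracy. The sketch below uses scikit-learn's `MLPClassifier` on the Iris dataset bundled with scikit-learn (an assumed example, not necessarily one from the lessons); the layer sizes are arbitrary choices. Depending on the data, extra layers may help, make no difference, or hurt through overfitting or harder optimisation, so the held-out scores, not the layer count, should decide.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

scores = {}
for layers in [(10,), (10, 10), (10, 10, 10)]:  # 1, 2 and 3 hidden layers
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=layers, max_iter=2000, random_state=0),
    )
    # mean 5-fold cross-validated accuracy: held-out performance, not training performance
    scores[layers] = cross_val_score(model, X, y, cv=5).mean()
    print(layers, round(scores[layers], 3))
```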

Part B: Q5 (2 marks)

Explain what is happening in the code below.

```python
def BackwardPass(self, input_vec, desired):
    # error terms for the output and hidden layers (sigmoid derivative: s * (1 - s))
    out_delta = (desired - self.out) * (self.out * (1 - self.out))
    hid_delta = out_delta.dot(self.W2.T) * (self.hidout * (1 - self.hidout))

    if self.vanilla == True:
        # plain weight and bias updates (no momentum term)
        self.W2 += self.hidout.T.dot(out_delta) * self.learn_rate
        self.B2 += (-1 * self.learn_rate * out_delta)
        self.W1 += (input_vec.T.dot(hid_delta) * self.learn_rate)
        self.B1 += (-1 * self.learn_rate * hid_delta)
    else:
        # momentum-style update; v1, v2, b1, b2 are copies of the current
        # weights and biases, not of the previous updates
        v2 = self.W2.copy()
        v1 = self.W1.copy()
        b2 = self.B2.copy()
        b1 = self.B1.copy()
        self.W2 += (v2 * self.momenRate) + (self.hidout.T.dot(out_delta) * self.learn_rate)
        self.W1 += (v1 * self.momenRate) + (input_vec.T.dot(hid_delta) * self.learn_rate)
        self.B2 += (b2 * self.momenRate) + (-1 * self.learn_rate * out_delta)
        self.B1 += (b1 * self.momenRate) + (-1 * self.learn_rate * hid_delta)
```

Type your response in the Challenge workspace (in the file answer.txt) and then click on the Submit button at the bottom right of the screen.
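To see the same delta formulas working end to end, here is a self-contained NumPy sketch of a one-hidden-layer sigmoid network trained with vanilla (no momentum) backpropagation on random data; the function names, network sizes and data are hypothetical. One deliberate difference from the exam code: here the bias is added in the forward pass, so the bias update uses `+learn_rate * delta`. The exam code's minus sign would be consistent with a forward pass that subtracts the bias, which is an assumption since that forward pass is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, B1, W2, B2):
    hidout = sigmoid(X @ W1 + B1)  # hidden-layer activations
    out = sigmoid(hidout @ W2 + B2)  # output-layer activations
    return hidout, out

def train_step(X, Y, W1, B1, W2, B2, learn_rate=0.1):
    """One vanilla backpropagation step for squared-error loss on a batch of rows."""
    hidout, out = forward(X, W1, B1, W2, B2)
    # same delta formulas as BackwardPass; hid_delta uses W2 *before* it is updated
    out_delta = (Y - out) * (out * (1 - out))
    hid_delta = out_delta.dot(W2.T) * (hidout * (1 - hidout))
    W2 = W2 + hidout.T.dot(out_delta) * learn_rate
    B2 = B2 + learn_rate * out_delta.sum(axis=0)  # bias is added in forward(), hence "+"
    W1 = W1 + X.T.dot(hid_delta) * learn_rate
    B1 = B1 + learn_rate * hid_delta.sum(axis=0)
    return W1, B1, W2, B2

def loss(X, Y, W1, B1, W2, B2):
    _, out = forward(X, W1, B1, W2, B2)
    return 0.5 * np.sum((Y - out) ** 2)

rng = np.random.default_rng(0)
X = rng.random((8, 3))
Y = rng.integers(0, 2, size=(8, 1)).astype(float)
W1, B1 = rng.normal(scale=0.5, size=(3, 4)), np.zeros(4)
W2, B2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)

before = loss(X, Y, W1, B1, W2, B2)
for _ in range(500):
    W1, B1, W2, B2 = train_step(X, Y, W1, B1, W2, B2)
after = loss(X, Y, W1, B1, W2, B2)
print(before, after)  # the squared error should decrease
```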

Part C: Programming questions (26 marks)

For Part C, you need to answer Part C: Q1 and one of Part C: Q2 OR Part C: Q3.

Part C: Q1 (6 marks)

For the following tasks, you need to write Python (or R) code, along with the required comments, in the file answer.py (or answer.r) and submit your solution. Load the dataset available in dataset_clustering.csv. Your task is to cluster the dataset using K-Means. You need to use silhouette scores to select a suitable number of clusters, store that value in the variable named best_k, and store the corresponding model in the variable named best_model. In your comments, provide brief justifications, with clearly articulated reasons, for the alternatives you explored to build the model you submitted.
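A minimal Python sketch of one possible approach, assuming dataset_clustering.csv contains only numeric feature columns (its contents are not given here) and using scikit-learn's `KMeans` and `silhouette_score`. The helper name `select_k_by_silhouette`, the candidate range 2 to 10 and the synthetic demo are illustrative assumptions; silhouette scores need at least two clusters. Alternatives such as feature scaling (e.g. `StandardScaler`) or a different candidate range are the kind of choices the required comments would justify.

```python
import os

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def select_k_by_silhouette(X, k_values=range(2, 11), random_state=0):
    """Fit K-Means for each k and keep the model with the highest silhouette score."""
    best_k, best_model, best_score = None, None, -1.0
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)  # in [-1, 1]; higher = better separated
        if scores[k] > best_score:
            best_k, best_model, best_score = k, model, scores[k]
    return best_k, best_model, scores

# Hypothetical demo: three well-separated synthetic blobs, so k = 3 is expected to win
X_demo, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10]],
                       cluster_std=0.5, random_state=0)
demo_k, demo_model, demo_scores = select_k_by_silhouette(X_demo, k_values=range(2, 7))
print(demo_k, {k: round(s, 3) for k, s in demo_scores.items()})

# On the exam data (assumption: every column is a numeric feature)
if os.path.exists("dataset_clustering.csv"):
    X = pd.read_csv("dataset_clustering.csv").to_numpy()
    best_k, best_model, scores = select_k_by_silhouette(X)
    print("best_k =", best_k)
```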

How to submit

Type your solution (Python code and comments) in the Challenge workspace (in the file answer.py or answer.r) and then click on the Submit button at the bottom right of the screen.