Machine Learning & Optimization (Data Analytics), Exercises of Machine Learning

ACTUAL COMPLETE REAL EXAM QUESTIONS AND CORRECT DETAILED ANSWERS (VERIFIED ANSWERS ) ALREADY GRADED A+//BRAND NEW!!!

Typology: Exercises

2025/2026

Available from 04/23/2026

brian-mugo-2
ISYE 6501 MIDTERM 2 EXAM 2025/2026 ACTUAL COMPLETE REAL
EXAM QUESTIONS AND CORRECT DETAILED ANSWERS (VERIFIED
ANSWERS ) ALREADY GRADED A+//BRAND NEW!!!
when might overfitting occur - ANSWER-when the # of factors is close to or larger than the # of data points, causing the model to potentially fit too closely to random effects

Why are simple models better than complex ones - ANSWER-less data is required; less chance of including insignificant factors; easier to interpret

What is the objective function in linear regression? - ANSWER-minimizing the sum of squared errors

What are the statistical variables and constants in linear regression? - ANSWER-from the statistics point of view, the data are the variables and the coefficients are the constants

what are the variables and constants in optimization model for linear regression? *** - ANSWER-the data are the constants and the coefficients are the variables
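
The optimization view of linear regression above can be sketched in a few lines. This is a minimal illustration for a single predictor, assuming the usual closed-form least-squares solution; the names `fit_line` and `squared_error` and the sample data are hypothetical, not from the course material.

```python
# Least squares for one predictor, viewed as an optimization problem:
# the data (xs, ys) are constants, the coefficients (a0, a1) are the variables,
# and the objective is the sum of squared errors.

def squared_error(a0, a1, xs, ys):
    """Objective function: sum of squared prediction errors."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))

def fit_line(xs, ys):
    """Return the (a0, a1) that minimize squared_error for the given data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1

xs = [0.0, 1.0, 2.0, 3.0]  # hypothetical data
ys = [1.1, 2.9, 5.2, 6.8]
a0, a1 = fit_line(xs, ys)
print(a0, a1, squared_error(a0, a1, xs, ys))
```

Moving either coefficient away from the fitted value can only increase the objective, which is what "minimizing the squared error" means here.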

What is the objective function in logistic regression - ANSWER- to minimize the prediction error in logistic regression

What are the variables in logistic regression - ANSWER-the coefficients in logistic regression

in soft and hard classification SVMs what are the variables - ANSWER-the coefficients in SVMs

what is the constraint in hard classification for SVMS - ANSWER-each observation has to be on the right side of the line in SVMs

What is the objective function for hard classification? - ANSWER-to maximize the margin in SVMs

For soft classification, what is the objective function? - ANSWER-to minimize classification error and maximize the margin in SVMS

what is the objective function for a time series model? - ANSWER-to minimize prediction error for a time series model

what is a linear program? - ANSWER-f(x) is a linear function; constraint set X is defined by linear equations and inequalities

what is convex quadratic program - ANSWER-f(x) is a convex quadratic function. Minimize f(x) or Maximize -f(x). constraint set X is defined by linear equations and inequalities

what is forward selection - ANSWER-we start with no factors. At each step we find the best new factor and, if it is good enough (by R^2, AIC, or p-value), we add it to our model and fit the model with the current set of factors. We repeat until no remaining factor is good enough or we have as many factors as we want. Then at the end we remove factors that fail a stricter threshold

what is backward elimination - ANSWER-we start with all factors and find the worst one. If it fails a supplied threshold (e.g. p-value above 0.15), we remove it and start the process over. We do that until every remaining factor passes or we have the number of factors that we want. Then we remove the factors that fail a second, stricter threshold (e.g. p-value above 0.05) and fit the model with the remaining set of factors

what is stepwise regression - ANSWER-it is a combination of forward selection and backward elimination. We can start with either all factors or no factors, and at each step we add or remove a factor. As we go through the procedure, after adding each new factor and again at the end, we immediately eliminate any factors that no longer appear good enough.

what type of algorithms are stepwise selection? - ANSWER- Greedy algorithms - at each step they take one thing that looks best
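
The greedy nature of these methods can be sketched as follows. This is only a skeleton of forward selection: `score` is a hypothetical caller-supplied function (for example an adjusted R^2 from some fitting routine) and `min_gain` is a hypothetical stopping threshold; neither comes from the course material.

```python
def forward_selection(candidates, score, min_gain):
    """Greedy forward selection sketch.

    candidates: list of factor names
    score(subset): hypothetical quality measure, higher is better
    min_gain: smallest improvement that counts as "good enough"
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        # Greedy step: evaluate each remaining factor and take the single
        # one that looks best right now, without looking further ahead.
        gain, factor = max((score(selected + [f]) - best, f) for f in remaining)
        if gain < min_gain:
            break  # no remaining factor is good enough
        selected.append(factor)
        remaining.remove(factor)
        best = score(selected)
    return selected

# Toy score: each factor contributes a fixed, made-up amount of "fit".
usefulness = {"x1": 0.5, "x2": 0.3, "x3": 0.01}
toy_score = lambda subset: sum(usefulness[f] for f in subset)
print(forward_selection(["x1", "x2", "x3"], toy_score, min_gain=0.05))
```

Because each step only takes the factor that looks best at that moment, the result is not guaranteed to be the best subset overall.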

what is LASSO - ANSWER-a variable selection method where the coefficients are chosen to minimize the squared error, subject to the constraint that the sum of their absolute values does not exceed a threshold t

How do you choose t in LASSO - ANSWER-use the lasso approach with different values of t and see which gives the best trade off

why do we have to scale the data for LASSO - ANSWER-if we don't, the units of measurement of each factor will artificially affect how big its coefficient needs to be

What is elastic net? - ANSWER-A variable selection method that works by minimizing the squared error and constraining the combination of absolute values of coefficients and their squares

Advantages: variable selection from LASSO and Predictive benefits of Ridge.

Disadvantages: Arbitrarily rules out some correlated variables (e.g. LASSO doesn't know which one should be left out); Underestimates coefficients of very predictive variables (i.e. Ridge Regression)
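
Written as optimization problems, LASSO and elastic net might look like the following. The notation is an assumption made for illustration: n data points, m factors, response y_i, factor values x_ij, coefficients a_0, ..., a_m, threshold t, and a mixing weight λ between 0 and 1.

```latex
% LASSO: minimize squared error with a budget t on the absolute coefficients
\min_{a_0,\dots,a_m} \; \sum_{i=1}^{n} \Bigl( y_i - \bigl( a_0 + \sum_{j=1}^{m} a_j x_{ij} \bigr) \Bigr)^2
\quad \text{subject to} \quad \sum_{j=1}^{m} \lvert a_j \rvert \le t

% Elastic net: same objective, budget on a mix of absolute values and squares
\min_{a_0,\dots,a_m} \; \sum_{i=1}^{n} \Bigl( y_i - \bigl( a_0 + \sum_{j=1}^{m} a_j x_{ij} \bigr) \Bigr)^2
\quad \text{subject to} \quad \lambda \sum_{j=1}^{m} \lvert a_j \rvert + (1 - \lambda) \sum_{j=1}^{m} a_j^2 \le t
```

Setting λ = 1 recovers the LASSO constraint, and λ = 0 leaves only the squared term, which is the ridge regression constraint.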

What are some downsides of surveys? - ANSWER-Even if you have what appears to be a representative sample in simple ways, maybe it isn't in more complex ways.

If we're testing to see whether red cars sell for higher prices than blue cars, we need to account for the type and age of the cars in our data set. This is called: - ANSWER-Controlling

what is a blocking factor *** - ANSWER-a source of variability that is not of primary interest to the experimenter

what is an example of a blocking factor - ANSWER-The type of car (sports car or family car) is a blocking factor: it could account for some of the difference between red cars and blue cars, because sports cars are more likely to be red. If we account for the difference, we can reduce the variability in our estimates

Under what conditions should you run A/B tests - ANSWER- When you can collect data quickly. When the data is representative and the amount of data is small compared to the whole population

Do you have to decide the sample size ahead of time for A/B tests - ANSWER-no, and we can run the hypothesis test anytime we want

What is full factorial design - ANSWER-you test every combination and then use ANOVA to determine importance of each factor

What is fractional factorial design - ANSWER-when you test a subset of the entire set of combinations

What is a balanced design? - ANSWER-You test each choice the same # of times and each pair of choices the same # of times
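
The three design ideas above can be sketched with the standard library. The factors and levels are hypothetical, and the half-fraction rule used here (keep combinations whose level indices sum to an even number) is just one simple way to pick a balanced subset for two-level factors.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical factors, each with two choices.
factors = {
    "color": ["red", "blue"],
    "font": ["serif", "sans"],
    "size": ["small", "large"],
}

def full_factorial(factors):
    """Every combination of choices."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

def half_fraction(factors):
    """A subset of the combinations: level indices that sum to an even number."""
    names = list(factors)
    index_ranges = [range(len(levels)) for levels in factors.values()]
    return [
        {name: factors[name][i] for name, i in zip(names, idx)}
        for idx in product(*index_ranges)
        if sum(idx) % 2 == 0
    ]

def is_balanced(runs, factors):
    """Each choice, and each pair of choices, is tested the same number of times."""
    for name, levels in factors.items():
        counts = Counter(run[name] for run in runs)
        if len({counts[level] for level in levels}) != 1:
            return False
    for f1, f2 in combinations(factors, 2):
        counts = Counter((run[f1], run[f2]) for run in runs)
        if len({counts[(a, b)] for a in factors[f1] for b in factors[f2]}) != 1:
            return False
    return True

full = full_factorial(factors)
half = half_fraction(factors)
print(len(full), len(half), is_balanced(full, factors), is_balanced(half, factors))
```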

When is regression effective in variable selection? - ANSWER-If there aren't significant interactions between the factors.

values. For updating we can use bayesian updates or estimate from the observed distribution

What are common reasons that data sets are missing values? - ANSWER-

  • a person accidentally types in the wrong value
  • a person did not want to reveal the true value
  • an automated system did not work correctly to record the value

What are some examples of why there might be bias in missing data - ANSWER-

  • Income: people with higher incomes are more likely to omit this answer
  • Radar gun: a car that passes the radar gun very slowly might be treated as an anomaly and its speed might not be recorded in the system
  • Heart transplants: If there's a variable "date of death" it will be missing for patients still living and thus the missing data will naturally include more successful transplant cases

What are three ways of dealing with missing data - ANSWER-discard the data; use categorical variables to indicate missing data; estimate (impute) the missing values. Only the first two avoid imputation

What are the pros and cons of throwing away missing data - ANSWER-Pros: not potentially introducing errors; easy to implement

Cons: don't want to lose too many data points; potential for censored or biased missing data

What is the categorical variable approach - ANSWER-If the data is categorical, we just add another category, "missing". With quantitative variables, we add a categorical variable that indicates "missing" and include interaction variables between that categorical variable and the other variables.

Why wouldn't you want to fill in missing quantitative variables with 0 - ANSWER-It can lead to problems if some types of data points are more likely than others to have missing data. The coefficients of the other variables might be pulled in one direction or another to try to account for the missing data

What are the advantages and disadvantages of imputing missing data with the mean, median (numeric) or mode (categorical) - ANSWER-Advantage: hedge against being too wrong and easy to compute
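
A minimal sketch of this kind of imputation, using Python's `statistics` module. The helper name `impute`, the use of `None` to mark missing entries, and the sample data are assumptions for illustration.

```python
from statistics import mean, median, mode

def impute(values, how):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[how](observed)
    return [fill if v is None else v for v in values]

incomes = [40, 50, None, 60, 250]        # hypothetical numeric data
colors = ["red", None, "blue", "red"]    # hypothetical categorical data

print(impute(incomes, "mean"))
print(impute(incomes, "median"))
print(impute(colors, "mode"))
```

With one very large income in the data, the mean and the median give noticeably different fill values, which is worth keeping in mind when choosing between them.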

what is the binomial distribution - ANSWER-the probability of getting x successes out of n independent identically distributed Bernoulli (p) trials; e.g. count of successful coin flips in n trials

What happens when n is big for binomial distribution - ANSWER-it converges to normal distribution
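
The convergence to the normal distribution can be checked numerically. This sketch compares the binomial probability at the mean with the normal density that has the same mean n*p and standard deviation sqrt(n*p*(1-p)); the values n = 100 and p = 0.5 are arbitrary choices for illustration.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(x, n, p):
    """P(exactly x successes in n i.i.d. Bernoulli(p) trials)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std dev sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
exact = binomial_pmf(50, n, p)
approx = normal_pdf(50, mu, sigma)
print(exact, approx)
```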

what is a Bernoulli distribution - ANSWER-it's like flipping a coin. It can be used to model a single yes/no event and is most useful when we put many of them together

what are some examples of a geometric distribution *** - ANSWER-How many interviews until first job offer; how many hits until a baseball bat breaks

what is a geometric distribution? - ANSWER-How many Bernoulli trials until ...; It is the probability of having x Bernoulli(p) failures until the first success, or of having x Bernoulli(p) successes until the first failure

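In a geometric distribution what is the value that is set to a power - ANSWER-The probability of the outcome you are counting ("how many X until something"). E.g. for x failures until the first success, P(X = x) = (1-p)^x p, so the failure probability (1-p) is raised to the power x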

What assumptions does a geometric distribution make? - ANSWER-Each Bernoulli trial is independent and identically distributed
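
A short sketch of the geometric probability, assuming the "number of failures before the first success" convention and i.i.d. Bernoulli(p) trials; the function name is hypothetical.

```python
def geometric_pmf(x, p):
    """P(exactly x failures before the first success).

    The failure probability (1 - p) is raised to the power x because the
    x failures must all happen, independently, before the one success.
    """
    return (1 - p) ** x * p

# Hypothetical example: each interview leads to an offer with probability 0.25.
for failures in range(4):
    print(failures, geometric_pmf(failures, 0.25))
```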

what is the Poisson distribution good at modeling - ANSWER- random arrivals

what does the Poisson distribution assume - ANSWER- independent and identically distributed

If arrivals are Poisson, what type of distribution is the interarrival time - ANSWER-exponential

If the inter-arrival time is exponential, what type of distribution is the number of arrivals - ANSWER-Poisson; it is the underlying distribution whenever the 'inter-arrival' or 'until-when' distribution is exponential
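
The link between the two distributions can be illustrated by simulation. This sketch draws exponential inter-arrival times and counts arrivals per unit of time; for a Poisson count, the mean and the variance should both be close to the arrival rate. The rate, horizon, and seed are arbitrary illustration values.

```python
import random

def arrivals_per_unit_time(rate, horizon, seed=0):
    """Count arrivals in each unit interval when inter-arrival times are exponential."""
    rng = random.Random(seed)
    counts = [0] * horizon
    t = rng.expovariate(rate)          # time of the first arrival
    while t < horizon:
        counts[int(t)] += 1            # arrival falls in interval [int(t), int(t) + 1)
        t += rng.expovariate(rate)     # add the next exponential inter-arrival time
    return counts

counts = arrivals_per_unit_time(rate=3.0, horizon=10_000)
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / len(counts)
print(mean_count, var_count)
```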

what is the difference between Weibull and geometric distribution - ANSWER-weibull - time between failures; geometric - number of tries between failures

What is the weibull distribution useful for modeling - ANSWER- time it takes something to fail, specifically time between failures

Ways to Deal With Uncertainty in Optimization - ANSWER- Mathematical programming

Model Conservatively

Scenario Modeling

Dynamic Programming

Stochastic Dynamic Program

Markov Decision Process

Dynamic Programming - ANSWER-Divides the system into states (what's going on in the system). States - the exact situations and their values

At each state the decision maker gets to make a decision. Decision - choice of next state. Based on the decision the system moves to the next state

Use Bellman's equation to determine optimal decision to make at each state

assumes there's no uncertainty
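
A minimal sketch of a deterministic dynamic program, assuming a small made-up system where the states are nodes, each decision picks the next state, and every decision has a known cost. The Bellman equation used here is V(s) = min over decisions of [cost(s, next) + V(next)].

```python
from functools import lru_cache

# Hypothetical system: EDGES[state][next_state] = cost of that decision.
EDGES = {
    "A": {"B": 2, "C": 5},
    "B": {"C": 1, "D": 4},
    "C": {"D": 1},
    "D": {},  # terminal state, no decisions left
}

@lru_cache(maxsize=None)
def value(state):
    """Bellman equation: cheapest total cost from this state to the end."""
    if not EDGES[state]:
        return 0
    return min(cost + value(nxt) for nxt, cost in EDGES[state].items())

def best_decision(state):
    """Optimal decision at a state: the next state minimizing cost + value."""
    return min(EDGES[state], key=lambda nxt: EDGES[state][nxt] + value(nxt))

print(value("A"), best_decision("A"), best_decision("B"))
```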

Stochastic Dynamic Program - ANSWER-Dynamic program but decisions have probabilities of next state

Markov Decision Process - ANSWER-Have a discrete set of states and decisions, and the probabilities only depend on the current state/decision (memoryless)

Stochastic dynamic program with discrete states and decisions

Probabilities only depend on current state/decision
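
A small Markov decision process can be solved with value iteration, sketched below. Everything specific here is an assumption for illustration: the two-state machine example, its rewards and probabilities, and the discount factor `GAMMA`, which the notes above do not mention.

```python
# Hypothetical machine-maintenance MDP.
# P[state][decision] = list of (probability, next_state, reward).
# Probabilities depend only on the current state and decision (memoryless).
P = {
    "good": {
        "run": [(0.9, "good", 10.0), (0.1, "broken", 10.0)],
        "repair": [(1.0, "good", 4.0)],
    },
    "broken": {
        "run": [(1.0, "broken", 0.0)],
        "repair": [(1.0, "good", -5.0)],
    },
}
GAMMA = 0.9  # assumed discount factor on future rewards

def expected_value(outcomes, V, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)

def value_iteration(P, gamma, tol=1e-9):
    """Apply the Bellman update repeatedly until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {
            s: max(expected_value(outcomes, V, gamma) for outcomes in P[s].values())
            for s in P
        }
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V

def greedy_policy(P, gamma, V):
    """Best decision in each state given the state values V."""
    return {
        s: max(P[s], key=lambda d: expected_value(P[s][d], V, gamma))
        for s in P
    }

V = value_iteration(P, GAMMA)
policy = greedy_policy(P, GAMMA, V)
print(V, policy)
```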

Basic 2 Steps of Optimization Algorithms - ANSWER-1) Initialization - create a first solution (values for all variables), can be simple/bad/infeasible

2) Repeat two stage process:

2-a) start with current solution and find a vector of relative changes to make to each variable

2-b) make changes in that improving direction some amount

stop process when solution doesn't change much or time runs out
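
The two steps above can be sketched with plain gradient descent, which is one common instance of this pattern (the notes do not name a specific algorithm). The objective function, step size, and tolerance are hypothetical choices for illustration.

```python
def gradient(x):
    """Gradient of the hypothetical objective f(x0, x1) = (x0 - 3)^2 + (x1 + 1)^2."""
    return [2 * (x[0] - 3.0), 2 * (x[1] + 1.0)]

def minimize(x, step=0.1, tol=1e-8, max_iters=10_000):
    """Step 1 is the caller's starting point x; step 2 is the loop below."""
    for _ in range(max_iters):              # stop if "time" (iterations) runs out
        # 2-a) find a vector of relative changes: the negative gradient
        direction = [-g for g in gradient(x)]
        # 2-b) move some amount in that improving direction
        new_x = [xi + step * di for xi, di in zip(x, direction)]
        # stop when the solution doesn't change much
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x

solution = minimize([0.0, 0.0])  # simple, deliberately poor first solution
print(solution)
```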

Mann-Whitney (unpaired)

what test is used for 1 data set - ANSWER-Wilcoxon signed rank test (compare the median against a possible value)

McNemar's Test - ANSWER-Compares results on pairs of data points where two different approaches were used on the same thing

Don't need to know anything about the distribution, it's just a comparison of pairs of results
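
A minimal sketch of the exact (binomial) form of McNemar's test. It only looks at the pairs where the two approaches disagree and asks how surprising the split is if both approaches were equally good. The function name and the True/False encoding of results are assumptions for illustration.

```python
from math import comb

def mcnemar_exact(pairs):
    """Two-sided exact McNemar p-value.

    pairs: list of (a_ok, b_ok) booleans, the results of approaches A and B
    on the same item. Pairs where both agree carry no information.
    """
    a_only = sum(1 for a_ok, b_ok in pairs if a_ok and not b_ok)
    b_only = sum(1 for a_ok, b_ok in pairs if b_ok and not a_ok)
    n = a_only + b_only
    if n == 0:
        return 1.0
    # Under "no difference", each disagreement favors A or B with probability 1/2.
    tail = sum(comb(n, k) for k in range(min(a_only, b_only) + 1)) / 2**n
    return min(1.0, 2 * tail)

# Hypothetical results: A succeeds where B fails 8 times, never the reverse.
pairs = [(True, False)] * 8 + [(True, True)] * 20 + [(False, False)] * 5
print(mcnemar_exact(pairs))
```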

Wilcoxon Signed Rank Test for Medians - ANSWER- Assumption - distribution is continuous and symmetric, Is the median different from a specific value m

If the probability p of getting a sum of ranks at least as extreme as W is small, then we can say the median is probably different from m, otherwise we don't

There is also a version for paired samples

When to use McNemar vs Wilcoxon - ANSWER-numeric data - use Wilcoxon

Yes/no data - use McNemar

Mann-Whitney Test - ANSWER-2 data sets, but not paired samples Assume all observations are independent of each other

Rank all observations together, sum the ranks of the yi and the ranks of the zi, and compare the smaller sum against a table that gives the significance of the difference
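
A sketch of the rank-based statistic behind the Mann-Whitney test, assuming the standard U formulation: rank all observations together, sum the ranks of one sample, and subtract the smallest rank sum that sample could possibly have. Tied values get the average of their ranks. The function names are hypothetical, and looking up significance in a table is left out.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend over a run of tied values
        avg = (i + j) / 2 + 1           # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(ys, zs):
    """Smaller of the two U statistics for unpaired samples ys and zs."""
    ranks = average_ranks(list(ys) + list(zs))
    n_y, n_z = len(ys), len(zs)
    rank_sum_y = sum(ranks[:n_y])
    u_y = rank_sum_y - n_y * (n_y + 1) / 2
    u_z = n_y * n_z - u_y
    return min(u_y, u_z)

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

A U of 0 means the two samples do not overlap at all, which is the strongest possible evidence of a difference for those sample sizes.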

What kind of model do you use based on the metric? - ANSWER-Mean or other metric using exact values of data - use parametric

median or other metric using ranks of values of data - use non-parametric

success probability, or other metric using counts of binary outcomes of data - use binomial

Empirical Bayes Modeling - ANSWER-Overall distribution of something is known or estimated

Only a little data available for a specific case