Georgia Institute of Technology, Exams of Mathematical Modeling and Simulation

Typology: Exams

2025/2026

Available from 06/14/2026

WuodKowino

ISYE 6501 Analytics Modeling Midterm 1
Question and Answer 2026 | Graded

  • What do descriptive questions ask? - ✓✓What happened? (e.g., which customers are most alike)
  • What do predictive questions ask? - ✓✓What will happen? (e.g., what will Google's stock price be?)
  • What do prescriptive questions ask? - ✓✓What action(s) would be best? (e.g., where to put traffic lights)
  • What is a model? - ✓✓Real-life situation expressed as math.
  • What do classifiers help you do? - ✓✓Differentiate between categories, i.e., decide which class each data point belongs to.
  • What is a soft classifier and when is it used? - ✓✓In some cases, no line separates all of the labeled examples, so we use a classifier that minimizes the number of mistakes.
  • What does it mean when the classifier/decision boundary is almost parallel to the vertical (y) axis? - ✓✓The horizontal attribute is all that is needed.
  • What does it mean when the classifier/decision boundary is almost parallel to the horizontal (x) axis? - ✓✓The vertical attribute is all that is needed.
  • What is time-series data? - ✓✓The same data recorded over time, often at equal intervals
  • What is quantitative data? - ✓✓Numbers with a meaning: higher means more, lower means less (e.g., age, sales, temperature, income)
  • What is categorical data? - ✓✓Numbers w/o meaning (e.g., zip codes), non-numeric (e.g., hair color), binary data (e.g., male/female, yes/no, on/off)
  • Which of these is time series data? A. The average cost of a house in the United States every year since 1820 B. The height of each professional basketball player in the NBA at the start of the season - ✓✓A
  • Which of these is structured data? A. The contents of a person's Twitter feed B. The amount of money in a person's bank account - ✓✓B
  • What is structured data? - ✓✓Data that can be described and stored in a structured way (e.g., as rows and columns in a table)
  • What is unstructured data? - ✓✓Data that is not easily described and stored (e.g., written text)
  • A survey of 25 people recorded each person's family size and type of car. Which of these is a data point? A. The 14th person's family size and car type B. The 14th person's family size C. The car type of each person - ✓✓A. A data point is all the information about one observation.
  • The farther the wrongly classified point is from the line ___ - ✓✓The bigger the mistake we've made
  • The term including the margin gets larger, so the importance of a large margin outweighs avoiding mistakes and classifying known data samples. - ✓✓As lambda gets larger
  • That term also drops towards zero, so the importance of minimizing mistakes and classifying known data points outweighs having a large margin. - ✓✓As lambda drops towards zero
  • What can SVMs be used for? - ✓✓To find a classifier with maximum separation, or margin, between the two sets of points
  • When to use SVM? - ✓✓If it's impossible to avoid classification errors, SVM can find a classifier that trades off reducing errors and enlarging the margin.
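  • What does the formula \max\{0,\ 1 - (\sum_{i=1}^{m} a_i x_{ij} + a_0)\, y_j\} describe? - ✓✓The error for data point j
  • What does the formula \sum_{j=1}^{n} \max\{0,\ 1 - (\sum_{i=1}^{m} a_i x_{ij} + a_0)\, y_j\} describe? - ✓✓The total error
  • To maximize the distance between the two lines, what do we need to minimize? - ✓✓The sum of the squared coefficients, \sum_{i=1}^{m} a_i^2 (the distance between the lines is 2 / \sqrt{\sum_{i=1}^{m} a_i^2})
  • What value of the multiplier m_j do we give for more costly errors? - ✓✓m_j > 1
  • What might a cost multiplier mean in the context of giving a loan? - ✓✓For example, that giving a bad loan is twice as costly as withholding a good loan.
  • What value of m_j do we give for less costly errors? - ✓✓m_j < 1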
  • If we use the same data to fit a model as we do to estimate how good it is, what is likely to happen? - ✓✓The model will appear to be better than it really is. The model will be fit to both real and random patterns in the data. The model's effectiveness on this data set will include both types of patterns, but its true effectiveness on other data sets (with different random patterns) will only include the real patterns
  • When comparing models, if we use the same data to pick the best model as we do to estimate how good the best one is, what is likely to happen? - ✓✓The model will appear to be better than it really is. The model with the highest measured performance is likely to be both good and lucky in its fit to random patterns.
  • What is a training set used for? - ✓✓To fit the models
  • What is a validation set used for? - ✓✓To choose the best model
  • Why would we use two sets? - ✓✓If the training set has unique random effects that the classifier was fit to, those benefits won't be counted when we measure effectiveness on a separate validation set.
  • What effects does randomness have on training/validation performance? - ✓✓Sometimes the randomness makes the performance look worse than it really is, and sometimes it makes the performance look better than it really is.
  • How are high-performing models affected by randomness? - ✓✓They are often boosted by above-average random effects, which makes them look better than they really are.
  • what is a test data set used for? - ✓✓to estimate performance of chosen model
  • When do we need a validation set? - ✓✓When we are choosing between multiple models.
  • What are the data splits when working with one model? - ✓✓70-90% training, 10-30% test
  • What are the data splits when comparing models? - ✓✓50-70% training, split the rest between validation and test
  • What are two methods of splitting data? - ✓✓Random and rotation
  • What is the rotation method of splitting data? - ✓✓You take turns assigning points to each set. A 5-data-point rotation sequence: (Training - Validation - Training - Test - Training)
  • What is the advantage of rotation over randomness? - ✓✓We make sure each part of the data is equally represented in each set.
  • What is the disadvantage of using rotation? - ✓✓We have to make sure we aren't creating some other type of bias when we assign points.
  • what is k-fold cross validation? - ✓✓split the training/validation data into k-parts; we train on k-1 parts and validate on the remaining part.
  • What metric do you use for k-fold cross validation when comparing models? - ✓✓The average of all k evaluations.
  • What do we use when important data only appears in the validation or test sets? - ✓✓cross-validation
  • What do we do after we've performed cross-validation? - ✓✓We train the model again using all the data.
  • what are the benefits of k-fold cross validation? - ✓✓better use of data, better estimate of model quality, and chooses model more effectively
  • What can clustering be used for? - ✓✓Grouping data points (e.g., market segmentation) and discovering groups in data points (e.g., personalized medicine)
  • Which should we use most of the data for: training, validation, or test? - ✓✓training
  • In k-fold cross-validation, how many times is each part of the data used for training, and for validation? - ✓✓k-1 times for training, and 1 time for validation
  • what is rectangular distance useful for? - ✓✓calculating driving distance when the city is mapped in a grid
  • What is the value of p for Euclidean distance? - ✓✓2
  • What is the general equation for p-norm distance? - ✓✓\left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}
  • Which distance metric corresponds to straight-line distance? - ✓✓The 2-norm
  • How do you find the distance of an infinity norm? - ✓✓You find the largest | x_i - y_i |
  • What is a centroid - ✓✓the center of a cluster
  • A group of astronomers has a set of long-exposure CCD images of various distant objects. They do not know yet which types of object each one is, and would like your help using analytics to determine which ones look similar. Which is more appropriate: classification or clustering? - ✓✓clustering
  • Suppose one astronomer has categorized hundreds of the images by hand, and now wants your help using analytics to automatically determine which category each new image belongs to. Which is more appropriate: classification or clustering? - ✓✓classification
  • Which of these is generally a good reason to remove an outlier from your data set? A. The outlier is an incorrectly-entered data, not real data. B. Outliers like this only happen occasionally. - ✓✓A. If the data point isn't a true one, you should remove it from your data set.
  • What is an outlier? - ✓✓A data point that is very different from the rest
  • What graph or plot can we use to find outliers? - ✓✓box-and-whisker plot
  • What are the parts of a box-and-whisker plot? - ✓✓The bottom and top of the box are the 25th and 75th percentile. The middle value is the median. The whiskers stretch up and down to the most extreme non-outlier values.
  • Where would outliers exist in a box and whisker plot - ✓✓outside of the whiskers.
  • What are some ways to deal with outliers that are bad data? - ✓✓Omit them or use imputation
  • What can change detection be used for? - ✓✓Determining whether action might be needed, determining impact of past action, determining changes to help plan.
  • What is cumulative sum (CUSUM) used for? - ✓✓Detecting an increase, a decrease, or both
  • What is C used for in the CUSUM formula? - ✓✓Since we expect there to be some randomness, we include a value C to pull the running total down: S_t = \max\{0, S_{t-1} + (x_t - \mu - C)\}
  • If we have a larger C ... - ✓✓The harder it is for S_t to get large, and the less sensitive the method is
  • If we have a smaller C ... - ✓✓The more sensitive the method is, because S_t can get larger faster
  • What factors go into finding the right values of C and T? - ✓✓How costly it is if the model takes a long time to notice a change, and how costly it is if the model thinks it has found a change that really isn't there
  • Why are hypothesis tests often not sufficient for change detection? - ✓✓They often are slow to detect changes. Hypothesis tests generally have high threshold levels, which makes them slow to detect changes.
  • In the CUSUM model, having a higher threshold T makes it... - ✓✓detect changes slower, and less likely to falsely detect changes.
  • In the exponential smoothing equation S_t = \alpha \times x_t + (1-\alpha) \times S_{t-1}, a value of \alpha closer to 1 is chosen if... - ✓✓There's less randomness, so we're more willing to trust the observation. We put more weight on the observation x_t than on the previous estimate S_{t-1}.
  • A multiplicative seasonality, like in the Holt-Winters method, means that the seasonal effect is... - ✓✓Proportional to the baseline value. A multiplicative seasonality is larger when the baseline value is larger, because its effect is a multiple of the baseline
  • True or false: In the exponential smoothing equation S_t = \alpha \times x_t + (1-\alpha) \times S_{t-1}, only the current observation x_t is considered in calculating the estimate S_t. - ✓✓False. S_{t-1} summarizes all previous observations, so every past observation is considered, with a weight that shrinks the older it is.
  • Is exponential smoothing better for short-term forecasting or long-term forecasting? - ✓✓Short-term. Exponential smoothing bases its forecast primarily on the most recent data points. For forecasts of the longer-term future, there aren't data points close to the time being forecasted.
  • In simple forecasting with basic exponential smoothing what is the value of F_{t+i} - ✓✓S_t
  • What does autoregression mean? - ✓✓Previous values of the thing being estimated are used to calculate the estimate
  • Why would we want to estimate the variance? - ✓✓Knowing the variance can help us estimate the amount of error
  • What does principal component analysis (PCA) do? - ✓✓Transforms data so there's no correlation between dimensions, and ranks the new dimensions in likely order of importance.
  • If you use principal component analysis (PCA) to transform your data and then you run a regression model on it, how can you interpret the regression coefficients in terms of the original attributes? - ✓✓Each original attribute's implied regression coefficient is equal to a linear combination of the principal components' regression coefficients. This is equivalent to using the inverse transformation.
  • True or false: In a regression tree, every leaf of the tree has a different regression model that might use different attributes, have different coefficients, etc. - ✓✓True. Each leaf's individual model is tailored to the subset of data points that follow all of the branches leading to the leaf.
  • Tree-based approaches can be used for other models besides regression. - ✓✓True. For example, a classification tree might have a different SVM or KNN model at each leaf. It might even use SVM at some leaves and KNN at others (though that's probably rare).
  • A common rule of thumb is to stop branching if a leaf would contain less than 5% of the data points. Why not keep branching and allow models to find very close fits to each very small subset of data? - ✓✓Fitting to very small subsets of data will cause overfitting. With too few data points, the models will fit to random patterns as well as real ones
  • True or False: When using a random forest model, it's easy to interpret how its results are determined. - ✓✓False. Unlike a model like regression where we can show the result as a simple linear combination of each attribute times its regression coefficient, in a random forest model there are so many different trees used simultaneously that it's difficult to interpret exactly how any factor or factors affect the result.
  • A logistic regression model can be especially useful when the response... - ✓✓...is a probability (a number between zero and one) or is binary (either zero or one).
  • A model is built to determine whether data points belong to a category or not. A "true negative" result is: - ✓✓A data point that is not in the category, and the model correctly says so. 'True' and 'false' refer to whether the model is correct or not, and 'positive' and 'negative' refer to whether the model says the point is in the category.
  • True or False: The most useful classification models are the ones that correctly classify the highest fraction of data points. - ✓✓False. Sometimes the cost of a false positive is so high that it's worth accepting more false negatives, or vice versa.
  • In exponential smoothing, what is S_t? - ✓✓The expected baseline response at time period t (e.g., blood pressure at hour t)
  • In exponential smoothing what is x_t - ✓✓observed response. Observed blood pressure at t
  • S_t = \alpha \times x_t + (1 - \alpha)S_{t-1}. When \alpha is closer to zero - ✓✓a lot of randomness in the system. the previous baseline is probably a good indicator of today's baseline
  • S_t = \alpha \times x_t + (1 - \alpha)S_{t-1}. When \alpha is closer to 1 - ✓✓not much randomness in the system. If we observe a fluctuation today, it probably means today's baseline is close to the observed data.
  • What is T_t in S_t = \alpha \times x_t + (1 - \alpha)(S_{t-1} + T_{t-1}) - ✓✓The trend at time t
  • what is the initial condition for T? - ✓✓T_1 = 0
  • What is the initial condition for S_t? - ✓✓S_1 = x_1
  • What is L? - ✓✓The length of a cycle. For daily observations with a weekly cycle, L is 7.
  • What is C_t? - ✓✓The multiplicative seasonality factor for time t. It inflates or deflates the observation.
  • When C = 1.1, what does that mean? - ✓✓The value is 10% higher just because of that point in the cycle.
  • When C is = 1 what does that mean? - ✓✓no effect
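  • How does the exponential smoothing formula weight more recent observations more than older ones? - ✓✓Because (1 - \alpha) < 1: unrolling the recursion gives observation x_{t-k} a weight of \alpha(1 - \alpha)^k, which shrinks as k grows.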
  • The further in the future we go ... - ✓✓The more uncertainty there is, so the anticipated forecast error gets larger
  • When using exponential smoothing for prediction/forecasting what value is used for x_{t+1}? - ✓✓S_t
  • when does exponential smoothing work well - ✓✓when the data is stationary (i.e., mean, variance and other measures are all expected to be constant over time)
  • How is the best fit regression line determined? - ✓✓It is the line that minimizes the sum of squared errors
  • What does AIC (Akaike Information Criterion) do, and what are some of its properties? - ✓✓Encourages fewer parameters k and higher likelihood. Works well with a lot of data points.
  • How do you compare two AICs? - ✓✓Compute the relative likelihood e^{(AIC_1 - AIC_2)/2}, where model 1 has the smaller AIC; this is how likely model 2 is to be better than model 1.
  • If the relative likelihood is 8.2%, what does that mean? - ✓✓Model 2 is 8.2% as likely as Model 1 to be better.
  • When do you use BIC? - ✓✓when there are more data points than parameters
  • What's the difference between AIC and BIC - ✓✓BIC encourages models with fewer parameters than AIC does
  • When would you use Corrected AIC and not AIC - ✓✓when you have smaller data sets
  • What does |BIC_1 - BIC_2| > 10 mean? - ✓✓smaller BIC model is "very likely" better
  • What does 6 < |BIC_1 - BIC_2| < 10 mean? - ✓✓smaller BIC model is "likely" better
  • What does 2 < |BIC_1 - BIC_2| < 6 mean? - ✓✓smaller BIC model is "somewhat likely" better
  • What does 0 < |BIC_1 - BIC_2| < 2 mean? - ✓✓smaller BIC model is "slightly likely" better
  • When trying to answer questions about how a system works, what is important? - ✓✓The coefficients
  • If you are using regression to make forecasts, what are the key answers? - ✓✓The responses
  • what is causation? - ✓✓One thing causes another
  • What is correlation? - ✓✓two things tend to happen or not happen together
  • When is there causation? - ✓✓Cause is before effect, idea of causation makes sense, no outside factors causing the relationship (hard to guarantee)
  • Can we still make predictions with a model if the predictor and response are highly correlated? - ✓✓Yes. Even though we can use it for empirical predictions, it doesn't make sense to say that the model shows causation.
  • what is the R-squared value? - ✓✓estimates how much variability the model accounts for
  • what is adjusted r-squared - ✓✓same as r^2 but favors simpler models by penalizing for using too many variables
  • What is the T-statistic? - ✓✓the coefficient divided by its standard error; related to the p-value
  • When you use a higher p-value threshold ... - ✓✓You increase the possibility of including irrelevant factors
  • When you use a lower p-value threshold ... - ✓✓You increase the possibility of leaving out a relevant factor
  • What happens to p-values when you have a lot of data? - ✓✓They get small even when attributes are not at all related to the response
  • If you have 100 attributes with a p-value of 2% each what does that mean? - ✓✓we can expect 2 of them to be irrelevant.
  • Which plots can we use to check for normality? - ✓✓Q-Q plot
  • What does a Box-Cox transformation do? - ✓✓Applies a power transformation (a logarithm in the limiting case \lambda = 0) that stretches out the smaller range to enlarge its variability and shrinks the larger range to reduce its variability
  • Why would we want to detrend data? - ✓✓Because a trend in time-series data can distort a factor-based analysis
  • what can you detrend? - ✓✓The response and predictors in factor-based models
  • Name two factor-based models? - ✓✓SVM and regression
  • How do you detrend data? - ✓✓Factor by factor: fit a one-dimensional linear regression of each factor against time, then subtract the fitted trend from the actual values.
  • What does PCA do? - ✓✓Removes correlations within the data and ranks the new coordinate dimensions in order of the amount of variance they capture
  • Why do you focus on the first n principal components? - ✓✓It reduces the effect of randomness, and earlier principal components are likely to have higher signal-to-noise ratios.
  • What is sensitivity? - ✓✓The fraction of category members that are correctly classified: TP / (TP + FN)
  • What is specificity? - ✓✓The fraction of non-category members that are correctly identified: TN / (TN + FP)
  • What does the ROC curve plot? - ✓✓Sensitivity plotted against 1 - specificity
  • What is the Area Under Curve (AUC)? - ✓✓The probability that the model estimates a random "yes" point higher than a random "no" point
  • What does it mean when the AUC = 0.5? - ✓✓We are just guessing.
  • What does ROC/AUC give you, and what doesn't it? - ✓✓It gives a quick-and-dirty estimate of quality but does not differentiate between the costs of FN and FP.
  • What does TP mean? - ✓✓Point in the category, correctly classified
  • What does FP mean? - ✓✓Point not in the category, but the model says it is
  • What does TN mean? - ✓✓Point not in the category, correctly classified
  • What does FN mean? - ✓✓Point in the category, but the model says it isn't
  • How do you do KNN regression? - ✓✓Plot all the data, then predict the response by taking the average response of the k closest data points.
  • What are parametric methods? - ✓✓Methods that assume a specific form for the predictor (e.g., linear regression)
  • What are non-parametric methods? - ✓✓Methods that don't force any specific form onto the predictor (e.g., KNN)
  • What is a spline? - ✓✓A function made of polynomial pieces that connect to each other
  • How do regression splines work? - ✓✓Fit different functions to different parts of the data set, with smooth connections between the parts.
  • What are the points where the different functions connect called? - ✓✓Knots
  • Why do the connections have to be smooth? - ✓✓Otherwise you could get drastically different answers for very nearby points.
  • How does Bayesian regression work? - ✓✓Start with data and an estimate of how the regression coefficients and the random error are distributed. Then use Bayes' theorem to update the estimate.
  • When should you use Bayesian regression? - ✓✓When there's not much data and you want to combine it with expert opinion.
  • If we have a classifier where one type of mistake is more costly where do we move the line? - ✓✓further away from that class
  • In a classifier, what range can a0 have? - ✓✓It can be between -1 and 1
  • In KNN, how can we remove unimportant attributes? - ✓✓Set the weight of that attribute in the distance to 0
  • What is bias? - ✓✓Bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
  • what is variance? - ✓✓An error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting)
  • what is a contextual outlier - ✓✓value isn't far from the rest overall but is far from the points nearby in time
  • outlier - ✓✓data point that is very different from the rest
  • collective outlier - ✓✓something is missing in a range of points but can't tell exactly where
  • how could we detect outliers when there are multiple dimensions? - ✓✓we could fit a model and then determine the points with a large error
  • What is an example of when we should keep outliers? - ✓✓When the magnitude of the model's error is part of the model's value
  • What can removing outliers do? - ✓✓Paint an overly optimistic picture
  • What is a two-model approach to dealing with outliers? - ✓✓Use a logistic regression model to estimate the probability of outliers happening under certain conditions, then a second model to predict the response using the data without outliers.
  • what type of data is used in change detection? - ✓✓time-series data
  • What do ARIMA models do? - ✓✓Help forecast or estimate a value.
  • what is a common error measure for simple linear regression - ✓✓sum of squared error
  • what makes up the best-fit regression line - ✓✓coefficients that minimize the sum of squared errors.
  • what is likelihood - ✓✓the probability (probability density) of some observed outcomes given a set of parameter values
  • maximum likelihood - ✓✓parameters that give the highest probability
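  • What is the maximum likelihood estimate? - ✓✓The set of parameters that maximizes the likelihood. When errors are independent and normally distributed with constant variance, this is the set of parameters that minimizes the sum of squared errors.
  • What can extra parameters do? - ✓✓Cause overfitting
  • What does a smaller AIC encourage? - ✓✓Higher likelihood and fewer parameters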
  • How does BIC's penalty term compare to AIC's penalty term - ✓✓It's bigger. BIC encourages models with fewer parameters than AIC does
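  • If there is a strong relationship between a predictor and the response, what will its p-value be? - ✓✓Very low.
  • How do you find the implied regression coefficients in PCR? - ✓✓Multiply the matrix of eigenvectors (the principal component loadings) by the vector of coefficients fitted on the principal components; each original attribute's implied coefficient is a linear combination of those coefficients.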
  • What does SVM stand for? - ✓✓Support Vector Machine
  • Is written text structured or unstructured? - ✓✓Unstructured
  • When we increase the sum of the square of the coefficients we... - ✓✓Decrease the distance between the lines
  • In SVM soft classifier we tradeoff between maximizing ___ and minimizing ___ - ✓✓margin and errors
  • If lambda gets small, what gets emphasized: large margin or minimizing training error? - ✓✓Minimizing errors.
  • What is a support vector? - ✓✓A data point on (or inside) the margin that "holds up" the margin lines, i.e., determines where they lie.
  • Does ...[⅔(a-1)+1/3(a+1)] move an SVM classifier up or down? - ✓✓Up
  • How do you make errors more costly in a soft SVM classifier? - ✓✓include a multiplier for the point-error term.
  • If an SVM coefficient is very close to zero... - ✓✓that term is not very important to the classification.
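  • What is the difference between standardization and scaling? - ✓✓Scaling maps the data into a bounded range, e.g., (value - min) / (max - min) gives [0, 1]. Standardization rescales to mean 0 and standard deviation 1: (value - factor mean) / (factor standard deviation). Neither changes the shape of the distribution, so standardized data is not necessarily normal.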
  • What is the 2-norm? - ✓✓Euclidean distance
  • What is the 1-norm? - ✓✓The rectilinear (Manhattan) distance
  • What is the infinity norm? - ✓✓The largest absolute difference in any single dimension: \max_i |x_i - y_i|
  • Measuring the quality of a model is called? - ✓✓Validation
  • What does a confusion matrix show? - ✓✓The performance of a classification model.
  • A time series outlier that seems "off the curve" is called a... - ✓✓contextual outlier.
  • A data element that is different from all other data in a set is called a... - ✓✓point outlier.
  • When something is missing in a range of points, it is called a... - ✓✓collective outlier.
  • The whiskers on a box plot extend to... - ✓✓the 10th and 90th percentiles (or 5th and 95th)
  • Why are hypothesis tests generally not sufficient for change detection? - ✓✓They are slow to detect changes.
  • In CUSUM, T is _____ and C is _____. - ✓✓A threshold and a "bring-down factor"
  • In a CUSUM model, you adjust T and C to manage the tradeoff between... - ✓✓early detection and false alarms
  • In exponential smoothing, if the data is less random, then you want to pick an alpha that is... - ✓✓Close to 1.
  • What is the initial condition for T in exponential smoothing with trending? - ✓✓T_1 = 0
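The soft-margin SVM cards above (error for data point j, total error, the \sum a_i^2 margin term, and the cost multipliers m_j) combine into one tradeoff objective. The sketch below scores a fixed linear classifier under that objective. It is a minimal illustration, assuming labels coded as +1/-1 and an objective of (optionally m_j-weighted) total error plus lambda times \sum_i a_i^2; the function names and toy data are hypothetical, and nothing here trains the classifier.

```python
def point_error(a, a0, x, y):
    """Error for one data point: max{0, 1 - (sum_i a_i * x_i + a0) * y}, with y in {+1, -1}."""
    score = sum(ai * xi for ai, xi in zip(a, x)) + a0
    return max(0.0, 1.0 - score * y)


def soft_margin_objective(a, a0, X, Y, lam, m=None):
    """Total (optionally cost-weighted) error plus lam times the margin term sum_i a_i^2."""
    if m is None:
        m = [1.0] * len(X)  # every error equally costly
    total_error = sum(mj * point_error(a, a0, x, y) for mj, x, y in zip(m, X, Y))
    margin_term = sum(ai * ai for ai in a)  # smaller value means a wider margin
    return total_error + lam * margin_term


# Hypothetical data: two attributes per point, labels +1 / -1.
X = [[2.0, 1.0], [0.5, 0.2], [-1.0, -2.0], [0.3, -0.1]]
Y = [1, 1, -1, -1]
a, a0 = [1.0, 1.0], 0.0  # a fixed candidate classifier (not trained)

print(soft_margin_objective(a, a0, X, Y, lam=0.1))                  # about 1.7
print(soft_margin_objective(a, a0, X, Y, lam=0.1, m=[1, 1, 1, 2]))  # about 2.9
```

With m = [1, 1, 1, 2], the misclassified -1 point counts double, so the objective rises from about 1.7 to about 2.9. Raising lam instead puts more weight on keeping \sum a_i^2 small, i.e., on a wider margin.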
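To make the k-fold cross-validation cards concrete, here is a sketch that builds the k folds and the train/validation pairs, then counts how often each point plays each role. It assumes points are indexed 0..n-1 and that folds are formed by shuffling and then dealing points out in turn (a random split followed by rotation); the helper names are hypothetical, and fitting the models and averaging the k evaluations is left out.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1, then deal them out in turn into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def k_fold_splits(n, k, seed=0):
    """Yield (train, validation) index lists: train on k-1 folds, validate on the remaining one."""
    folds = k_fold_indices(n, k, seed)
    for v in range(k):
        train = [i for f in range(k) if f != v for i in folds[f]]
        yield train, folds[v]


n, k = 10, 5
train_count = [0] * n
valid_count = [0] * n
for train, valid in k_fold_splits(n, k):
    for i in train:
        train_count[i] += 1
    for i in valid:
        valid_count[i] += 1

print(train_count)  # each entry is k - 1 = 4
print(valid_count)  # each entry is 1
```

Every point ends up used k - 1 times for training and exactly once for validation, matching the card above.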
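The distance cards above (rectilinear, Euclidean, general p-norm, infinity norm) can be checked with a few lines of code. This is a plain-Python sketch of the p-norm formula, assuming points are equal-length sequences of numbers; the function name is hypothetical.

```python
import math


def p_norm_distance(x, y, p):
    """p-norm distance (sum_i |x_i - y_i|^p)^(1/p); p = math.inf gives max_i |x_i - y_i|."""
    diffs = [abs(xi - yi) for xi, yi in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


x, y = (1, 2), (4, 6)
print(p_norm_distance(x, y, 1))         # rectilinear (1-norm): 3 + 4 = 7
print(p_norm_distance(x, y, 2))         # Euclidean (2-norm): sqrt(9 + 16) = 5
print(p_norm_distance(x, y, math.inf))  # infinity norm: max(3, 4) = 4
```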
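The CUSUM cards above describe a running total that is pulled down by C and compared with a threshold T. The sketch below assumes the common one-sided form for detecting an increase, S_t = \max\{0, S_{t-1} + (x_t - \mu - C)\} with S_0 = 0, and a known baseline mean mu; the data values and function name are hypothetical.

```python
def cusum_increase(xs, mu, C, T):
    """Return the first index t where S_t >= T (an increase is detected), or None.

    S_t = max(0, S_{t-1} + (x_t - mu - C)), starting from S_0 = 0.
    """
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (x - mu - C))
        if s >= T:
            return t
    return None


# Hypothetical readings whose mean drifts upward partway through.
data = [10, 11, 9, 10, 12, 13, 14, 13, 15]

print(cusum_increase(data, mu=10, C=0.5, T=5))   # smaller C: more sensitive, flags index 6
print(cusum_increase(data, mu=10, C=2.0, T=5))   # larger C: less sensitive, flags index 8
print(cusum_increase(data, mu=10, C=0.5, T=10))  # larger T: slower, flags index 7
```

Raising C or T delays detection, which is the early-detection versus false-alarm tradeoff from the cards.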
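The basic exponential smoothing cards can be traced numerically. This sketch assumes the initial condition S_1 = x_1 and the forecast F_{t+1} = S_t from the cards; the observations and function name are hypothetical. The last lines unroll the recursion to show that observation x_{t-k} gets weight \alpha(1 - \alpha)^k.

```python
def exponential_smoothing(xs, alpha):
    """Baselines with S_1 = x_1 and S_t = alpha * x_t + (1 - alpha) * S_{t-1}."""
    s = [xs[0]]  # initial condition S_1 = x_1
    for x in xs[1:]:
        s.append(alpha * x + (1 - alpha) * s[-1])
    return s


alpha = 0.5
obs = [100, 104, 98, 110, 106]  # hypothetical observations
smooth = exponential_smoothing(obs, alpha)
forecast = smooth[-1]  # basic exponential smoothing forecasts F_{t+1} = S_t
print(smooth, forecast)  # [100, 102.0, 100.0, 105.0, 105.5] 105.5

# Unrolling the recursion: x_{t-k} gets weight alpha * (1 - alpha)^k and x_1 keeps
# the leftover (1 - alpha)^(t-1), so older observations count geometrically less.
t = len(obs)
unrolled = sum(alpha * (1 - alpha) ** k * obs[t - 1 - k] for k in range(t - 1))
unrolled += (1 - alpha) ** (t - 1) * obs[0]
print(unrolled)  # 105.5, the same as the recursive S_t
```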
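For the trend version from the cards (S_t = \alpha x_t + (1 - \alpha)(S_{t-1} + T_{t-1}) with S_1 = x_1 and T_1 = 0), a sketch follows. The trend update T_t = \beta(S_t - S_{t-1}) + (1 - \beta)T_{t-1}, with a second smoothing constant beta, is the standard Holt form and is an assumption here, since the cards do not state it. The data and function name are hypothetical.

```python
def holt_trend(xs, alpha, beta):
    """Exponential smoothing with trend (Holt's method).

    S_t = alpha * x_t + (1 - alpha) * (S_{t-1} + T_{t-1})
    T_t = beta * (S_t - S_{t-1}) + (1 - beta) * T_{t-1}   (assumed trend update)
    Initial conditions: S_1 = x_1, T_1 = 0.
    """
    s, t = [xs[0]], [0.0]
    for x in xs[1:]:
        s_new = alpha * x + (1 - alpha) * (s[-1] + t[-1])
        t_new = beta * (s_new - s[-1]) + (1 - beta) * t[-1]
        s.append(s_new)
        t.append(t_new)
    return s, t


# Hypothetical data rising by 2 each period.
levels, trends = holt_trend([10, 12, 14, 16, 18], alpha=0.5, beta=0.5)
next_forecast = levels[-1] + trends[-1]  # one-step-ahead forecast S_t + T_t
print(trends)         # grows from T_1 = 0 to about 2, the slope of the data
print(next_forecast)
```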
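The AIC comparison cards can be reproduced with the standard definition AIC = 2k - 2\ln L^*, where k is the number of parameters and L^* the maximized likelihood. The AIC values 75 and 80 below are hypothetical, chosen so that the relative likelihood comes out at the 8.2% used in the card; the function names are also hypothetical.

```python
import math


def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L*), where k is the number of parameters and L* the maximized likelihood."""
    return 2 * k - 2 * log_likelihood


def relative_likelihood(aic_a, aic_b):
    """e^{(AIC_1 - AIC_2)/2}, with model 1 the smaller-AIC model: how likely model 2 is to be better."""
    aic_1, aic_2 = min(aic_a, aic_b), max(aic_a, aic_b)
    return math.exp((aic_1 - aic_2) / 2)


print(round(relative_likelihood(75, 80), 3))  # 0.082: model 2 is about 8.2% as likely to be better
```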
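The |BIC_1 - BIC_2| rule of thumb from the cards maps directly to a small lookup. In this sketch, the handling of differences that fall exactly on 2, 6, or 10 is an assumption (the cards use strict inequalities), as is the function name.

```python
def compare_bic(bic_1, bic_2):
    """Apply the |BIC_1 - BIC_2| rule of thumb; returns (preferred model, strength of evidence)."""
    diff = abs(bic_1 - bic_2)
    if diff == 0:
        return None, "no difference"
    preferred = 1 if bic_1 < bic_2 else 2  # the smaller-BIC model is preferred
    # Assumption: a difference exactly equal to 2, 6, or 10 falls in the lower band.
    if diff > 10:
        return preferred, "very likely"
    if diff > 6:
        return preferred, "likely"
    if diff > 2:
        return preferred, "somewhat likely"
    return preferred, "slightly likely"


print(compare_bic(250.0, 262.5))  # (1, 'very likely')
print(compare_bic(100.0, 96.0))   # (2, 'somewhat likely')
```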
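The Box-Cox card describes the effect of the transformation. The sketch below uses the standard Box-Cox family, (y^\lambda - 1)/\lambda for \lambda \neq 0 and \ln y for \lambda = 0, which requires positive data; choosing lambda (usually by maximum likelihood) is not shown, and the function name and values are hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value: (y**lam - 1) / lam, or ln(y) when lam == 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


values = [1, 10, 100, 1000]  # hypothetical, spread over a wide range
print([round(box_cox(v, 0), 2) for v in values])  # [0.0, 2.3, 4.61, 6.91]
```

The raw values span a range of 999, but after the log transform they are evenly spaced about 2.3 apart: the large range is shrunk and the small range is relatively stretched.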
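The detrending card (fit a one-dimensional linear regression per factor, then subtract) looks like this in plain Python. It assumes the time index is 1, 2, ..., n and fits the line by ordinary least squares; the sales figures and function names are hypothetical.

```python
def fit_line(ts, ys):
    """Least-squares fit of y = a0 + a1 * t for one factor against time."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    a1 = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sum((t - t_bar) ** 2 for t in ts)
    a0 = y_bar - a1 * t_bar
    return a0, a1


def detrend(ys):
    """Subtract the fitted linear time trend from one factor, leaving the detrended values."""
    ts = list(range(1, len(ys) + 1))
    a0, a1 = fit_line(ts, ys)
    return [y - (a0 + a1 * t) for t, y in zip(ts, ys)]


sales = [12.0, 15.0, 17.0, 21.0, 22.0]  # hypothetical factor with an upward trend
print([round(r, 2) for r in detrend(sales)])  # [-0.2, 0.2, -0.4, 1.0, -0.6]
```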
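The sensitivity, specificity, and AUC cards can be computed from a list of true labels and model scores. The sketch assumes labels coded as 1 (in the category) and 0 (not), a classification threshold of 0.5, and hypothetical scores. AUC is computed directly from its card definition by comparing every "yes" point with every "no" point; counting ties as one half is an assumption, and the function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for binary labels (1 = in the category, 0 = not)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn


def auc(actual, scores):
    """Probability that a random 'yes' point scores higher than a random 'no' point (ties count 1/2)."""
    yes = [s for a, s in zip(actual, scores) if a == 1]
    no = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if ys > ns else 0.5 if ys == ns else 0.0 for ys in yes for ns in no)
    return wins / (len(yes) * len(no))


actual = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # hypothetical model scores
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(actual, predicted)
sensitivity = tp / (tp + fn)  # TP / (TP + FN)
specificity = tn / (tn + fp)  # TN / (TN + FP)
print(tp, fp, tn, fn)              # 2 1 3 1
print(sensitivity, specificity)    # 2/3 and 0.75
print(auc(actual, scores))         # 11/12, about 0.917
```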
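The KNN regression card translates to a few lines: find the k nearest training points and average their responses. This sketch assumes Euclidean distance (`math.dist`, Python 3.8+) and unweighted averaging; the data and function name are hypothetical.

```python
import math


def knn_regress(train_x, train_y, query, k):
    """Predict the response at `query` as the average response of its k nearest training points."""
    by_distance = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return sum(y for _, y in by_distance[:k]) / k


# Hypothetical training data: two attributes per point and a numeric response.
train_x = [(1, 1), (2, 2), (3, 3), (8, 8), (9, 9)]
train_y = [10, 12, 14, 40, 42]

print(knn_regress(train_x, train_y, (2, 2), k=3))      # averages 12, 10, 14 -> 12.0
print(knn_regress(train_x, train_y, (8.5, 8.5), k=2))  # averages 40, 42 -> 41.0
```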
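The PCR cards say each original attribute's implied coefficient is a linear combination of the principal components' coefficients. The sketch below assumes the loadings are stored as a matrix V with one row per original attribute and one column per kept component, and that the data were centered before PCA (undoing any scaling is not shown). The loadings, coefficients, and function name are hypothetical.

```python
def implied_coefficients(V, b):
    """Implied coefficient on each original attribute: a_j = sum_k V[j][k] * b[k].

    V[j][k] is the loading of original attribute j on principal component k;
    b[k] is the regression coefficient fitted on principal component k.
    """
    return [sum(v_jk * b_k for v_jk, b_k in zip(row, b)) for row in V]


# Hypothetical loadings for 3 original attributes on the first 2 principal components.
V = [
    [0.6, 0.8],
    [0.8, -0.6],
    [0.0, 0.0],
]
b = [2.0, 1.0]  # hypothetical coefficients from regressing the response on the 2 PC scores

coeffs = implied_coefficients(V, b)
print(coeffs)  # approximately [2.0, 1.0, 0.0]
```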
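To contrast the two transformations in the standardization card, this sketch applies min-max scaling and standardization to the same values. It assumes the sample standard deviation (`statistics.stdev`); the income values and function names are hypothetical.

```python
import statistics


def min_max_scale(values):
    """Scaling: map values linearly into [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: (x - mean) / standard deviation, giving mean 0 and standard deviation 1."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation; pstdev would use the population formula
    return [(v - mu) / sd for v in values]


incomes = [30.0, 45.0, 60.0, 90.0]  # hypothetical factor values
print(min_max_scale(incomes))  # [0.0, 0.25, 0.5, 1.0]
print(standardize(incomes))    # mean 0, standard deviation 1
```

Both are linear maps, so the relative spacing of the values, and hence the shape of the distribution, is unchanged.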