


































































ACTUAL COMPLETE REAL EXAM QUESTIONS AND CORRECT DETAILED ANSWERS (VERIFIED ANSWERS ) ALREADY GRADED A+//BRAND NEW!!!



































































When might overfitting occur? - ANSWER-When the number of factors is close to or larger than the number of data points, so the model may fit too closely to random effects
Why are simple models better than complex ones? - ANSWER-Less data is required, there is less chance of including insignificant factors, and they are easier to interpret
What is the objective function in linear regression? - ANSWER-Minimizing the sum of squared errors
What are the statistical variables and constants in linear regression? - ANSWER-From the statistical point of view, the data are the variables and the coefficients are the constants
What are the variables and constants in the optimization model for linear regression? *** - ANSWER-The data are the constants and the coefficients are the variables
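The optimization view of linear regression can be sketched in a few lines of Python. This is a minimal illustration for a single factor, assuming made-up data; the names xs, ys, sum_squared_error and fit_simple_regression are hypothetical and not from the notes. The data stay fixed (constants) while the coefficients a0 and a1 are what the optimization chooses (variables).

```python
# Hypothetical toy data: one factor x and a response y.
# In the optimization view these are constants.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 1 + 2x with no noise


def sum_squared_error(a0, a1, xs, ys):
    """Objective function: sum of squared errors for coefficients a0, a1."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))


def fit_simple_regression(xs, ys):
    """Closed-form coefficients that minimize the squared error (one factor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx             # slope
    a0 = mean_y - a1 * mean_x  # intercept
    return a0, a1


# The coefficients are the variables the optimization solves for.
a0, a1 = fit_simple_regression(xs, ys)
best_error = sum_squared_error(a0, a1, xs, ys)
```

Any other choice of coefficients gives a squared error at least as large as best_error on this data.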
What is the objective function in logistic regression? - ANSWER-To minimize the prediction error
What are the variables in logistic regression? - ANSWER-The coefficients
In soft and hard classification SVMs, what are the variables? - ANSWER-The coefficients
What is the constraint in hard classification for SVMs? - ANSWER-Each observation has to be on the correct side of the line
What is the objective function for hard classification? - ANSWER-To maximize the margin
For soft classification, what is the objective function? - ANSWER-To minimize classification error and maximize the margin
What is the objective function for a time series model? - ANSWER-To minimize prediction error
What is a linear program? - ANSWER-f(x) is a linear function; the constraint set X is defined by linear equations and inequalities
What is a convex quadratic program? - ANSWER-f(x) is a convex quadratic function and we minimize f(x) (equivalently, maximize -f(x)); the constraint set X is defined by linear equations and inequalities
What is forward selection? - ANSWER-Start with no factors. At each step, find the best new factor and check whether it is good enough (by R^2, AIC, or p-value); if it is, add it and refit the model with the current set of factors. At the end, remove any factors that no longer meet a chosen threshold
What is backward elimination? - ANSWER-Start with all factors and find the worst one, judged against a supplied threshold (e.g. p = 0.15). If it fails the threshold, remove it and repeat the process. Continue until we have the number of factors we want, then remove any factors that fail a second, stricter threshold (e.g. p = 0.05) and fit the model with the remaining factors
What is stepwise regression? - ANSWER-A combination of forward selection and backward elimination. We can start with either all factors or no factors, and at each step we add or remove a factor. After adding each new factor, and again at the end, we immediately eliminate any factors that no longer appear good enough
What type of algorithms are the stepwise selection methods? - ANSWER-Greedy algorithms: at each step they take the one thing that looks best
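The greedy character of forward selection can be sketched as follows. This is an illustration under stated assumptions, not a full regression implementation: score is a hypothetical caller-supplied function returning a model-quality number for a set of factors (for example R^2), and the toy gains below are made up.

```python
def forward_selection(factors, score, min_improvement):
    """Greedy forward selection: repeatedly add the single best new factor."""
    selected = []
    remaining = list(factors)
    best_score = score(selected)
    while remaining:
        # Greedy step: pick the one remaining factor that looks best right now.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score - best_score < min_improvement:
            break  # the best new factor is not good enough, so stop
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected


# Hypothetical quality gains per factor (made-up numbers for illustration).
gains = {"age": 0.40, "mileage": 0.25, "color": 0.01}


def toy_score(subset):
    """Toy stand-in for a fitted model's quality measure."""
    return sum(gains[f] for f in subset)


chosen = forward_selection(gains, toy_score, min_improvement=0.05)
```

Because each step only looks at the single best next factor, the procedure never reconsiders earlier choices, which is what makes it greedy.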
What is LASSO? - ANSWER-A variable selection method where the coefficients are chosen to minimize the squared error, subject to the sum of their absolute values not exceeding a threshold t
How do you choose t in LASSO? - ANSWER-Run LASSO with different values of t and see which gives the best trade-off
Why do we have to scale the data for LASSO? - ANSWER-If we don't, the units of measurement of the data will artificially affect how big the coefficients need to be
What is elastic net? - ANSWER-A variable selection method that works by minimizing the squared error while constraining a combination of the absolute values of the coefficients and their squares
Advantages: variable selection from LASSO and predictive benefits of ridge regression.
Disadvantages: arbitrarily rules out some correlated variables (like LASSO, it doesn't know which one should be left out); underestimates coefficients of very predictive variables (like ridge regression)
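Written out as an optimization problem, the elastic net described above can be sketched as follows. The notation is an assumption for illustration: n data points, m scaled factors x_ij, responses y_i, coefficients a_0, ..., a_m, threshold t, and a mixing weight lambda between 0 and 1.

```latex
\begin{aligned}
\min_{a_0,\dots,a_m}\quad
  & \sum_{i=1}^{n} \Bigl( y_i - \bigl( a_0 + \sum_{j=1}^{m} a_j x_{ij} \bigr) \Bigr)^2 \\
\text{subject to}\quad
  & \lambda \sum_{j=1}^{m} |a_j| + (1-\lambda) \sum_{j=1}^{m} a_j^2 \le t
\end{aligned}
```

Setting lambda = 1 leaves only the absolute-value term, which is the LASSO constraint; setting lambda = 0 leaves only the squared term, which is the ridge regression constraint.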
What are some downsides of surveys? - ANSWER-Even if you have what appears to be a representative sample in simple ways, maybe it isn't in more complex ways.
If we're testing to see whether red cars sell for higher prices than blue cars, we need to account for the type and age of the cars in our data set. This is called: - ANSWER-Controlling
What is a blocking factor? *** - ANSWER-A source of variability that is not of primary interest to the experimenter
What is an example of a blocking factor? - ANSWER-The type of car (sports car or family car) is a blocking factor that could account for some of the difference between red cars and blue cars, because sports cars are more likely to be red. If we account for the difference, we can reduce the variability in our estimates
Under what conditions should you run A/B tests - ANSWER- When you can collect data quickly. When the data is representative and the amount of data is small compared to the whole population
Do you have to decide the sample size ahead of time for A/B tests - ANSWER-no, and we can run the hypothesis test anytime we want
What is full factorial design - ANSWER-you test every combination and then use ANOVA to determine importance of each factor
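Enumerating the runs of a full factorial design is a small exercise with itertools.product. The factors and levels below are hypothetical, chosen only to show that every combination appears exactly once.

```python
from itertools import product

# Hypothetical factors and their levels (made up for illustration).
factors = {
    "font": ["serif", "sans"],
    "color": ["red", "blue"],
    "wording": ["short", "medium", "long"],
}


def full_factorial(factors):
    """Return one run (a dict of factor -> level) for every combination of levels."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


runs = full_factorial(factors)
number_of_runs = len(runs)  # 2 * 2 * 3 combinations
```

The number of runs is the product of the number of levels of each factor, which is why testing only a subset of combinations becomes attractive as factors are added.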
What is fractional factorial design - ANSWER-when you test a subset of the entire set of combinations
What is a balanced design? - ANSWER-You test each choice the same # of times and each pair of choices the same # of times
When is regression effective in variable selection? - ANSWER-If there aren't significant interactions between the factors.
values. For updating, we can use Bayesian updates or estimate from the observed distribution
What are common reasons that data sets are missing values? - ANSWER-* a person accidentally types in the wrong value
What are some examples of why there might be bias in missing data - ANSWER-* Income: people with higher incomes are more likely to omit this answer
What are three ways of dealing with missing data? - ANSWER-Discard the data points, use categorical variables to indicate missing data, or estimate (impute) the missing values
What are the pros and cons of throwing away missing data? - ANSWER-Pros: not potentially introducing errors; easy to implement
Cons: don't want to lose too many data points; potential for censored or biased missing data
What is the categorical variable approach? - ANSWER-If the data is categorical, we just add another category, "missing". With quantitative variables, we include interaction variables between the categorical variable and the other variables.
Why wouldn't you want to fill in missing quantitative variables with 0? - ANSWER-It can lead to problems if some types of data points are more likely than others to have missing data. The coefficients of the other variables might be pulled in one direction or another to try to account for the missing data
What are the advantages and disadvantages of imputing missing data with the mean, median (numeric) or mode (categorical) - ANSWER-Advantage: hedge against being too wrong and easy to compute
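Imputing with the mean, median, or mode takes only a few lines using Python's statistics module. This is a minimal sketch: the impute helper, the use of None to mark a missing entry, and the sample columns are all assumptions for illustration.

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


incomes = [40, 50, None, 60, 250]      # hypothetical numeric column
colors = ["red", "blue", None, "red"]  # hypothetical categorical column

mean_filled = impute(incomes, "mean")
median_filled = impute(incomes, "median")
mode_filled = impute(colors, "mode")
```

In this made-up column the single large income pulls the mean well above the median, so the two strategies fill in noticeably different values.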
what is the binomial distribution - ANSWER-the probability of getting x successes out of n independent identically distributed Bernoulli (p) trials; e.g. count of successful coin flips in n trials
What happens when n is big for binomial distribution - ANSWER-it converges to normal distribution
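The convergence to the normal distribution can be checked numerically. The sketch below compares the exact binomial probability at the center with the normal density that has the same mean n*p and standard deviation sqrt(n*p*(1-p)); the values n = 1000 and p = 0.5 are arbitrary choices for illustration.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent Bernoulli(p) trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


n, p = 1000, 0.5
mean = n * p
sd = math.sqrt(n * p * (1 - p))

exact = binomial_pmf(500, n, p)    # exact binomial probability at the center
approx = normal_pdf(500, mean, sd) # normal approximation at the same point
```

For these values the two numbers agree closely; for small n, or p near 0 or 1, the approximation is rougher.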
What is a Bernoulli distribution? - ANSWER-It's like flipping a coin. It can be used to model a single event and is most useful when we put many of them together
what are some examples of a geometric distribution *** - ANSWER-How many interviews until first job offer; how many hits until a baseball bat breaks
What is a geometric distribution? - ANSWER-How many Bernoulli trials until ...; it is the probability of having x Bernoulli(p) failures until the first success, or x Bernoulli(p) successes until the first failure
In a geometric distribution, what is the value that is raised to a power? - ANSWER-The probability of the thing you're counting (how many X until something else happens)
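As a concrete sketch of the geometric distribution, assume we count failures before the first success, with success probability p on each trial. The probability of the thing being counted (a failure, 1 - p) is the term raised to the power x. The function name and the value p = 0.2 are assumptions for illustration.

```python
def geometric_pmf(x, p):
    """Probability of exactly x failures before the first success."""
    # (1 - p) is the probability of what we are counting, raised to the power x;
    # the final factor p is the single success that ends the sequence.
    return (1 - p) ** x * p


p = 0.2  # hypothetical chance that any one interview leads to a job offer
probabilities = [geometric_pmf(x, p) for x in range(200)]
total = sum(probabilities)  # should be very close to 1
```

Counting successes until the first failure works the same way with the roles of p and 1 - p swapped.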
What assumptions does a geometric distribution make?
what is the Poisson distribution good at modeling - ANSWER- random arrivals
what does the Poisson distribution assume - ANSWER- independent and identically distributed
If arrivals are Poisson, what type of distribution is the inter-arrival time? - ANSWER-Exponential
If the inter-arrival time is exponential, what type of distribution is the number of arrivals? - ANSWER-Poisson (the Poisson distribution underlies the exponential 'inter-arrival', or 'until-when', distribution)
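The link between the two distributions can be seen in a small simulation: draw exponential inter-arrival times, then count how many arrivals land in each unit-length period. The rate, the number of periods, and the seed are arbitrary assumptions; random.expovariate is the standard-library call for exponential draws.

```python
import random


def simulate_arrival_counts(rate, periods, seed=0):
    """Count arrivals per unit-length period when inter-arrival times are exponential."""
    rng = random.Random(seed)
    counts = [0] * periods
    clock = rng.expovariate(rate)       # time of the first arrival
    while clock < periods:
        counts[int(clock)] += 1         # record the arrival in its period
        clock += rng.expovariate(rate)  # wait an exponential time for the next one
    return counts


counts = simulate_arrival_counts(rate=5.0, periods=2000)
average_per_period = sum(counts) / len(counts)
```

The per-period counts behave like Poisson draws, so their average should come out close to the chosen rate.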
What is the difference between the Weibull and geometric distributions? - ANSWER-Weibull: time between failures; geometric: number of tries between failures
What is the Weibull distribution useful for modeling? - ANSWER-The time it takes something to fail, specifically the time between failures
Ways to Deal With Uncertainty in Optimization - ANSWER- Mathematical programming
Model Conservatively
Scenario Modeling
Dynamic Programming
Stochastic Dynamic Program
Markov Decision Process
Dynamic Programming - ANSWER-Divides the system into states (what's going on in the system)
States: the exact situations and their values
At each state the decision maker gets to make a decision
Decisions: choices of next state
Based on the decision, the system moves to the next state
Use Bellman's equation to determine the optimal decision to make at each state
Assumes there's no uncertainty
Stochastic Dynamic Program - ANSWER-Dynamic program but decisions have probabilities of next state
Markov Decision Process - ANSWER-Have a discrete set of states and decisions, and the probabilities only depend on the current state/decision (memoryless)
Stochastic dynamic program with discrete states and decisions
Probabilities only depend on current state/decision
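A small Markov decision process can be solved by applying Bellman's equation repeatedly (value iteration). The sketch below is illustrative only: the two-state machine-maintenance example, its probabilities and rewards, and the discount factor of 0.9 are all made up and not from the notes.

```python
# Hypothetical two-state MDP (all numbers are made up for illustration).
# transitions[state][decision] = list of (probability, next_state, reward)
transitions = {
    "working": {
        "run": [(0.9, "working", 10.0), (0.1, "broken", 10.0)],
        "repair": [(1.0, "working", 4.0)],
    },
    "broken": {
        "run": [(1.0, "broken", 0.0)],
        "repair": [(1.0, "working", -5.0)],
    },
}


def decision_value(outcomes, values, discount):
    """Expected reward plus discounted value of the next state for one decision."""
    return sum(p * (r + discount * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(transitions, discount=0.9, tolerance=1e-9):
    """Apply Bellman's equation repeatedly until the state values stop changing."""
    values = {state: 0.0 for state in transitions}
    while True:
        new_values = {
            state: max(decision_value(o, values, discount) for o in decisions.values())
            for state, decisions in transitions.items()
        }
        change = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if change < tolerance:
            return values


def best_decisions(transitions, values, discount=0.9):
    """Pick, for each state, the decision with the highest expected value."""
    return {
        state: max(decisions, key=lambda d: decision_value(decisions[d], values, discount))
        for state, decisions in transitions.items()
    }


values = value_iteration(transitions)
policy = best_decisions(transitions, values)
```

The probabilities attached to each decision depend only on the current state and decision, which is the memoryless property named above.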
Basic 2 Steps of Optimization Algorithms - ANSWER-1) Initialization - create a first solution (values for all variables), can be simple/bad/infeasible
2-a) start with current solution and find a vector of relative changes to make to each variable
2-b) make changes in that improving direction some amount
stop process when solution doesn't change much or time runs out
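The steps above can be sketched as a loop. Gradient descent is used here only as one illustrative way to choose the improving direction; the function being minimized, the step size, and the stopping tolerance are assumptions, not from the notes.

```python
def gradient(point):
    """Gradient of the toy function f(x, y) = (x - 3)^2 + (y + 1)^2."""
    x, y = point
    return (2 * (x - 3), 2 * (y + 1))


def minimize(start, step_size=0.1, tolerance=1e-8, max_iterations=10_000):
    # Step 1: initialization, any first solution will do (even a bad one).
    point = start
    for _ in range(max_iterations):
        # Step 2-a: find a vector of relative changes (an improving direction).
        direction = tuple(-g for g in gradient(point))
        # Step 2-b: move some amount in that direction.
        new_point = tuple(v + step_size * d for v, d in zip(point, direction))
        change = max(abs(n - v) for n, v in zip(new_point, point))
        point = new_point
        # Stop when the solution no longer changes much.
        if change < tolerance:
            break
    return point


solution = minimize(start=(0.0, 0.0))
```

The max_iterations cap plays the role of "time runs out" in the stopping rule.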
Mann-Whitney (unpaired)
What tests are used for 1 data set? - ANSWER-Wilcoxon signed rank test (compare to a possible median)
McNemar's Test - ANSWER-Compares results on pairs of data points where two different approaches were used on the same thing
Don't need to know anything about the distribution, it's just a comparison of pairs of results
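One common way to carry out this comparison of pairs is the exact (binomial) form of McNemar's test, sketched below. Only the discordant pairs matter: cases where approach A succeeded and B did not, and vice versa. The function name, argument names, and counts are hypothetical, and the exact binomial version is an assumption since the notes do not say which variant is meant.

```python
import math


def mcnemar_exact_p_value(only_a_correct, only_b_correct):
    """Two-sided exact McNemar p-value computed from the discordant pair counts."""
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_a_correct, only_b_correct)
    # If the approaches were equally good, each discordant pair would favor
    # either one with probability 1/2, so the smaller count is Binomial(n, 1/2).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: A right and B wrong on 10 cases, the reverse on 0 cases.
p_value = mcnemar_exact_p_value(10, 0)
```

Pairs where both approaches agree do not enter the calculation at all.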
Wilcoxon Signed Rank Test for Medians - ANSWER-Assumption: the distribution is continuous and symmetric. Question: is the median different from a specific value m?
If the probability p of getting a sum of ranks at least as extreme as W is small, then we can say the median is probably different from m; otherwise we can't
version for paired samples
When to use McNemar vs Wilcoxon - ANSWER-numeric data - use Wilcoxon
Yes/no data - use McNemar
Mann-Whitney Test - ANSWER-2 data sets, but not paired samples
Assume all observations are independent of each other
Rank all observations together, sum the ranks of the yi and the ranks of the zi, and compare the smaller sum against a table that gives the significance of the difference
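The rank sums behind the Mann-Whitney test can be computed as sketched below. Tied values share the average of the ranks they occupy, which is a common convention and an assumption here; the function names and sample values are hypothetical. Looking up significance in a table is left out.

```python
def rank_sums(sample_y, sample_z):
    """Rank all observations together and return (rank sum of y, rank sum of z)."""
    combined = sorted([(v, "y") for v in sample_y] + [(v, "z") for v in sample_z])
    sums = {"y": 0.0, "z": 0.0}
    i = 0
    while i < len(combined):
        # Find the block of tied values starting at position i.
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the ranks i+1 through j
        for k in range(i, j):
            sums[combined[k][1]] += average_rank
        i = j
    return sums["y"], sums["z"]


def mann_whitney_u(sample_y, sample_z):
    """Smaller of the two U statistics derived from the rank sums."""
    rank_sum_y, rank_sum_z = rank_sums(sample_y, sample_z)
    n_y, n_z = len(sample_y), len(sample_z)
    u_y = rank_sum_y - n_y * (n_y + 1) / 2
    u_z = rank_sum_z - n_z * (n_z + 1) / 2
    return min(u_y, u_z)


# Hypothetical unpaired samples in which every y is smaller than every z.
u_statistic = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

A small value of the statistic means one sample's observations sit almost entirely below the other's.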
What kind of model do you use based on the metric? - ANSWER-Mean or other metric using exact values of data - use parametric
Median or other metric using ranks of values of data - use non-parametric
Success probability or other metric using counts of binary outcomes of data - use binomial
Empirical Bayes Modeling - ANSWER-Overall distribution of something is known or estimated
Only a little data available for a specific case