ISYE 6501 - Midterm Exam 2 (Latest 2022/2023) Already Graded A, Exams of Nursing

Typology: Exams

2021/2022

Available from 07/14/2022

Experttutor1
Experttutor1 🇬🇧

3.8

(90)

877 documents


ISYE 6501 - Midterm Exam 2

When might overfitting occur? - THE CORRECT ANSWER IS when the number of factors is close to or larger than the number of data points, causing the model to potentially fit too closely to random effects
Why are simple models better than complex ones? - THE CORRECT ANSWER IS less data is required; there is less chance of including insignificant factors; and they are easier to interpret
What is forward selection? - THE CORRECT ANSWER IS we start with no factors. At each step we select the best new factor and, if it's good enough (by R^2, AIC, or p-value), add it to our model and fit the model with the current set of factors. Then at the end we remove factors that don't meet a certain threshold
What is backward elimination? - THE CORRECT ANSWER IS we start with all factors and find the worst one. If it is worse than a supplied threshold (p = 0.15), we remove it and start the process over. We do that until we have the number of factors that we want; then we remove the factors that don't meet a second threshold (p = 0.05) and fit the model with the remaining set of factors
What is stepwise regression? - THE CORRECT ANSWER IS it is a combination of forward selection and backward elimination. We can start with either all factors or no factors, and at each step we remove or add a factor. As we go through the procedure, after adding each new factor and again at the end, we immediately eliminate any factors that no longer appear good enough.
What type of algorithms are the stepwise selection methods? - THE CORRECT ANSWER IS Greedy algorithms: at each step they take the one thing that looks best
What is LASSO? - THE CORRECT ANSWER IS a variable selection method where the coefficients are determined by minimizing the squared error subject to the sum of their absolute values not exceeding a certain threshold t
How do you choose t in LASSO? - THE CORRECT ANSWER IS use the lasso approach with different values of t and see which gives the best trade-off
Why do we have to scale the data for LASSO? - THE CORRECT ANSWER IS if we don't, the units in which the data is measured will artificially affect how big the coefficients need to be
What is elastic net? - THE CORRECT ANSWER IS A variable selection method that works by minimizing the squared error while constraining a combination of the absolute values of the coefficients and their squares
What is a key difference between stepwise regression and lasso regression? - THE CORRECT ANSWER IS Lasso requires the data to be scaled first. If the data is not scaled, the coefficients can have artificially different orders of magnitude, which means they'll have unbalanced effects on the lasso constraint.
Why doesn't ridge regression perform variable selection? - THE CORRECT ANSWER IS The constraint is on the squared coefficient values, so it shrinks (regularizes) them toward zero without setting them exactly to zero
What are the pros and cons of greedy algorithms (forward selection, backward elimination, stepwise regression)? - THE CORRECT ANSWER IS Good for initial analysis, but they often don't perform as well on other data because they fit more to random effects than you'd like and so appear to have a better fit than they really do
What are the pros and cons of LASSO and elastic net? - THE CORRECT ANSWER IS They are slower but help build models that make better predictions
Which two methods does elastic net look like it combines, and what are its downsides? - THE CORRECT ANSWER IS Ridge regression and LASSO. Advantages: variable selection from LASSO and the predictive benefits of ridge regression. Disadvantages: it arbitrarily rules out some correlated variables, like LASSO (we don't know whether the one left out is the right one); it underestimates the coefficients of very predictive variables, like ridge regression
What are some downsides of surveys? - THE CORRECT ANSWER IS Even if you have what appears to be a representative sample in simple ways, maybe it isn't in more complex ways.
If we're testing to see whether red cars sell for higher prices than blue cars, we need to account for the type and age of the cars in our data set. This is called: - THE CORRECT ANSWER IS Controlling
What is a blocking factor? - THE CORRECT ANSWER IS a source of variability that is not of primary interest to the experimenter
What is an example of a blocking factor? - THE CORRECT ANSWER IS The type of car (sports car or family car) is a blocking factor: it could account for some of the difference between red cars and blue cars, because sports cars are more likely to be red. If we account for the difference, we can reduce the variability in our estimates
Under what conditions should you run A/B tests? - THE CORRECT ANSWER IS When you can collect data quickly, when the data is representative, and when the amount of data is small compared to the whole population
Do you have to decide the sample size ahead of time for A/B tests? - THE CORRECT ANSWER IS no, and we can run the hypothesis test anytime we want
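
For reference, the lasso, ridge regression, and elastic net answers above can be written as constrained least-squares problems. The notation here is an assumption chosen for illustration (responses y_i, scaled factor values x_ij, coefficients a_j, budget t, and a mixing weight lambda between 0 and 1); the flashcards state these constraints only in words.

```latex
\begin{align*}
\text{Lasso:} \quad
  & \min_{a_0,\dots,a_m} \sum_{i=1}^{n} \Bigl( y_i - a_0 - \sum_{j=1}^{m} a_j x_{ij} \Bigr)^2
  \quad \text{subject to} \quad \sum_{j=1}^{m} |a_j| \le t \\
\text{Ridge:} \quad
  & \min_{a_0,\dots,a_m} \sum_{i=1}^{n} \Bigl( y_i - a_0 - \sum_{j=1}^{m} a_j x_{ij} \Bigr)^2
  \quad \text{subject to} \quad \sum_{j=1}^{m} a_j^2 \le t \\
\text{Elastic net:} \quad
  & \min_{a_0,\dots,a_m} \sum_{i=1}^{n} \Bigl( y_i - a_0 - \sum_{j=1}^{m} a_j x_{ij} \Bigr)^2
  \quad \text{subject to} \quad \lambda \sum_{j=1}^{m} |a_j| + (1-\lambda) \sum_{j=1}^{m} a_j^2 \le t
\end{align*}
```

Because the constraint adds up raw coefficient magnitudes, a factor whose values are numerically small needs a larger coefficient and uses up more of the budget t, which is why the data has to be scaled first.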

What is full factorial design? - THE CORRECT ANSWER IS you test every combination and then use ANOVA to determine the importance of each factor
What is fractional factorial design? - THE CORRECT ANSWER IS when you test a subset of the entire set of combinations
What is a balanced design? - THE CORRECT ANSWER IS You test each choice the same number of times and each pair of choices the same number of times
When does regression work well to determine important factors? - THE CORRECT ANSWER IS If there aren't significant interactions between the factors.
What is exploration? - THE CORRECT ANSWER IS focusing on getting more information; in this case, to determine with more certainty which ad is really the best
What is exploitation? - THE CORRECT ANSWER IS focusing on getting immediate value; in this example, showing the ad that seems to be doing best so far, because it seems most likely to be clicked.
What is the multi-armed bandit approach and how does it balance exploration and exploitation? - THE CORRECT ANSWER IS We start with no information and have an equal probability of selecting each alternative. After performing some tests, we've gotten more information, so we can update the probabilities of each one being best and start assigning new tests according to those probabilities. We keep testing multiple alternatives, so we're still doing exploration. But we make it more likely to pick the best ones, so we're also doing exploitation
What are some of the parameters in the multi-armed bandit approach? - THE CORRECT ANSWER IS the number of tests between recalculating probabilities; how to update the probabilities; and how to pick an alternative to test based on probabilities and/or expected values. For updating we can use Bayesian updates or estimate from the observed distribution
What are common reasons that data sets are missing values? - THE CORRECT ANSWER IS
  • a person accidentally types in the wrong value
  • a person did not want to reveal the true value
  • an automated system did not work correctly to record the value
What are some examples of why there might be bias in missing data? - THE CORRECT ANSWER IS
  • Income: people with higher incomes are more likely to omit this answer
  • Radar gun: a car that passes the radar gun very slowly might be treated as an anomaly and its speed might not be recorded in the system
  • Heart transplants: If there's a variable "date of death", it will be missing for patients still living, and thus the missing data will naturally include more successful transplant cases
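
The multi-armed bandit approach described earlier can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it assumes click/no-click (Bernoulli) outcomes, uses Bayesian Beta updates with sampling from the posterior (often called Thompson sampling) as one possible way "to pick an alternative to test based on probabilities", and recalculates after every single test. The function name, the click rates, and the seed are all hypothetical.

```python
import random

def thompson_bandit(true_rates, rounds, seed=0):
    """Simulate a Bernoulli bandit; return how many times each alternative was tested."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(rounds):
        # Start from a uniform Beta(1, 1) belief and draw one plausible
        # success rate per alternative from its current posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        # Test the alternative whose sampled rate is highest: uncertain
        # alternatives still get picked sometimes (exploration), while
        # alternatives that look best get picked most often (exploitation).
        choice = max(range(k), key=lambda i: samples[i])
        pulls[choice] += 1
        # Observe a simulated outcome and update the counts (Bayesian update).
        if rng.random() < true_rates[choice]:
            successes[choice] += 1
        else:
            failures[choice] += 1
    return pulls

# Hypothetical click rates for three ads; the third is truly the best.
pulls = thompson_bandit([0.05, 0.10, 0.20], rounds=5000)
print(pulls)
```

Over many rounds most tests should go to the best alternative, while the weaker ones still receive an occasional test.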

What are three ways of dealing with missing data? - THE CORRECT ANSWER IS discard the data, use categorical variables to indicate missing data, or estimate (impute) the missing values
What are the pros and cons of throwing away missing data? - THE CORRECT ANSWER IS Pros: not potentially introducing errors; easy to implement. Cons: we don't want to lose too many data points; potential for censored or biased missing data
What is the categorical variable approach? - THE CORRECT ANSWER IS If the data is categorical, we just add another category, "missing". With quantitative variables, we include interaction variables between the categorical "missing" indicator and the other variables.
Why wouldn't you want to fill in missing quantitative variables with 0? - THE CORRECT ANSWER IS It can lead to problems if some types of data points are more likely than others to have missing data. The coefficients of the other variables might be pulled in one direction or another to try to account for the missing data
What are the advantages and disadvantages of imputing missing data with the mean or median (numeric) or mode (categorical)? - THE CORRECT ANSWER IS Advantage: it hedges against being too wrong and is easy to compute. Disadvantage: the imputation can be biased. Example: people with high incomes are less likely to answer a survey question about income, and thus the mean/median will underestimate the missing values
What are the advantages and disadvantages of using regression for imputation? - THE CORRECT ANSWER IS Advantages: it reduces or eliminates the problem of bias and gives better values for the missing data. Disadvantages: we have to build, validate, and test a whole other model just to fill in the missing data, and then we have to do it all over again to get the answer we want. Also, we are using the same data twice: once for imputation and a second time to fit the model
How does adding variability (perturbation) to a regression imputation compare to one without? - THE CORRECT ANSWER IS without: more accurate on average but less accurate variability; with: less accurate on average but more accurate variability
When should you not use imputation? - THE CORRECT ANSWER IS When more than 5% of the data is missing per factor
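
The simplest imputation approach mentioned above (filling in the median) might look like the sketch below. The function name and the income values are hypothetical, and missing entries are assumed to be represented as `None`.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical incomes (in thousands) with two missing answers.
incomes = [30, 45, None, 52, 38, None, 41]
print(impute_median(incomes))  # [30, 45, 41, 52, 38, 41, 41]
```

If the two missing answers actually belonged to high earners, the filled-in value of 41 would underestimate them, which is the bias the flashcard warns about.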

What is the binomial distribution? - THE CORRECT ANSWER IS the probability of getting x successes out of n independent, identically distributed Bernoulli(p) trials; for example, the count of successful coin flips in n trials
What happens when n is big for the binomial distribution? - THE CORRECT ANSWER IS it converges to the normal distribution
What is a Bernoulli distribution? - THE CORRECT ANSWER IS it's like flipping a coin. It can be used to model a single event and is most useful when we put many of them together
What are some examples of a geometric distribution? - THE CORRECT ANSWER IS How many interviews until the first job offer; how many hits until a baseball bat breaks
What is a geometric distribution? - THE CORRECT ANSWER IS How many Bernoulli trials until ...; it is the probability of having x Bernoulli(p) failures until the first success, or x Bernoulli(p) successes until the first failure
In a geometric distribution, what is the value that is raised to a power? - THE CORRECT ANSWER IS The probability of the thing you're counting (how many X until something)
What assumptions does a geometric distribution make? - THE CORRECT ANSWER IS Each Bernoulli trial is independent and identically distributed
What is the Poisson distribution good at modeling? - THE CORRECT ANSWER IS random arrivals
What does the Poisson distribution assume? - THE CORRECT ANSWER IS arrivals are independent and identically distributed
If arrivals are Poisson, then the interarrival time follows what type of distribution? - THE CORRECT ANSWER IS exponential
If the interarrival time is exponential, what type of distribution do the arrivals follow? - THE CORRECT ANSWER IS Poisson
What is the difference between the Weibull and geometric distributions? - THE CORRECT ANSWER IS Weibull: time between failures; geometric: number of tries between failures
What is the Weibull distribution useful for modeling? - THE CORRECT ANSWER IS the time it takes something to fail, specifically the time between failures
What does k < 1 mean in a Weibull distribution? - THE CORRECT ANSWER IS the failure rate decreases with time; the worst things fail first (mechanical parts), and the parts that are left are the better ones and take longer to fail
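
As a small worked illustration of the binomial and geometric answers above, the two probability mass functions can be computed directly. The function names are hypothetical, and the geometric version here counts failures before the first success (one of the two conventions the flashcard mentions).

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent Bernoulli(p) trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def geometric_pmf(x, p):
    """Probability of exactly x failures before the first success."""
    # The failure probability (1 - p) is the value raised to the power x.
    return (1 - p) ** x * p

print(binomial_pmf(1, 2, 0.5))   # one head in two fair coin flips
print(geometric_pmf(2, 0.5))     # two tails, then the first head
```

Swapping the roles of p and (1 - p) in `geometric_pmf` gives the other convention, the number of successes before the first failure.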

What does k > 1 mean in a Weibull distribution? - THE CORRECT ANSWER IS The failure rate increases with time: the more worn things get, the more likely it is that they'll fail soon, so we'll observe fewer failures at first and more later on
What do q-q plots help visualize? - THE CORRECT ANSWER IS whether two data sets follow the same distribution.
Why are q-q plots sometimes better than statistical tests? - THE CORRECT ANSWER IS sometimes the statistical test will lead us in the wrong direction, because most points might match while the ends are bad matches
What is the memoryless property? - THE CORRECT ANSWER IS it doesn't matter what's happened in the past; all that matters is where we are now
If the data fits an exponential distribution, is it memoryless? - THE CORRECT ANSWER IS Yes
If the data is memoryless, is it exponential? - THE CORRECT ANSWER IS yes
Which distributions are memoryless? - THE CORRECT ANSWER IS the exponential distribution (and therefore Poisson arrivals, whose interarrival times are exponential)
Can a distribution not be memoryless and still be exponential? - THE CORRECT ANSWER IS no
What are deterministic simulations? - THE CORRECT ANSWER IS the same inputs give the same outputs
What are stochastic simulations? - THE CORRECT ANSWER IS simulations in which there is randomness
What are continuous-time simulations? - THE CORRECT ANSWER IS Changes happen continuously. Example: chemical processes, propagation
What are discrete-event simulations? - THE CORRECT ANSWER IS changes happen at discrete time points. Example: a call center simulation, where someone calls or a worker finishes talking to someone.
What are the elements of a simulation model? - THE CORRECT ANSWER IS entities, modules, actions, resources, decision points, and statistical tracking
What are entities? - THE CORRECT ANSWER IS things that move through the simulation (bags, people, etc.)
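
The memoryless property above can be checked numerically. The sketch below compares the probability of surviving another t time units, given survival to time s, with the unconditional probability of surviving t. The function names and parameter values are hypothetical; the Weibull case with k = 2 is included only as a contrast.

```python
import math

def exponential_survival(t, rate):
    """P(X > t) for an exponential distribution."""
    return math.exp(-rate * t)

def weibull_survival(t, k, scale):
    """P(X > t) for a Weibull distribution with shape k."""
    return math.exp(-((t / scale) ** k))

def conditional_survival(survival, s, t):
    """P(X > s + t | X > s) for a given survival function."""
    return survival(s + t) / survival(s)

def exp_s(t):
    return exponential_survival(t, rate=0.3)

def wb_s(t):
    return weibull_survival(t, k=2.0, scale=10.0)

# Exponential: having already waited 5 units does not change the outlook.
print(conditional_survival(exp_s, 5.0, 2.0), exp_s(2.0))
# Weibull with k > 1: a part that has already run 5 units is more likely to fail soon.
print(conditional_survival(wb_s, 5.0, 2.0), wb_s(2.0))
```

For the exponential case the two printed numbers agree up to floating-point rounding; for the Weibull case with k > 1 the conditional survival probability is lower.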

What are modules? - THE CORRECT ANSWER IS parts of the process (queues, storage, etc.)
What are replications? - THE CORRECT ANSWER IS the number of runs of a simulation
Why is it important to validate a simulation by comparing it to real data as much as possible? - THE CORRECT ANSWER IS If the simulation isn't a good reflection of reality, then any insights we gain from studying the simulation might not be applicable in reality
What do prescriptive simulations answer? - THE CORRECT ANSWER IS what-if questions
What's an example of heuristic optimization? - THE CORRECT ANSWER IS what's the best buffer size to have at each step in the process
Why do we need to use the same random numbers (seed)? - THE CORRECT ANSWER IS Because otherwise we wouldn't be able to accurately compare the two sets of replications: the replications, while drawn from the same distribution, would still be different
How can simulation be used in a prescriptive analytics way, for example to determine the right number of voting machines and poll workers at an election location? - THE CORRECT ANSWER IS
  • Vary parameters of interest (like the number of voting machines and the number of poll workers), compare the simulated system performance, and select the setup with the best results (for example, the best balance between low waiting times to vote and low cost of machines and workers).
  • Use the automated optimization function in simulation software (for example, OptQuest in Arena) to find parameter values that give good results
What is the steady state? - THE CORRECT ANSWER IS the point at which the states have gotten so mixed that the initial conditions no longer matter, where the probability of being in state i stays the same
Do most systems exhibit the memoryless property? - THE CORRECT ANSWER IS No; things usually depend on the past
What is the difference between statistical software and optimization software? - THE CORRECT ANSWER IS Statistical software can both build and solve regression models. Optimization software only solves models; human experts are required to build optimization models.
What are the three main components of optimization models? - THE CORRECT ANSWER IS variables, constraints, objective function

In a queuing model, given a service time, an arrival rate, and a number of helpers/servers, how do you determine if you have enough people? - THE CORRECT ANSWER IS if service_time * arrival_rate > servers, wait time is high; else wait time is low
What are variables? - THE CORRECT ANSWER IS decisions to be made
What are constraints? - THE CORRECT ANSWER IS restrictions on variable values
Why do we need constraints? - THE CORRECT ANSWER IS Optimization solvers only look at the math. They don't look at the words of what we want the variables to mean. So if we don't use constraints to explicitly tell the solver how variables should relate, the solver will go happily ahead and find a mathematical solution telling our candidate to visit each state i 12 times, but that each v_id should be 0.
What is an objective function? - THE CORRECT ANSWER IS The objective function is a measure of the quality of a set of values for the variables, which we're trying to maximize or minimize
What is the solution? - THE CORRECT ANSWER IS values for each variable
What is a feasible solution? - THE CORRECT ANSWER IS variable values that satisfy all constraints
What is an optimal solution? - THE CORRECT ANSWER IS a feasible solution with the best objective value
What do binary variables do? - THE CORRECT ANSWER IS They allow for more complex models
What is the objective function in linear regression? - THE CORRECT ANSWER IS trying to minimize the squared error
What are the variables and constants in linear regression from a statistical point of view? - THE CORRECT ANSWER IS the data are the variables; the coefficients are the constants
What are the variables and constants in the optimization model for linear regression? - THE CORRECT ANSWER IS the data is the constant and the coefficients are the variables
What is the objective function in logistic regression? - THE CORRECT ANSWER IS to minimize the prediction error
What are the variables in logistic regression? - THE CORRECT ANSWER IS the coefficients
In soft and hard classification SVMs, what are the variables? - THE CORRECT ANSWER IS the coefficients
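
The queuing rule of thumb at the start of this group can be written as a one-line check. This is a sketch of the flashcard's rule only, not a full queuing model; the function name and the numbers are hypothetical, and the service time and arrival rate are assumed to use the same time unit.

```python
def wait_time_level(service_time, arrival_rate, servers):
    """Apply the rule: if service_time * arrival_rate > servers, waits are high."""
    # service_time * arrival_rate is the average number of servers kept busy.
    offered_load = service_time * arrival_rate
    return "high" if offered_load > servers else "low"

# Hypothetical example: each call takes 4 minutes and 0.6 calls arrive per minute,
# so on average 2.4 servers' worth of work arrives.
print(wait_time_level(4.0, 0.6, servers=3))  # low
print(wait_time_level(4.0, 0.6, servers=2))  # high
```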

What is the constraint in hard classification for SVMs? - THE CORRECT ANSWER IS each observation has to be on the right side of the line
What is the objective function for hard classification? - THE CORRECT ANSWER IS maximize the margin
For soft classification, what is the objective function? - THE CORRECT ANSWER IS minimize classification error and maximize the margin
What is the objective function for a time series model? - THE CORRECT ANSWER IS minimize prediction error
What are the variables in k-means clustering? - THE CORRECT ANSWER IS the coordinates of the cluster centers and whether each point is part of a certain cluster
What are the constraints in k-means clustering? - THE CORRECT ANSWER IS each data point is assigned to a cluster
What is the objective function in k-means? - THE CORRECT ANSWER IS minimize the total distance from data points to their cluster centers
What is the order of optimization problems from fastest to slowest to solve? - THE CORRECT ANSWER IS linear programs, convex quadratic programs, convex programs, integer programs, general non-convex programs
Are convex optimization problems guaranteed to find an optimal solution? - THE CORRECT ANSWER IS yes
Are non-convex optimization problems guaranteed to find an optimal solution? - THE CORRECT ANSWER IS No; they might converge to an infeasible solution or to a local optimum
What is a general non-convex program? - THE CORRECT ANSWER IS An optimization problem that is not convex
What is a linear program? - THE CORRECT ANSWER IS f(x) is a linear function; constraint set X is defined by linear equations and inequalities
What is a convex quadratic program? - THE CORRECT ANSWER IS f(x) is a convex quadratic function. Minimize f(x) or maximize -f(x). Constraint set X is defined by linear equations and inequalities
What is constraint set X defined by in linear programs? - THE CORRECT ANSWER IS linear equations and inequalities

What is a convex optimization problem? - THE CORRECT ANSWER IS the objective f(x) is concave (if maximizing) or convex (if minimizing). Constraint set X is a convex set
What is constraint set X in a convex optimization problem? - THE CORRECT ANSWER IS a convex set
What is an integer program? - THE CORRECT ANSWER IS a linear program plus some (or all) variables restricted to take only integer values; variables could be binary (either 0 or 1)

What are the basic steps to solve an optimization problem? - THE CORRECT ANSWER IS 1) Initialization: pick values for all the variables (they may be simple, bad, and not satisfy all of the constraints). 2) Find an improving direction t and make a change in that direction of some amount called the step size (theta). 3) Repeat, using the old solution plus the improving direction times the step size as the new solution
What should you do if your problem is too hard? - THE CORRECT ANSWER IS Use a heuristic: a rule-of-thumb process. It usually gives good solutions
What are some common network models? - THE CORRECT ANSWER IS shortest path model: find the quickest/shortest route from one place to another; assignment model: which worker gets which job to maximize workforce efficiency; maximum flow model: how much oil can flow through a complex network of pipes
What type of models are network models? - THE CORRECT ANSWER IS linear programs
What are the constraints in network models? - THE CORRECT ANSWER IS flow into a node = flow out of the node; flow on each arc is between the minimum and maximum allowable
What do we get, without adding integer constraints, in a network model if the data is all integers? - THE CORRECT ANSWER IS an optimal solution where all the variables have integer values
True or false: Requiring some variables in a linear program to take integer values can make it take a lot longer to solve. - THE CORRECT ANSWER IS True. Adding integer variables moves the model from a linear program, which usually solves very quickly, to an integer program, which sometimes takes a long time to solve.
Do optimization models implicitly assume that we know all of the values of the input data exactly? - THE CORRECT ANSWER IS Yes; optimization models treat all of the data as known exactly
What is the structure of dynamic programs? - THE CORRECT ANSWER IS states (the exact situations and their values); decisions (choices of next state)
What is the structure of a stochastic dynamic program? - THE CORRECT ANSWER IS a dynamic program, but decisions have probabilities of next state
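
The three basic steps for solving an optimization problem listed earlier (initialize, find an improving direction, move by a step size, repeat) can be sketched as follows. Using the negative gradient as the improving direction is an assumption made for this example (the flashcard does not say how the direction is found), and the function names, the quadratic objective, and the step size are all hypothetical.

```python
def minimize(gradient, start, step_size=0.1, iterations=100):
    """Repeatedly move from the current solution along an improving direction."""
    x = list(start)  # Step 1: initialization (any starting values will do here)
    for _ in range(iterations):
        # Step 2: find an improving direction; here, the negative gradient.
        direction = [-g for g in gradient(x)]
        # Step 3: new solution = old solution + step size * improving direction.
        x = [xi + step_size * di for xi, di in zip(x, direction)]
    return x

def grad_f(x):
    """Gradient of the example objective f(x) = (x[0] - 3)^2 + (x[1] + 1)^2."""
    return [2 * (x[0] - 3), 2 * (x[1] + 1)]

solution = minimize(grad_f, [0.0, 0.0])
print(solution)  # close to [3, -1], the minimizer of f
```

This example objective is convex, so the iterations head toward the single best solution; on a non-convex problem the same loop could stop at a local optimum instead.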

What is the structure of Markov decision processes? - THE CORRECT ANSWER IS a stochastic dynamic program with discrete states and decisions; probabilities depend only on the current state/decision
In an optimization model, what do you do if data or a parameter isn't known exactly, or if forecast values aren't known exactly? - THE CORRECT ANSWER IS model conservatively, or define some or many different scenarios and model them
What is a robust solution and what are its possible downsides? - THE CORRECT ANSWER IS a solution that satisfies the constraints of each of the different scenarios; it could be very expensive
The two main steps of most optimization algorithms are: - THE CORRECT ANSWER IS Find a good direction to move from the current solution, and determine how far to go in that direction.
When do we use non-parametric statistical tests? - THE CORRECT ANSWER IS when the distribution is unknown
When do you use McNemar's test? - THE CORRECT ANSWER IS when we have paired data points, where two different approaches were used on the same thing
What type of test is McNemar's? - THE CORRECT ANSWER IS binomial
What does McNemar's test consider? - THE CORRECT ANSWER IS Only the cases where A and B are different. McNemar's test throws out all the cases where the results are the same and tests, using the binomial distribution, whether we'd expect results this extreme or more extreme just by luck.
What is the assumption made in the Wilcoxon Signed Rank Test for Medians? - THE CORRECT ANSWER IS the distribution is continuous and symmetric
What question is the Wilcoxon signed rank test trying to answer? - THE CORRECT ANSWER IS is the median of the distribution different from m
Why is calculating p-values for the Wilcoxon Signed Rank Test harder than for McNemar's test? - THE CORRECT ANSWER IS it works like a normal distribution test, unlike McNemar's, which uses the binomial distribution
Under which scenarios do we use Wilcoxon vs. McNemar? - THE CORRECT ANSWER IS McNemar: yes/no results; Wilcoxon: numeric results
What is a non-zero-sum game? - THE CORRECT ANSWER IS it's possible for the total benefit for everyone to be higher or lower depending on the decisions made.
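
McNemar's test as described above (discard the pairs where A and B agree, then use the binomial distribution on the rest) can be sketched with the standard library alone. The function name and the counts are hypothetical, and the two-sided exact p-value used here is one common convention, assumed for illustration.

```python
from math import comb

def mcnemar_exact_p(a_only, b_only):
    """Two-sided exact p-value using only the pairs where A and B disagree.

    a_only: pairs where approach A succeeded and B did not
    b_only: pairs where approach B succeeded and A did not
    """
    n = a_only + b_only          # pairs where both agree are thrown out
    k = min(a_only, b_only)
    # Under "no difference", each discordant pair favors A or B with probability 1/2,
    # so the smaller count follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

print(mcnemar_exact_p(5, 5))    # evenly split disagreements: no evidence of a difference
print(mcnemar_exact_p(0, 10))   # all ten disagreements favor B: small p-value
```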

What is a zero-sum game? - THE CORRECT ANSWER IS whatever one side gets, the other side loses
What is a stable equilibrium? - THE CORRECT ANSWER IS Neither station has an incentive to change
What is a pure strategy? - THE CORRECT ANSWER IS just one choice (gas station)
What is a mixed strategy? - THE CORRECT ANSWER IS randomize decisions according to probabilities (rock, paper, scissors). If your opponent knows you're going to pick rock, she'll pick paper and beat you every time. Instead, a randomized or mixed strategy, where you randomly pick each one with probability 1/3, is your best bet.
What is perfect information? - THE CORRECT ANSWER IS knowing all about everyone else's situation
What is imperfect information? - THE CORRECT ANSWER IS Neither one really knows the other's profit margins exactly. And in still other situations, some people have more information than others, so it's not symmetric
What do descriptive, predictive, and prescriptive models assume? - THE CORRECT ANSWER IS that the systems do not react
What is the idea of deep learning? - THE CORRECT ANSWER IS to train a system to react to whatever patterns our human brain is reacting to, without knowing what it's reacting to.
What are some areas where deep learning is useful? - THE CORRECT ANSWER IS recognizing speech and writing, natural language processing, and image recognition.
What are the downsides of neural networks? - THE CORRECT ANSWER IS They often don't give the best results. They require lots of data to train, and it's often hard to choose and tune the learning algorithm so the weights don't change too slowly but also don't change so quickly that they jump all over the place.
What are the three levels of neurons? - THE CORRECT ANSWER IS input level, hidden level, and output level
What is the process of neural networks? - THE CORRECT ANSWER IS
  • Each input neuron accepts a single piece of information
  • For example, if we are trying to solve the CAPTCHA problem, we might divide the picture of a digit into pixels, and every pixel's status between fully white and fully black goes to its own input neuron
  • As inputs come in, they're passed to the first level of hidden neurons. Each simulated neuron calculates a weighted value of those inputs and sends the result to neurons at the next level. Those next-level neurons do the same thing. There may be several layers of hidden neurons, one after another.
  • Eventually the output layer neurons get their inputs, and each one uses those inputs to find its result.
  • Example: In CAPTCHA, have one output neuron for each possible number or letter; each output neuron's result is like a level of certainty that the input image is that number or letter
  • In the end, the model's predicted output is whichever output neuron has the highest result
What are two situations when nonparametric tests are useful? - THE CORRECT ANSWER IS
  • we don't know much about the form of the underlying distribution that the data comes from, or it doesn't fit a nice distribution
  • it's important to have information about the median
  • Many nonparametric tests focus on the median, and they can be used even when we do not know the form of the underlying distribution.
  • we don't have much data
How can Bayesian models incorporate expert opinion when there's not as much data to analyze as we'd like to have? - THE CORRECT ANSWER IS Expert opinion can be used to define the initial distribution of P(A), and observed data about B can be used with Bayes' theorem to obtain a revised opinion P(A|B). The initial distribution assumed for P(A) is called the 'prior distribution' and the revised distribution P(A|B) is called the 'posterior distribution'
When is empirical Bayesian modeling useful? - THE CORRECT ANSWER IS In the absence of lots of data
What is the gist of the Bayesian approach? - THE CORRECT ANSWER IS with a single observation combined with a broader set of observations, we can make a deduction or prediction
What is a clique? - THE CORRECT ANSWER IS a set of nodes that all have edges between each other
What is a community? - THE CORRECT ANSWER IS a set of nodes that's highly connected within itself
What does the Louvain algorithm do? - THE CORRECT ANSWER IS decompose a graph into communities
How does the Louvain algorithm work? - THE CORRECT ANSWER IS Step 0: Start with each node being its own community. Step 1: Repeat: make the biggest modularity increase by moving a node from its community to an adjacent node's community, until no move increases modularity

Step 2: Each community becomes a super-node. Repeat Step 1 using the super-nodes
Suppose we have a graph where all edge weights are equal to 1. In the video, we saw how to split a graph up into highly interconnected communities. Now, instead, we want to split the nodes into large groups that have very few connections between them (for example, if a marketer wants to find sets of people in a social network who probably have very different sets of friends). How might you do that? - THE CORRECT ANSWER IS Change the graph: for every pair of nodes i and j, if there's an edge between i and j then remove it; and if there's not an edge between i and j, then add it. Then run the Louvain algorithm on the new graph.
What is modularity? - THE CORRECT ANSWER IS a measure of how well the graph is separated into communities or modules that are connected a lot internally but not connected much between each other.
What is the downside of the Louvain algorithm? - THE CORRECT ANSWER IS It is a heuristic, so it's not guaranteed to find the absolute best solution, but it gives very good solutions very quickly
How do you know when to switch from exploration to exploitation? - THE CORRECT ANSWER IS when a significant difference has been achieved (non-overlapping confidence intervals)
How do you whittle down the options in a multi-armed bandit model when switching to exploitation? - THE CORRECT ANSWER IS Keep the alternatives whose confidence intervals overlap that of the best alternative
Order linear regression, ridge regression, elastic net, and lasso regression by number of variables selected, ascending - THE CORRECT ANSWER IS lasso, elastic net, then ridge and linear (tied)
What does the beginning of a graph showing an average x (e.g., wait time) over the number of replications show? - THE CORRECT ANSWER IS The amount of uncertainty at the beginning
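
The "change the graph" answer above (remove every existing edge and add every missing one) amounts to building the complement graph. A minimal sketch follows, assuming an undirected graph with unit edge weights given as a node list and an edge list; the function name and the tiny example graph are hypothetical, and the Louvain step itself is not implemented here.

```python
from itertools import combinations

def complement_edges(nodes, edges):
    """Return the edges of the complement graph: present exactly where the original had none."""
    existing = {frozenset(edge) for edge in edges}  # undirected, so (i, j) == (j, i)
    return [
        (i, j)
        for i, j in combinations(nodes, 2)
        if frozenset((i, j)) not in existing
    ]

# Hypothetical network: persons 1 and 2 are friends, and persons 3 and 4 are friends.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (3, 4)]
print(complement_edges(nodes, edges))  # [(1, 3), (1, 4), (2, 3), (2, 4)]
```

A Louvain implementation would then be run on the returned edge list, so that the communities it finds in the new graph correspond to groups with few connections in the original one.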