
























A glossary of terms and concepts relevant to mathematical modeling and industrial and systems engineering (ISyE). It includes definitions of statistical methods, optimization techniques, and simulation concepts, such as 1-norm, A/B testing, ANOVA, ARIMA, and Bayesian regression. The document also covers various aspects of modeling, including dynamic programming, network optimization, and experimental design. It is useful for exam preparation and quick reference.

























1-norm Correct Answer-Similar to rectilinear distance; measures the length of a vector from the origin as the sum of the absolute values of its components. If $z=(z_1,z_2,\dots,z_m)$ is a vector in an $m$-dimensional space, then its 1-norm is $|z_1|+|z_2|+\cdots+|z_m| = \sum_{i=1}^{m} |z_i|$.
A/B Testing Correct Answer-Testing two alternatives to see which one performs better.
2-norm Correct Answer-Similar to Euclidean distance; measures the straight-line length of a vector from the origin. If $z=(z_1,z_2,\dots,z_m)$ is a vector in an $m$-dimensional space, then its 2-norm is $\sqrt{|z_1|^2+|z_2|^2+\cdots+|z_m|^2} = \sqrt{\sum_{i=1}^{m} |z_i|^2}$.
Accuracy Correct Answer-Fraction of data points correctly classified by a model; equal to (TP+TN) / (TP+FP+TN+FN).
Action Correct Answer-In ARENA, something that is done to an entity.
Additive Seasonality Correct Answer-Seasonal effect that is added to a baseline value (for example, "the temperature in June is 10 degrees above the annual baseline").
Adjusted R-squared Correct Answer-Variant of $R^2$ that encourages simpler models by penalizing the use of too many variables.
AIC Correct Answer-Akaike information criterion: model selection technique that trades off between model fit and model complexity. When comparing models, the model with lower AIC is preferred. Generally penalizes complexity less than BIC.
Algorithm Correct Answer-Step-by-step procedure designed to carry out a task.
Analysis of Variance/ANOVA Correct Answer-Statistical method for dividing the variation in observations among different sources.
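As an illustration of the 1-norm and 2-norm entries above, here is a minimal Python sketch. The function names `one_norm` and `two_norm` are hypothetical names chosen for this example, not part of the glossary.

```python
import math

def one_norm(z):
    """1-norm: sum of the absolute values of the components."""
    return sum(abs(v) for v in z)

def two_norm(z):
    """2-norm: square root of the sum of squared components."""
    return math.sqrt(sum(v * v for v in z))

z = [3, -4]
print(one_norm(z))  # 7
print(two_norm(z))  # 5.0
```

For the vector (3, -4), the 1-norm adds the absolute component values (3 + 4), while the 2-norm gives the straight-line length of the 3-4-5 right triangle.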
Approximate dynamic program Correct Answer-Dynamic programming model where the value functions are approximated.
Arc Correct Answer-Connection between two nodes/vertices in a network. In a network model, there is a variable for each arc, equal to the amount of flow on the arc, and (optionally) a capacity constraint on the arc's flow. Also called an edge.
Area under the curve (AUC) Correct Answer-Area under the ROC curve; an estimate of the classification model's accuracy. Also called concordance index.
ARIMA Correct Answer-Autoregressive integrated moving average.
Arrival Rate Correct Answer-Expected number of arrivals of people, things, etc. per unit time -- for example, the expected number of truck deliveries per hour to a warehouse.
Assignment Problem Correct Answer-Network optimization model with two sets of nodes that finds the best way to assign each node in one set to a node in the other set.
Attribute Correct Answer-A characteristic or measurement - for example, a person's height or the color of a car. Generally interchangeable with "feature", and often with "covariate" or "predictor". In the standard tabular format, a column of data.
Autoregression Correct Answer-Regression technique using past values of time series data as predictors of future values.
Autoregressive integrated moving average (ARIMA) Correct Answer-Time series model that uses differences between observations when data is nonstationary. Also called Box-Jenkins.
Backward elimination Correct Answer-Variable selection process that starts with all variables and then iteratively removes the least-immediately-relevant variables from the model.
Balanced Design Correct Answer-Set of combinations of factor values across multiple factors that has the same number of runs for all combinations of levels of one or more factors.
Blocking Correct Answer-Factor introduced to an experimental design that interacts with the effect of the factors to be studied. The effect of the factors is studied within the same level (block) of the blocking factor.
Box and whisker plot Correct Answer-Graphical representation of data showing the middle range of data (the "box"), reasonable ranges of variability ("whiskers"), and points (possible outliers) outside those ranges.
Box-Cox Transformation Correct Answer-Transformation of a non-normally-distributed response to a normal distribution.
Branching Correct Answer-Splitting a set of data into two or more subsets, to each be analyzed separately.
CART Correct Answer-Classification and regression trees.
Categorical Data Correct Answer-Data that classifies observations without quantitative meaning (for example, colors of cars) or where quantitative amounts are categorized (for example, "0-10, 11-20, ...").
Causation Correct Answer-Relationship in which one thing makes another happen (i.e., one thing causes another).
Chance Constraint Correct Answer-A probability-based constraint. For example, a standard linear constraint might be $Ax \le b$. A similar chance constraint might be $\Pr(Ax \le b) \ge 0.95$.
Change Detection Correct Answer-Identifying when a significant change has taken place in a process.
Classification Correct Answer-The separation of data into two or more categories, or (a point's classification) the category a data point is put into.
Classification tree Correct Answer-Tree-based method for classification. After branching to split the data, each subset is analyzed with its own classification model.
Classifier Correct Answer-A boundary that separates the data into two or more categories. Also (more generally) an algorithm that performs classification.
Clique Correct Answer-A set of nodes where each pair is connected by an arc.
Cluster Correct Answer-A group of points identified as near/similar to each other.
Cluster Center Correct Answer-In some clustering algorithms (like $k$-means clustering), the central point (often the centroid) of a cluster of data points.
Clustering Correct Answer-Separation of data points into groups ("clusters") based on nearness/similarity to each other. A common form of unsupervised learning.
Collective outlier Correct Answer-A set of data points that is (uncommonly) different from others - for example, a missing heartbeat in an electrocardiogram; we don't know exactly which millisecond it should've happened in, but collectively there's a set of milliseconds that it's missing from.
Concave Function Correct Answer-A function $f()$ where for every two points $x$ and $y$, $f(cx+(1-c)y) \ge c f(x) + (1-c) f(y)$ for all $c$ between 0 and 1. In two dimensions, this means if the points $(x,f(x))$ and $(y,f(y))$ are connected with a straight line, the line is always below [or equal to] the function's curve between those two points. If $f()$ is concave, then $-f()$ is convex.
Concordance index Correct Answer-Area under the ROC curve; an estimate of the classification model's accuracy. Also called AUC.
Confusion matrix Correct Answer-Visualization of classification model performance.
Constant Correct Answer-A number that remains the same.
Constraint Correct Answer-Part of an optimization model that describes a restriction on the solution (the values of the variables).
Cooperative Game Theory Correct Answer-A game theory setting where the participants are working together to achieve some goal, while also competing in some way.
Corrected AIC Correct Answer-Improved version of AIC, especially when sample size is small.
Correlation Correct Answer-Relationship in which two things are likely to happen together, regardless of whether one causes the other. (There is also a quantitative statistical definition measuring the amount of correlation.)
Covariate Correct Answer-A characteristic or measurement that can be used to estimate the value of something - for example, a person's height or the color of a car. A "feature" or "attribute"; in the standard tabular format, a column of data.
Cross-validation Correct Answer-Validation technique where a model is tested on data different from what it was trained on.
CUSUM Correct Answer-Change detection method that compares observed distribution mean with a threshold level of change.
Data Point Correct Answer-Observation/record of (perhaps multiple) measurements for a single member of a population or data set. In the standard tabular format, a row of data.
Decision Correct Answer-Choice of action.
Decision Point Correct Answer-Place in a simulation where there is a branch (or decision to be made or observed).
Decision Tree Correct Answer-Tree-based method for decision-making. After branching to split the data, each subset is analyzed with its own decision model (or just has its own decision applied).
Deep Learning Correct Answer-Neural network-type model with many hidden layers.
Descriptive Analytics Correct Answer-Loosely speaking, the use of analytics to explain or describe what has happened.
Design of Experiments Correct Answer-Choosing a set of tests to be made to find the effect of input variables on an outcome.
Deterministic Simulation Correct Answer-Simulation with no randomness/uncertainty, so results are the same each run.
Detrending Correct Answer-Removal of trend, such as a change in the mean over time, from time-series data.
Diagnostic odds ratio Correct Answer-Ratio of the odds that a data point in a certain category is correctly classified by a model, to the odds that a data point not in that category is incorrectly classified by the model; equal to (TP/FN) / (FP/TN) = (TP×TN) / (FN×FP).
Diet Problem Correct Answer-Classical optimization model for finding the least-costly set of foods that meets all dietary requirements.
Differencing Correct Answer-Using the difference of successive values in time series data, rather than the values themselves. Sometimes nonstationary data will have stationary differences.
Dimension Correct Answer-A feature of the data points (for example, height or credit score). (Note that there is also a mathematical definition for this word.)
Discrete-event simulation Correct Answer-A simulation that models a system that changes when specific events occur.
Distance Correct Answer-How far it is between two points -- but there are different ways to measure it (see Minkowski distance).
Distribution-fitting Correct Answer-Determining whether a set of data seems to follow a certain probability distribution, or determining which of several distributions the data is close to.
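The Differencing entry above can be illustrated with a short Python sketch. The helper name `difference` and the sample series are assumptions made for this example only.

```python
def difference(series):
    """Return the differences between successive values of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

values = [1, 4, 9, 16, 25]   # hypothetical series with a growing trend
first = difference(values)   # [3, 5, 7, 9]
second = difference(first)   # [2, 2, 2]
print(first, second)
```

In this made-up series the raw values and first differences both keep growing, while the second differences are constant, which is the sense in which differencing can remove a trend.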
Error (per data point) Correct Answer-The difference (or absolute difference, squared difference, or other measure) between the estimate of a piece of data and its true value.
Error (total over data set) Correct Answer-The total of all errors in a data set.
Euclidean distance/straight-line distance Correct Answer-The length of a straight line (the 2-norm distance) between two points. If $x=(x_1,x_2,\dots,x_m)$ and $y=(y_1,y_2,\dots,y_m)$ are two points in an $m$-dimensional space, then the Euclidean distance between them is $\sqrt{(x_1-y_1)^2+(x_2-y_2)^2+\cdots+(x_m-y_m)^2} = \sqrt{\sum_{i=1}^{m} (x_i-y_i)^2}$.
Expectation-maximization algorithm (EM algorithm) Correct Answer-General description of an algorithm with two steps (often iterated), one that finds the function for the expected likelihood of getting the response given current parameters, and one that finds new parameter values to maximize that probability.
Exploitation Correct Answer-Using known information to get good outcomes.
Exploration Correct Answer-Finding new/better/more information to determine how to optimize output.
Exponential Distribution Correct Answer-A continuous probability distribution of the time between events: $f(x)=\lambda e^{-\lambda x}$. If the number of events in a fixed time follows the Poisson distribution, then the time between them has the exponential distribution. The exponential distribution has the memoryless property.
Exponential smoothing Correct Answer-Data smoothing technique in which older observations are assigned exponentially decreasing weights, so more emphasis is given to recent observations.
Factorial Design Correct Answer-Tests of different combinations of factor values over multiple factors, to find each one's effect, and interaction effects, on the outcome.
Fall out Correct Answer-Fraction of data points not in a certain category that are incorrectly classified by a model; equal to FP / (TN+FP). Also called false positive rate.
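A minimal Python sketch of single exponential smoothing, as described in the Exponential smoothing entry above. It assumes the common recursion $S_t = \alpha x_t + (1-\alpha) S_{t-1}$ with the first smoothed value set to the first observation; the function name, the parameter `alpha`, and that initialization choice are assumptions of this example.

```python
def exponential_smoothing(series, alpha):
    """Smooth a series; alpha in (0, 1] weights the newest observation."""
    smoothed = [series[0]]  # assumed initialization: S_0 = x_0
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(exponential_smoothing([10, 20, 30], alpha=0.5))  # [10, 15.0, 22.5]
```

Unrolling the recursion shows that an observation $j$ steps in the past carries weight $\alpha(1-\alpha)^j$, which is where the exponentially decreasing weights come from.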
False Negative (FN) Correct Answer-Data point that a model incorrectly classifies as not being in a certain category. ("Negative" means the model classified it as not being in the category, and "False" means the model's classification is incorrect.) Sometimes abbreviated as "FN".
False Negative Rate Correct Answer-Fraction of data points in a certain category that are incorrectly classified by a model; equal to FN / (TP+FN). Also called miss rate.
False Positive (FP) Correct Answer-Data point that a model incorrectly classifies as being in a certain category. ("Positive" means the model classified it as being in the category, and "False" means the model's classification is incorrect.) Sometimes abbreviated as "FP".
False Positive Rate Correct Answer-Fraction of data points not in a certain category that are incorrectly classified by a model; equal to FP / (TN+FP). Also called fall out.
False Omission Rate Correct Answer-Fraction of data points the model classifies as not in a certain category that are really in the category; equal to FN / (TN+FN).
Feasible solution Correct Answer-A solution that satisfies a set of constraints.
Feature Correct Answer-(1) A characteristic or measurement - for example, a person's height or the color of a car. Generally interchangeable with "attribute", and often with "covariate" or "predictor". In the standard tabular format, a column of data. (2) A combination of attributes in a specific format - for example, 0.5 × height plus 7 × shoe-size.
FIFO Correct Answer-First-in, first-out: The first entity to join a queue is the first one to come out -- for example, a supermarket checkout line.
Fitting Correct Answer-Finding a model (including, if appropriate, a probability distribution) that is a good description of real effects in a set of data. The model is sometimes called a "fit".
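The rate definitions above all come from the same four confusion-matrix counts. The sketch below computes three of them in Python; the function name and the example counts are hypothetical.

```python
def classification_rates(tp, fp, tn, fn):
    """Compute rates from confusion-matrix counts (TP, FP, TN, FN)."""
    return {
        "false_negative_rate": fn / (tp + fn),  # miss rate
        "false_positive_rate": fp / (tn + fp),  # fall out
        "false_omission_rate": fn / (tn + fn),
    }

# Made-up counts, for illustration only.
rates = classification_rates(tp=40, fp=5, tn=45, fn=10)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Note that the false negative rate divides by the points that are really in the category (TP+FN), while the false omission rate divides by the points the model classified as not in it (TN+FN).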
Fixed Charge Correct Answer-In optimization models, a cost that depends only on whether something happens, but not how much - for example, a transaction cost for buying or selling stock that is the same regardless of how many shares are bought or sold.
Greedy Algorithm Correct Answer-Algorithm that makes the immediately-best choice at each step.
Heteroscedasticity Correct Answer-When the variability of a response is different across the range of predictor values.
Heuristic Correct Answer-Algorithm that is not guaranteed to find the absolute best (optimal) solution.
Hit rate Correct Answer-Fraction of data points in a certain category that are correctly classified by a model; equal to TP / (TP+FN); also called the true positive rate, sensitivity, and recall.
Holt-Winters method/Winters' method Correct Answer-Three-parameter exponential smoothing technique that incorporates trend and seasonality; also called triple exponential smoothing.
Hypothesis test Correct Answer-Statistical test to determine the probability that a property of a sample of data is true for the whole population.
iid (Independent and identically distributed) Correct Answer-Things that follow the same probability distribution, including the same parameter(s), and whose values are independent of each other. For example, multiple flips of the same coin are iid.
Improving direction Correct Answer-Vector of changes to a solution to an optimization problem, such that the objective function gets better when moving the solution some distance in the vector's direction.
Imputation Correct Answer-Inserting values where data is missing.
Independent Correct Answer-A is "independent" of B if the probability or probability distribution of A is not affected by B. For example, whether a coin flip is heads or tails is (I assume) independent of the number of fish in the ocean exactly 100 years ago to this day, but the temperature today is not independent of the temperature yesterday (if it was hot yesterday, it's more likely to be hot today too, etc.).
Infinity norm Correct Answer-Specific case of p-norm when $p=\infty$. Sounds weird, but it just reduces to the largest of the dimensions. If $z=(z_1,z_2,\dots,z_m)$ is a vector in an $m$-dimensional space, then its $\infty$-norm is $\max_i |z_i|$. If $x=(x_1,x_2,\dots,x_m)$ and $y=(y_1,y_2,\dots,y_m)$ are two points in an $m$-dimensional space, then the $\infty$-norm distance between them is $\max_i |x_i-y_i|$.
Initialization Correct Answer-Setting starting values in an algorithm, or setting the first solution value for a "direction/step-size" optimization algorithm.
Integer program Correct Answer-Optimization model where the objective function is a linear function of the variables, the constraints are linear equations and/or linear inequalities in terms of the variables, and some or all variables are restricted to have integer values.
Interaction term Correct Answer-Variable in a model that is the combination of two or more other variables; for example, if $x_1$ and $x_2$ are variables, $(x_1 x_2)$ is an interaction term/interaction variable.
Interarrival times Correct Answer-The time between two consecutive arrivals of people, things, etc. -- for example, the time between consecutive phone calls to a service hotline.
Iterate Correct Answer-Repeat the same steps of a process.
k-fold cross validation Correct Answer-Validation technique where data is divided into several parts ("folds"), and each part is used to validate a model fit to the remaining parts. Often a more robust validation approach than splitting data into training and validation sets.
K-Means Algorithm Correct Answer-Clustering algorithm that defines $k$ clusters of data points, each corresponding to one of $k$ cluster centers selected by the algorithm.
K-Nearest Neighbor Correct Answer-Classification algorithm that defines a data point's category as a function of the nearest $k$ data points to it.
k-nearest neighbor regression Correct Answer-Regression model where a data point's response is estimated based on the responses of the $k$ nearest data points with known response.
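To make the k-fold cross validation entry above concrete, the sketch below splits row indices into folds using plain Python. The generator name `k_fold_indices` and the use of contiguous, unshuffled folds are assumptions of this example; in practice the data is often shuffled first.

```python
def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k contiguous folds of n rows."""
    # Spread any remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, validation
        start += size

for train, validation in k_fold_indices(6, 3):
    print("train:", train, "validate:", validation)
```

Each index appears in exactly one validation fold, so every data point is used for validation once and for fitting in the other k-1 rounds.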
Linear inequality Correct Answer-Inequality where a linear function is set to be greater-than-or-equal-to or less-than-or-equal-to a constant or another linear function.
Linear program Correct Answer-A mathematical programming model where the objective function is a linear function of the variables, and the constraints are linear equations and/or linear inequalities in terms of the variables.
Linear regression Correct Answer-Regression model where the relationships between attributes and a response are modeled as linear functions: $y = a_0 + \sum_{i=1}^{m} a_i x_i$.
Local optimum/maximum/minimum Correct Answer-A solution that achieves a better objective value than any feasible solutions that are close to it; sometimes also used to refer to that solution's objective value.
Logistic regression Correct Answer-Regression model that uses an exponential function of variables to estimate a response that is either between 0 and 1, or must be equal to 0 or 1: $y = \frac{1}{1+e^{-(a_0+\sum_{i=1}^{m} a_i x_i)}}$. Also called a logit model.
Logit model Correct Answer-Regression model that uses an exponential function of variables to estimate a response between 0 and 1: $y = \frac{1}{1+e^{-(a_0+\sum_{i=1}^{m} a_i x_i)}}$. Also called a logistic regression.
Louvain algorithm Correct Answer-Algorithm for finding highly-connected communities in networks.
Lower tail Correct Answer-Lowest-value part of a distribution.
Machine Correct Answer-Apparatus that can do something; in "machine learning", it often refers to both an algorithm and the computer it's run on. (Fun fact: before computers were developed, the term "computers" referred to people who did calculations quickly in their heads or on paper!)
Machine Learning Correct Answer-Use of computer algorithms to learn and discover patterns or structure in data, without being programmed specifically for them.
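The logistic regression formula above can be evaluated directly. This Python sketch computes the response for given coefficients; the function name and the coefficient values are hypothetical, and fitting the coefficients from data is not shown.

```python
import math

def logistic_response(x, a0, a):
    """Evaluate y = 1 / (1 + exp(-(a0 + sum_i a_i * x_i)))."""
    linear = a0 + sum(ai * xi for ai, xi in zip(a, x))
    return 1.0 / (1.0 + math.exp(-linear))

# Made-up coefficients, for illustration only.
print(logistic_response([0.0, 0.0], a0=0.0, a=[1.0, 2.0]))  # 0.5
print(logistic_response([1.0, 1.0], a0=0.0, a=[1.0, 2.0]))  # about 0.953
```

When the linear part equals zero the response is exactly 0.5; larger positive values push it toward 1 and larger negative values push it toward 0.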
Manhattan Distance Correct Answer-The sum of the lengths in each dimension between two points (the 1-norm distance). If $x=(x_1,x_2,\dots,x_m)$ and $y=(y_1,y_2,\dots,y_m)$ are two points in an $m$-dimensional space, then the rectilinear distance between them is $|x_1-y_1|+|x_2-y_2|+\cdots+|x_m-y_m| = \sum_{i=1}^{m} |x_i-y_i|$. Also called rectilinear or 1-norm distance.
Mann-Whitney test Correct Answer-Nonparametric test to determine whether medians of two independent or unpaired samples (possibly of different size) are the same. Also called Wilcoxon rank-sum test.
Margin Correct Answer-For a single point, the distance between the point and the classification boundary; for a set of points, the minimum distance between a point in the set and the classification boundary. Also called the separation.
Markov chain Correct Answer-Process where a system changes its state in a way that depends only on its current state.
Markov decision process Correct Answer-Markov chain model where decisions are made at some states, and state transitions have associated rewards.
MARS (Multivariate adaptive regression splines) Correct Answer-Specific regression spline model that has become commonly used. Also called "earth".
Mathematical Programming Correct Answer-Mathematical optimization, often using variables, constraints, and objective function.
Maximization Problem Correct Answer-Optimization model where the objective is to find the feasible solution that maximizes the value of the objective function.
Maximum Flow Problem Correct Answer-Network optimization model that finds the most flow that can be sent from one specific node to another.
Maximum likelihood Correct Answer-A method that finds the set of parameter values for which a model is most likely to generate the actual values of the data.
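As a small illustration of the Markov chain entry above, the sketch below advances a probability distribution over states by one step. The transition matrix is made up for the example, and `step_distribution` is a hypothetical helper name.

```python
def step_distribution(dist, transition):
    """Return the next-step state distribution.

    transition[i][j] is the probability of moving from state i to state j,
    so the next state depends only on the current state.
    """
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-state chain.
transition = [[0.9, 0.1],
              [0.5, 0.5]]
dist = [1.0, 0.0]  # start in state 0 with certainty
dist = step_distribution(dist, transition)
print(dist)  # [0.9, 0.1]
```

Calling the function repeatedly gives the distribution after several steps, without needing any history beyond the current distribution.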
Modularity Correct Answer-Measure of the density of connections between communities in a network.
Module Correct Answer-In ARENA, a building-block of a simulation, or the process, resource, etc. it represents.
Most Optimal Correct Answer-Please don't say this (or "more optimal"). "Optimal" means "best", and "most best" or "more best" are not proper English.
Moving Average Correct Answer-Smoothing technique that replaces data values with the mean of a number of consecutive observed values.
Multi-armed bandit Correct Answer-Model that allows the tradeoff between exploration of unknown resources and exploitation of known resources to optimize output.
Multiplicative seasonality Correct Answer-Seasonal effect that is multiplied by a baseline value (for example, "the temperature in June is 20% higher than the annual baseline").
Multiplier Correct Answer-A term that something is multiplied by. For example, to change units from meters to centimeters, the multiplier is 100.
Negative likelihood ratio Correct Answer-Ratio of the fraction of data points in a certain category that are misclassified as not in the category, to the fraction of data points not in the category that are correctly classified as not being in the category; equal to (1 - sensitivity) / specificity = (FN/(FN+TP)) / (TN/(TN+FP)).
Negative predictive value Correct Answer-Fraction of data points classified as not in a certain category that are really not in that category; equal to TN / (TN+FN).
Network Correct Answer-Model where locations (nodes or vertices) are connected by arcs or edges, with flow on the arcs from node to node.
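A minimal Python sketch of the Moving Average entry above. It assumes a simple (unweighted) average over a fixed window of consecutive values and returns one value per full window; the function name and the `window` parameter are choices made for this example.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive values in the series."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

print(moving_average([1, 2, 3, 4, 5], window=3))  # [2.0, 3.0, 4.0]
```

The output is shorter than the input by `window - 1` values because the first and last few points do not have a full window around them.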
Network Optimization problem Correct Answer-Optimization problem that can be modeled as a network with nodes and arcs, where each variable represents the flow on an arc, with constraints to ensure that the flow into each node equals the flow out of it, and to put a capacity on the flow on each arc.
Neural network Correct Answer-A machine learning model that itself is modeled after the workings of neurons in the brain.
Node Correct Answer-Location in a network. In a network model, there is a constraint for each node to ensure that the incoming flow equals the outgoing flow. Also called a vertex.
Non-convex program Correct Answer-Optimization model where the constraint set is not convex, and/or the objective function is to minimize a nonconvex function or to maximize a nonconcave function.
Non-negativity constraints Correct Answer-Constraints that require variables to be greater than or equal to zero.
Non-parametric tests Correct Answer-Statistical tests that make no assumptions about the population distribution from which the data is sampled. Often focus on the median.
Norm/distance norm Correct Answer-A function that measures the size/length of a vector and satisfies some basic technical properties that are beyond the scope of this course. In this course, we focus on Minkowski norm (or p-norm).
Normal distribution Correct Answer-Continuous probability distribution: $f(x)=\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. Model error is often assumed to be normally distributed (for example, in linear regression).
Objective function Correct Answer-Part of an optimization model that measures the quality of a solution (the values of the variables).
Observation Correct Answer-(1) A measurement of one attribute of a data point. (2) A measurement of all attributes of a data point (i.e., a full row of data). (3) The act of watching/measuring/recording something.
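The normal distribution density above can be checked numerically with a few lines of Python. The function name `normal_pdf` is a hypothetical choice for this sketch; `mu` and `sigma` stand for the mean $\mu$ and standard deviation $\sigma$.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std. dev. sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

print(normal_pdf(0.0, mu=0.0, sigma=1.0))  # about 0.3989
```

The density is highest at the mean and symmetric around it, so `normal_pdf(mu - d, mu, sigma)` equals `normal_pdf(mu + d, mu, sigma)` for any offset `d`.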