ADVANCED AND MULTIVARIATE
STATISTICAL METHODS PRACTICAL
APPLICATION AND INTERPRETATION 7TH
EDITION CRAIG MERTLER RACHEL
VANNATTA KRISTINA TEST BANK
CERTIFICATION REVIEW SET 2026
ANSWERS GUARANTEED PASS
⫸ Bivariate partial correlation Answer: Simple (two-variable) correlation between two sets of residuals (unexplained variances) that remain after the association of other independent variables is removed
⫸ Bootstrapping Answer: - An approach to validating a multivariate model by drawing a large number of subsamples and estimating models for each subsample
- Estimates from all the subsamples are then combined, providing not only the "best" estimated coefficients (e.g., means of each estimated coefficient across all the subsample models), but their expected variability and thus their likelihood of differing from zero, that is, are the estimated coefficients statistically different from zero or not?
- This approach does not rely on statistical assumptions about the population to assess statistical significance, but instead makes its assessment based solely on the sample data.
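A minimal sketch of the bootstrapping idea, assuming the usual resampling of cases with replacement and a simple one-predictor regression. The data, the function names (`ols_slope`, `bootstrap_slope`), the number of resamples, and the percentile interval are illustrative assumptions, not part of the source definition.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Ordinary least squares slope for a simple regression y = a + b*x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slope(xs, ys, n_resamples=2000, seed=0):
    """Re-estimate the slope on many resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when every resampled x is identical
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    # Percentile interval (an assumed choice): middle 95% of the estimates.
    lower = slopes[int(0.025 * len(slopes))]
    upper = slopes[int(0.975 * len(slopes)) - 1]
    return statistics.fmean(slopes), (lower, upper)


# Hypothetical data that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
estimate, (lower, upper) = bootstrap_slope(xs, ys)
print(f"slope = {estimate:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

If the interval excludes zero, the coefficient would be judged different from zero using the sample data alone, with no distributional assumption about the population.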
⫸ Causal inference Answer: Methods that move beyond statistical inference to the stronger statement of "cause and effect" in non-experimental situations
⫸ Composite measure Answer: Fundamental element of multivariate measurement, formed by the combination of two or more indicators
⫸ Cross-validation Answer: Method of validation in which the original sample is divided into a number of smaller sub-samples (validation samples) and the validation fit is the "average" fit across all of the sub-samples
⫸ Data mining models Answer: - Models based on algorithms (e.g., neural networks, decision trees, support vector machines) that are widely used in many Big Data applications
- Their emphasis is on predictive accuracy rather than statistical inference and explanation as seen in statistical/data models such as multiple regression
⫸ Dependence technique Answer: - Classification of statistical techniques distinguished by having a variable or set of variables identified as the dependent variable(s) and the remaining variables as independent
- The objective is prediction of the dependent variable(s) by the independent variable(s). An example is regression analysis
(e.g., multiple regression, ANOVA/MANOVA, discriminant analysis) with the assumption of a normally distributed dependent measure
⫸ Generalized linear model (GLZ or GLiM) Answer: - Similar in form to the general linear model, but able to accommodate non-normal dependent measures such as binary variables (logistic regression model)
- Uses maximum likelihood estimation rather than ordinary least squares
⫸ Independent variable Answer: Presumed cause of any change in the dependent variable
⫸ Indicator Answer: Single variable used in conjunction with one or more other variables to form a composite measure
⫸ Interdependence technique Answer: Classification of statistical techniques in which the variables are not divided into dependent and independent sets; rather, all variables are analyzed as a single set (e.g., exploratory factor analysis)
⫸ Measurement error Answer: Inaccuracies in measuring the "true" variable values due to the fallibility of the measurement instrument (e.g., inappropriate response scales), data entry errors, or respondent errors
⫸ Metric data Answer: - Also called quantitative data, interval data, or ratio data, these measurements identify or describe subjects (or objects) not only on the possession of an attribute but also by the amount or degree to which the subject may be characterized by the attribute
- For example, a person's age and weight are metric data
⫸ Multicollinearity Answer: - Extent to which a variable can be explained by the other variables in the analysis
- As multicollinearity increases, it complicates the interpretation of the variate because it is more difficult to ascertain the effect of any single variable, owing to their interrelationship
⫸ Multivariate analysis Answer: Analysis of multiple variables in a single relationship or set of relationships
⫸ Multivariate measurement Answer: - Use of two or more variables as indicators of a single composite measure
- For example, a personality test may provide the answers to a series of individual questions (indicators), which are then combined to form a single score (summated scale) representing the personality trait
⫸ Nonmetric data Answer: - Also called qualitative data, these are attributes, characteristics, or categorical properties that identify or describe a subject or object
- They differ from metric data by indicating the presence of an attribute, but not the amount
- Examples are occupation (physician, attorney, professor) or buyer status (buyer, non-buyer)
- Also called nominal data or ordinal data
⫸ Specification error Answer: Omitting a key variable from the analysis, thus affecting the estimated effects of included variables
⫸ Statistical models Answer: - The form of analysis where a specific model is proposed (e.g., dependent and independent variables to be analyzed by the general linear model), the model is then estimated, and a statistical inference is made as to its generalizability to the population through statistical tests
- Operates in opposite fashion from data mining models, which generally have little model specification and no statistical inference
⫸ Summated scales Answer: - Method of combining several variables that measure the same concept into a single variable in an attempt to increase the reliability of the measurement through multivariate measurement
- In most instances, the separate variables are summed and then their total or average score is used in the analysis
⫸ Treatment Answer: Independent variable the researcher manipulates to see the effect (if any) on the dependent variable(s), such as in an experiment (e.g., testing the appeal of color versus black-and-white advertisements)
⫸ Type I error Answer: - Probability of incorrectly rejecting the null hypothesis; in most cases, it means saying a difference or correlation exists when it actually does not
- Also termed alpha
- Typical levels are five or one percent, termed the .05 or .01 level, respectively
⫸ Type II error Answer: - Probability of incorrectly failing to reject the null hypothesis; in simple terms, the chance of not finding a correlation or mean difference when it does exist
- Also termed beta, it is inversely related to Type I error
- The value of 1 minus the Type II error (1 - beta) is defined as power
⫸ Univariate analysis of variance (ANOVA) Answer: Statistical technique used to determine, on the basis of one dependent measure, whether samples are from populations with equal means
⫸ Validation sample Answer: Portion of the sample "held out" from estimation and then used for an independent assessment of model fit on data that was not used in estimation
⫸ Validity Answer: - Extent to which a measure or set of measures correctly represents the concept of study, that is, the degree to which it is free from any systematic or non-random error
- Validity is concerned with how well the concept is defined by the measure(s), whereas reliability relates to the consistency of the measure(s)
⫸ Variate Answer: Linear combination of variables formed in the multivariate technique by deriving empirical weights applied to a set of variables specified by the researcher
⫸ Centering Answer: A variable transformation in which a specific value (e.g., the variable mean) is subtracted from each observation's value, thus improving comparability among variables
⫸ Cold deck imputation Answer: Imputation method for missing data that derives the imputed value from an external source (e.g., prior studies, other samples)
⫸ Complete case approach Answer: - Approach for handling missing data that computes values based on data from complete cases, that is, cases with no missing data
- Also known as the listwise deletion approach
⫸ Curse of dimensionality Answer: - The problems associated with including a very large number of variables in the analysis
- Among the notable problems are the distance measures becoming less useful along with higher potential for irrelevant variables and differing scales of measurement for the variables
⫸ Data management Answer: - All of the activities associated with assembling a dataset for analysis
- With the arrival of the larger and more diverse datasets from Big Data, researchers may now find they spend a vast majority of their time on this task rather than analysis
⫸ Data quality Answer: - Generally referring to the accuracy of the information in a dataset
- Recent efforts have identified eight dimensions that are much broader in scope and reflect the usefulness in many aspects of analysis and application: completeness, availability and accessibility, currency, accuracy, validity, usability and interpretability, reliability and credibility, and consistency
⫸ Data transformations Answer: - A variable may have an undesirable characteristic, such as non-normality, that detracts from its use in a multivariate technique
- A transformation, such as taking the logarithm or square root of the variable, creates a transformed variable that is more suited to portraying the relationship
- Transformations may be applied to either the dependent or independent variables, or both
- The need for and specific type of transformation may be based on theoretical reasons (e.g., transforming a known nonlinear relationship), empirical reasons (e.g., problems identified through graphical or statistical means), or interpretation purposes (e.g., standardization).
⫸ dCor Answer: A newer measure of association that is distance-based and more sensitive to nonlinear patterns in the data
⫸ Dichotomization Answer: Dividing cases into two classes based on being above or below a specified value
⫸ Dummy variable Answer: - Special metric variable used to represent a single category of a nonmetric variable
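A small sketch of dummy-variable creation, using the occupation categories that appear as an example elsewhere in this set. The function name `indicator_code`, the `is_` column prefix, and the choice of "professor" as the reference category (coded zero on every dummy) are assumptions made for illustration.

```python
def indicator_code(values, reference):
    """Return one 0/1 dummy column per category other than the reference."""
    categories = sorted(set(values) - {reference})
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


# Nonmetric variable with three categories (hypothetical cases).
occupation = ["physician", "attorney", "professor", "attorney"]
dummies = indicator_code(occupation, reference="professor")

for name, column in dummies.items():
    print(name, column)
```

Each dummy column is metric (0 or 1) and stands for exactly one category; a case in the reference category scores zero on all of them.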
⫸ Heat map Answer: Form of scatterplot of nonmetric variables where frequency within each cell is color-coded to depict relationships
⫸ Histogram Answer: - Graphical display of the distribution of a single variable
- By forming frequency counts in categories, the shape of the variable's distribution can be shown
- Used to make a visual comparison to the normal distribution
⫸ Hoeffding's D Answer: New measure of association/correlation that is based on distance measures between the variables and thus more likely to incorporate nonlinear components
⫸ Homoscedasticity Answer: - When the variance of the error terms (e) appears constant over a range of predictor variables, the data are said to be homoscedastic
- The assumption of equal variance of the population error E (where E is estimated from e) is critical to the proper application of many multivariate techniques
- When the error terms have increasing or modulating variance, the data are said to be heteroscedastic
- Analysis of residuals best illustrates this point
⫸ Hot deck imputation Answer: Imputation method in which the imputed value is taken from an existing observation deemed similar
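One possible sketch of hot deck imputation, assuming that "similar" means closest on a single auxiliary variable. The records, the field names, and the nearest-donor rule are hypothetical; real applications typically match on several variables.

```python
def hot_deck_impute(records, target, match_on):
    """Fill missing `target` values (None) from the most similar complete case."""
    donors = [r for r in records if r[target] is not None]
    imputed = []
    for r in records:
        if r[target] is None:
            # Donor = complete case with the closest value on `match_on`.
            donor = min(donors, key=lambda d: abs(d[match_on] - r[match_on]))
            r = {**r, target: donor[target]}  # copy, so the input is not mutated
        imputed.append(r)
    return imputed


# Hypothetical sample: the third respondent did not report income.
people = [
    {"age": 25, "income": 30000},
    {"age": 47, "income": 62000},
    {"age": 45, "income": None},
]
filled = hot_deck_impute(people, target="income", match_on="age")
print(filled[2])
```

The donor comes from the same sample, which is what distinguishes this approach from cold deck imputation, where the value comes from an external source.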
⫸ Ignorable missing data Answer: - Missing data process that is explicitly identifiable and/or is under the control of the researcher
- Ignorable missing data do not require a remedy because the missing data are explicitly handled in the technique used
⫸ Imputation Answer: - Process of estimating the missing data of an observation based on valid values of the other variables
- The objective is to employ known relationships that can be identified in the valid values of the sample to assist in representing or even estimating the replacements for missing values
⫸ Indicator coding Answer: - Method for specifying the reference category for a set of dummy variables where the reference category receives a value of zero across the set of dummy variables
- The dummy variable coefficients represent the category differences from the reference category. Also see effects coding
⫸ Ipsatizing Answer: Method of transformation for a set of variables on the same scale, similar to centering, except that the value used for centering all of the variables is the mean value for the observation (e.g., person-centered)
⫸ Kurtosis Answer: - Measure of the peakedness or flatness of a distribution when compared with a normal distribution
- A positive value indicates a relatively peaked distribution, and a negative value indicates a relatively flat distribution
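As an illustration of the sign convention, the sketch below computes excess kurtosis from population moments (fourth central moment divided by the squared variance, minus 3, so a normal distribution scores 0). Statistical packages often apply a small-sample adjustment, so their values may differ slightly; the two data sets are made up.

```python
import statistics


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m4 = statistics.fmean([(v - mean) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0


# Hypothetical data: most values at the center with two extreme values.
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -3, 3]
# Hypothetical data: values spread evenly, with no peak.
flat = [1, 2, 3, 4, 5, 6]

print(excess_kurtosis(peaked) > 0)  # peaked: positive value
print(excess_kurtosis(flat) < 0)    # flat: negative value
```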
⫸ Missing completely at random (MCAR) Answer: - Classification of missing data applicable when missing values of Y are not dependent on X
- When missing data are MCAR, observed values of Y are a truly random sample of all Y values, with no underlying process that lends bias to the observed data
⫸ Missing data Answer: - Information not available for a subject (or case) about whom other information is available
- Missing data often occur, for example, when a respondent fails to answer one or more questions in a survey
⫸ Missing data process Answer: Any systematic event external to the respondent (such as data entry errors or data collection problems) or any action on the part of the respondent (such as refusal to answer a question) that leads to missing data
⫸ Missingness Answer: - The absence or presence of missing data for a case or observation
- Does not relate directly to how that missing data value might be imputed
⫸ Multiple imputation Answer: - Imputation method applicable to MAR missing data processes in which several datasets are created with different sets of imputed data
- The process not only eliminates bias in imputed values but also provides more appropriate measures of standard errors
⫸ Multivariate graphical display Answer: - Method of presenting a multivariate profile of an observation on three or more variables
- The methods include approaches such as glyphs, mathematical transformations, and even iconic representations (e.g., faces)
⫸ Normal distribution Answer: - Purely theoretical continuous probability distribution in which the horizontal axis represents all possible values of a variable and the vertical axis represents the probability of those values occurring
- The scores on the variable are clustered around the mean in a symmetrical, unimodal pattern known as the bell-shaped, or normal, curve
⫸ Normal probability plot Answer: - Graphical comparison of the form of the distribution to the normal distribution
- In the normal probability plot, the normal distribution is represented by a straight line angled at 45 degrees
- The actual distribution is plotted against this line so that any differences are shown as deviations from the straight line, making identification of differences quite apparent and interpretable
⫸ Normality Answer: Degree to which the distribution of the sample data corresponds to a normal distribution
⫸ Response surface Answer: A transformation method in which a form of polynomial regression is used to represent the distribution of an outcome variable in an empirical form that can be portrayed as a surface
⫸ Robustness Answer: The ability of a statistical technique to perform reasonably well even when the underlying statistical assumptions have been violated in some manner
⫸ Scatterplot Answer: Representation of the relationship between two metric variables portraying the joint values of each observation in a two-dimensional graph
⫸ Skewness Answer: - Measure of the symmetry of a distribution; in most instances the comparison is made to a normal distribution
- A positively skewed distribution has relatively few large values and tails off to the right, and a negatively skewed distribution has relatively few small values and tails off to the left
- Skewness values falling outside the range of -1 to +1 indicate a substantially skewed distribution
⫸ Standardization Answer: - Transformation method where a variable is centered (i.e., variable's mean value subtracted from each observation's value) and then "standardized" by dividing the difference by the variable's standard deviation
- Provides a measure that is comparable across variables no matter what their original scale
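A brief sketch of centering followed by standardization, using age and weight as two variables on different scales. The data values and the use of the sample standard deviation (n - 1 denominator) are assumptions for illustration.

```python
import statistics


def center(values):
    """Subtract the variable's mean from each observation."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]


def standardize(values):
    """Center, then divide by the sample standard deviation (z-scores)."""
    sd = statistics.stdev(values)
    return [c / sd for c in center(values)]


# Hypothetical observations measured on different scales.
ages = [20, 30, 40, 50, 60]
weights_kg = [55.0, 62.5, 70.0, 77.5, 85.0]

z_age = standardize(ages)
z_weight = standardize(weights_kg)
print([round(z, 3) for z in z_age])
print([round(z, 3) for z in z_weight])
```

Both standardized variables have mean 0 and standard deviation 1, so their values can be compared directly even though the original units (years, kilograms) differ.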
⫸ Variate Answer: Linear combination of variables formed in the multivariate technique by deriving empirical weights applied to a set of variables specified by the researcher
⫸ 1 - R^2 ratio Answer: Diagnostic measure employed in variable clustering to assess whether variables are singularly represented by a cluster component or have a substantive cross-loading
⫸ Anti-image correlation matrix Answer: - Matrix of the partial correlations among variables after factor analysis, representing the degree to which the factors explain each other in the results
- The diagonal contains the measures of sampling adequacy for each variable, and the off-diagonal values are partial correlations among variables
⫸ A priori criterion Answer: - A stopping rule for determining the number of factors. This rule is determined solely on the researcher's judgment and experience
- The researcher may know the desired structure, be testing a specific structure, or have other conceptually based considerations, so that the number of factors can be predetermined
⫸ Bartlett test of sphericity Answer: Statistical test for the overall significance of all correlations within a correlation matrix
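The definition above does not give a formula. The sketch below assumes the commonly cited chi-square approximation, chi-square = -[(n - 1) - (2p + 5)/6] * ln|R| with p(p - 1)/2 degrees of freedom, where R is the p x p correlation matrix and n the sample size. The correlation matrix, the sample size, and the helper names are hypothetical.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom) for matrix `corr`."""
    p = len(corr)
    chi_square = -((n_obs - 1) - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df


# Hypothetical correlation matrix for three variables, n = 100 cases.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
chi_square, df = bartlett_sphericity(corr, n_obs=100)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

A large statistic relative to the chi-square distribution with `df` degrees of freedom suggests the correlations are, taken together, different from zero. An identity matrix (no correlations) yields a statistic of zero.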