The PrepIQ NWCA Linear Regression Analysis Ultimate Exam focuses on statistical modeling and predictive analysis techniques. Learners study regression equations, correlation analysis, trend forecasting, data interpretation, and statistical decision-making concepts.

Question 1. Which of the following best describes the primary purpose of regression analysis in health data?
A) To calculate the mean of a single variable
B) To predict the value of a dependent variable from one or more independent variables
C) To test for normality of a dataset
D) To rank variables by frequency
Answer: B
Explanation: Regression analysis models the relationship between a dependent variable and one or more independent variables to make predictions.

Question 2. In the context of regression, what distinguishes a deterministic model from a probabilistic model?
A) Deterministic models include random error terms, while probabilistic models do not
B) Probabilistic models produce the same output for identical inputs, while deterministic models do not
C) Deterministic models predict exact outcomes, whereas probabilistic models predict outcomes with associated uncertainty
D) Probabilistic models are only used for categorical data
Answer: C
Explanation: Deterministic models give fixed outputs for given inputs, while probabilistic models incorporate randomness and provide predictions with uncertainty.

Question 3. Which step typically follows data collection in the regression modeling process?
A) Hypothesis testing of the slope
B) Model validation
C) Variable selection
D) Data cleaning and exploratory analysis
Answer: D
Explanation: After collecting data, analysts usually clean the data and explore relationships before building a model.

Question 4. The Pearson correlation coefficient measures which of the following?
A) The causal effect of X on Y
B) The strength and direction of the linear association between two continuous variables
C) The difference between means of two groups
D) The variance of a single variable
Answer: B
Explanation: Pearson’s r quantifies the linear relationship between two continuous variables and ranges from –1 to +1.

Question 5. A correlation of –0.85 between physical activity minutes and BMI indicates what?
A) A strong positive linear relationship
B) No linear relationship
C) A strong negative linear relationship
D) That increased activity causes higher BMI
Answer: C
Explanation: An r of –0.85 reflects a strong inverse linear association; as activity increases, BMI tends to decrease.

Question 6. Which statement correctly differentiates correlation from causation?

Explanation: \(\epsilon\) denotes the random error term, capturing variability in Y not explained by the linear model; the residuals are its sample estimates.

Question 9. Which of the following is NOT an assumption of simple linear regression?
A) Linearity of the relationship between X and Y
B) Independence of observations
C) Homoscedasticity (constant variance of errors)
D) Multicollinearity among predictors
Answer: D
Explanation: Multicollinearity concerns models with multiple predictors, so it is not an assumption of simple linear regression, which has a single predictor.

Question 10. In the OLS method, the slope \(\beta_1\) is estimated by which formula?
A) \(\frac{\sum (x_i - \bar{x})}{\sum (y_i - \bar{y})}\)
B) \(\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\)
C) \(\frac{\sum y_i}{\sum x_i}\)
D) \(\frac{\sum (y_i - \bar{y})^2}{\sum (x_i - \bar{x})^2}\)
Answer: B
Explanation: The OLS slope estimator is the sample covariance of X and Y divided by the sample variance of X.

Question 11. The intercept \(\beta_0\) in a simple linear regression model represents:
A) The predicted value of Y when X equals its mean
B) The predicted value of Y when X is zero
C) The average of Y across all observations
D) The residual variance
Answer: B
Explanation: \(\beta_0\) is the predicted value of Y when the predictor X equals zero.

Question 12. Which metric quantifies the proportion of total variability in Y explained by the regression model?
A) Standard Error of Estimate
B) Adjusted R²
C) Coefficient of Determination (R²)
D) F-statistic
Answer: C
Explanation: R² = SSR / SST measures the proportion of total variation in Y accounted for by the model.

Question 13. If a simple linear regression yields \(R^2 = 0.64\), what does this imply?
A) 64% of the variation in the predictor is explained by the response
B) The model explains 64% of the variation in the response variable
C) The slope is 0.
D) There is a 64% chance that the model is correct
Answer: B
Explanation: An R² of 0.64 indicates that 64% of the variability in Y is explained by its linear relationship with X.
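
The formulas in Questions 10 to 12 can be checked numerically. The following is a minimal pure-Python sketch, assuming five made-up (x, y) points rather than any real health data; the function names are illustrative and not taken from any particular library. It also confirms that in simple regression R² equals the square of Pearson’s r from Question 4.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """OLS estimates (b0, b1) for y = b0 + b1*x (Questions 10 and 11)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx              # covariance of X and Y over variance of X
    b0 = y_bar - b1 * x_bar     # the fitted line passes through (x_bar, y_bar)
    return b0, b1

def sums_of_squares(x, y, b0, b1):
    """Return (SST, SSR, SSE) for a fitted simple regression line."""
    y_bar = mean(y)
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained by the line
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual
    return sst, ssr, sse

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of numbers."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

# Made-up illustrative data (not from any health dataset).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_ols(x, y)
sst, ssr, sse = sums_of_squares(x, y, b0, b1)
r = pearson_r(x, y)
print(f"b0={b0:.2f}, b1={b1:.2f}, R^2={ssr / sst:.2f}, r^2={r ** 2:.2f}")
```

For these points the slope is 0.6, the intercept is 2.2, and R² = r² = 0.6, with SST = SSR + SSE (6 = 3.6 + 2.4).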

Answer: A
Explanation: SST = SSR (regression sum of squares) + SSE (error sum of squares).

Question 17. The F-test in simple linear regression assesses:
A) Whether the intercept differs from zero
B) Whether the slope coefficient is significantly different from zero
C) Whether the residuals are normally distributed
D) Whether the predictor variable is categorical
Answer: B
Explanation: The overall F-test evaluates the null hypothesis that all slope coefficients are zero; in SLR, that is just \(\beta_1\).

Question 18. To test the hypothesis \(H_0: \beta_1 = 0\) at α = 0.05, which statistic is most appropriate?
A) Z-statistic
B) t-statistic with n-2 degrees of freedom
C) Chi-square statistic
D) F-statistic with 1 and n-2 degrees of freedom
Answer: B
Explanation: In SLR, the slope is tested using a t-test with n-2 degrees of freedom. The F-test in option D reaches the same two-sided conclusion, because F = t² in SLR, but the t-test is the standard test for an individual coefficient.

Question 19. A 95% confidence interval for the slope \(\beta_1\) is (0.12, 0.45). Which conclusion is correct?
A) The slope is not significantly different from zero at the 0.05 level
B) The true slope is definitely 0.
C) We are 95% confident that the true slope lies between 0.12 and 0.45
D) The interval indicates heteroscedasticity
Answer: C
Explanation: The interval gives a range of plausible values for the true slope at 95% confidence. Because it excludes zero, the slope is also significant at the 0.05 level, which rules out option A.

Question 20. A prediction interval for a new observation is wider than the corresponding confidence interval for the mean response because:
A) It accounts for both the uncertainty in estimating the mean and the random error of an individual observation
B) It uses a larger critical t-value
C) It ignores the residual variance
D) It is based on a smaller sample size
Answer: A
Explanation: Prediction intervals incorporate both the variability of the estimated mean and the inherent variability of individual outcomes.

Question 21. In multiple linear regression, the model expands to \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon\). What does \(\beta_2\) represent?
A) The effect of \(x_2\) on Y when all other predictors are held constant
B) The total effect of all predictors on Y
C) The correlation between \(x_1\) and \(x_2\)
D) The intercept for the second predictor
Answer: A
Explanation: Each \(\beta_i\) reflects the expected change in Y associated with a one-unit change in its predictor, holding all other predictors constant.
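
The inference in Questions 18 to 20 can be sketched the same way. This hypothetical helper assumes made-up data and takes the critical t value as an argument, since Python’s standard library has no t-distribution quantile function; 3.182 is the two-sided 95% value for 3 degrees of freedom. It returns the slope’s t statistic, the overall F, a slope confidence interval, and the half-widths of the confidence and prediction intervals at a chosen x0.

```python
from math import sqrt
from statistics import mean

def slope_inference(x, y, t_crit, x0):
    """Slope t-test, overall F, slope CI, and interval half-widths at x0.

    t_crit is the two-sided critical t value with n - 2 degrees of freedom;
    it is passed in because the standard library has no t quantile function.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    mse = sse / (n - 2)            # residual variance estimate on n - 2 df
    s = sqrt(mse)
    se_b1 = s / sqrt(sxx)          # standard error of the slope
    h0 = 1 / n + (x0 - x_bar) ** 2 / sxx
    return {
        "b1": b1,
        "t": b1 / se_b1,           # test statistic for H0: beta_1 = 0
        "F": ssr / mse,            # overall F with 1 and n - 2 df (MSR = SSR / 1)
        "ci_b1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
        "ci_half": t_crit * s * sqrt(h0),      # mean response at x0
        "pi_half": t_crit * s * sqrt(1 + h0),  # single new observation at x0
    }

# Made-up data; 3.182 is the 0.975 quantile of t with 5 - 2 = 3 degrees of freedom.
res = slope_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182, x0=3)
print(res)
```

With these five points the t statistic is about 2.12 and F = t² = 4.5. The slope interval spans zero, so unlike the interval in Question 19 it would not be significant at the 0.05 level. The prediction half-width (about 3.12) exceeds the confidence half-width (about 1.27) because of the extra 1 under the square root, which carries the individual-observation variance described in Question 20.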

Answer: B
Explanation: Partial coefficients isolate the effect of one predictor while controlling for the others.

Question 25. Which test evaluates the overall significance of a multiple regression model?
A) t-test for each coefficient
B) F-test comparing the full model to a model with no predictors
C) Chi-square test of residuals
D) Durbin-Watson test
Answer: B
Explanation: The overall F-test assesses whether at least one predictor has a non-zero coefficient.

Question 26. Standardized regression coefficients (beta weights) are useful because they:
A) Provide coefficients in original measurement units
B) Allow direct comparison of predictor importance regardless of variable scales
C) Eliminate multicollinearity
D) Convert categorical variables to continuous
Answer: B
Explanation: Standardized coefficients are expressed in standard deviation units, facilitating comparison across variables.

Question 27. To include gender (male/female) in a regression model, which coding scheme is appropriate?
A) Treat gender as a continuous variable ranging from 0 to 1
B) Use a dummy variable (e.g., 0 = female, 1 = male)
C) Exclude gender because it is categorical
D) Use the Pearson correlation coefficient
Answer: B
Explanation: Dummy coding converts a binary categorical variable into a numeric 0/1 indicator.

Question 28. When modeling the interaction between exercise frequency (\(X_1\)) and diet quality (\(X_2\)) on weight loss, the interaction term is:
A) \(X_1 + X_2\)
B) \(X_1 - X_2\)
C) \(X_1 \times X_2\)
D) \(\frac{X_1}{X_2}\)
Answer: C
Explanation: An interaction term is the product of the two predictors, capturing how the effect of one variable changes across levels of the other.

Question 29. Heteroscedasticity in regression residuals can be detected most readily by:
A) A Q-Q plot of residuals
B) Plotting residuals versus fitted values and observing a funnel shape
C) Computing the correlation coefficient
D) Performing a chi-square test
Answer: B
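
Questions 27 and 28 describe how a dummy variable and an interaction term enter the design matrix. A minimal sketch, assuming a hypothetical model \(y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 \cdot male\) with invented coefficients and illustrative function names, shows that with the product term the effect of exercise becomes \(\beta_1 + \beta_3 X_2\) rather than a single number.

```python
def design_row(exercise, diet, gender):
    """One row of a hypothetical design matrix: [1, X1, X2, X1*X2, male]."""
    male = 1 if gender == "male" else 0   # dummy coding: female = 0 (reference), male = 1
    return [1.0, exercise, diet, exercise * diet, male]

def predict(coefs, row):
    """Linear predictor: sum of coefficient times column value."""
    return sum(c * v for c, v in zip(coefs, row))

def exercise_slope(coefs, diet):
    """Change in predicted Y per one-unit increase in exercise at a given diet level.

    With the X1*X2 term, this is b1 + b3 * diet rather than a single constant.
    """
    return coefs[1] + coefs[3] * diet

# Invented coefficients [b0, b1 (exercise), b2 (diet), b3 (interaction), b4 (male)].
coefs = [0.5, 0.2, 0.3, 0.1, -0.4]

for diet in (0, 5):
    print(f"diet={diet}: exercise slope = {exercise_slope(coefs, diet):.2f}")
```

Holding exercise and diet fixed, the male and female predictions differ by exactly the dummy’s coefficient (here -0.4), which is how a 0/1 indicator shifts the intercept for one group.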

D) Severe heteroscedasticity
Answer: C
Explanation: A Durbin-Watson value around 2 indicates that residuals are approximately independent.

Question 33. Which plot is most useful for checking normality of residuals?
A) Scatter plot of residuals vs. predictor
B) Histogram of residuals
C) Normal probability (Q-Q) plot
D) Bar chart of residual frequencies
Answer: C
Explanation: A Q-Q plot compares the distribution of residuals to a normal distribution; deviations from the line suggest non-normality.

Question 34. The Shapiro-Wilk test evaluates:
A) Equality of variances across groups
B) Whether a sample comes from a normal distribution
C) Presence of multicollinearity
D) The significance of the regression slope
Answer: B
Explanation: The Shapiro-Wilk test is a formal test for normality.

Question 35. Leverage points in regression are identified by:
A) Large absolute residuals alone
B) High values of the hat matrix diagonal (\(h_{ii}\))
C) Small Cook’s distance values
D) Low VIF values
Answer: B
Explanation: Leverage measures how far an observation’s predictor values are from the mean of the predictors; high hat values indicate high leverage.

Question 36. Cook’s distance combines information from residuals and leverage to assess:
A) Multicollinearity
B) Influence of an observation on overall regression coefficients
C) Heteroscedasticity
D) Normality of errors
Answer: B
Explanation: Cook’s D quantifies how much the fitted values change when a particular observation is omitted.

Question 37. A VIF (Variance Inflation Factor) value of 12 for a predictor suggests:
A) No multicollinearity concerns
B) Moderate multicollinearity
C) Severe multicollinearity, possibly inflating standard errors
D) That the predictor should be transformed
Answer: C
Explanation: VIF values above 10 are commonly taken as indicating serious multicollinearity.
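
The diagnostics in Questions 35 to 37 reduce to short formulas. The sketch below assumes simple regression with p = 2 coefficients, where the hat diagonal is \(h_{ii} = 1/n + (x_i - \bar{x})^2 / S_{xx}\) and Cook’s distance is \(D_i = \frac{e_i^2}{p \cdot MSE} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}\). For VIF it assumes exactly two predictors, so \(VIF = 1/(1 - r_{12}^2)\). The data and function names are made up for illustration.

```python
from statistics import mean

def leverage_and_cooks(x, y):
    """Hat values h_ii and Cook's distances for simple OLS (p = 2 parameters)."""
    n, p = len(x), 2
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    # In simple regression the hat diagonal is 1/n + (x_i - x_bar)^2 / Sxx.
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's D combines each point's residual with its leverage.
    d = [(e ** 2 / (p * mse)) * hi / (1 - hi) ** 2 for e, hi in zip(resid, h)]
    return h, d

def vif_two_predictors(x1, x2):
    """VIF when there are exactly two predictors: 1 / (1 - r12^2)."""
    m1, m2 = mean(x1), mean(x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r2 = s12 ** 2 / (s11 * s22)
    return 1 / (1 - r2)

# Made-up data: the last point has an unusual x value and sits off the trend.
x = [1, 2, 3, 4, 5, 12]
y = [1, 2, 3, 4, 5, 3]
h, d = leverage_and_cooks(x, y)
for xi, hi, di in zip(x, h, d):
    print(f"x={xi:>2}  h={hi:.3f}  Cook's D={di:.3f}")

# Two nearly collinear made-up predictors give a VIF far above 10.
print("VIF:", round(vif_two_predictors([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]), 1))
```

The hat values always sum to p (here 2), and the point at x = 12 has both the largest leverage and by far the largest Cook’s distance, even though its raw residual is not the largest.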

Explanation: WLS assigns each observation a weight inversely proportional to its error variance, correcting for heteroscedasticity.

Question 41. In stepwise forward selection, a variable is added to the model only if:
A) Its p-value exceeds a pre-specified threshold
B) Its inclusion reduces the Akaike Information Criterion (AIC)
C) Its VIF is greater than 5
D) It has the highest correlation with the response
Answer: B
Explanation: Forward stepwise selection adds variables that improve model fit, often judged by a decrease in AIC (or by a significant F-statistic for the added term).

Question 42. The Bayesian Information Criterion (BIC) differs from AIC primarily by:
A) Penalizing model complexity more heavily, especially with larger sample sizes
B) Ignoring likelihood altogether
C) Being used only for logistic regression
D) Always selecting the model with the most predictors
Answer: A
Explanation: BIC’s penalty per parameter is ln(n), compared with 2 for AIC, so it penalizes complexity more heavily once n ≥ 8 and increasingly favors simpler models as n grows.

Question 43. Cross-validation helps to assess:
A) Multicollinearity within a single dataset
B) The predictive performance of a model on unseen data
C) The normality of residuals
D) The exact value of β coefficients
Answer: B
Explanation: Cross-validation partitions data into training and testing subsets to estimate out-of-sample prediction error.

Question 44. In k-fold cross-validation with k = 5, the data are:
A) Split into 5 equal parts, each serving once as the validation set while the other 4 are used for training
B) Randomly sampled 5 times with replacement
C) Divided into 5 groups, but only one group is ever used for training
D) Tested on all 5 folds simultaneously
Answer: A
Explanation: 5-fold cross-validation rotates the validation set across the five partitions.

Question 45. Polynomial regression of degree 2 can be expressed as:
A) \(y = \beta_0 + \beta_1 x + \epsilon\)
B) \(y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon\)
C) \(y = \beta_0 + \beta_1 \log(x) + \epsilon\)
D) \(y = \beta_0 + \beta_1 x^{-1} + \epsilon\)
Answer: B
Explanation: A quadratic (second-degree) polynomial includes both \(x\) and \(x^2\) terms.

Question 46. When adding a quadratic term \(x^2\) to a linear model, the primary purpose is to:

Explanation: Over-fitting occurs when a model is overly complex and fits idiosyncrasies of the sample rather than the underlying pattern.

Question 49. Which diagnostic plot is most useful for detecting influential observations?
A) Residuals vs. fitted values
B) Cook’s distance plot
C) Histogram of Y
D) Scatter plot of X vs. Y
Answer: B
Explanation: A plot of Cook’s D highlights observations that have a large impact on regression coefficients.

Question 50. A Durbin-Watson statistic of 1.2 suggests:
A) Positive autocorrelation of residuals
B) Negative autocorrelation of residuals
C) No autocorrelation
D) Perfect multicollinearity
Answer: A
Explanation: Values substantially less than 2 indicate positive autocorrelation.

Question 51. Which of the following best describes the purpose of the “adjusted R²” when comparing two models with different numbers of predictors?
A) To always increase with more predictors
B) To penalize unnecessary predictors, allowing fair comparison of model explanatory power
C) To replace the F-test for overall significance
D) To measure the correlation between observed and predicted values
Answer: B
Explanation: Adjusted R² adjusts for model complexity, decreasing if added variables do not improve fit enough to offset the lost degrees of freedom.

Question 52. In a multiple regression output, a predictor has a p-value of 0.08. At α = 0.05, the appropriate decision is to:
A) Retain the predictor because p < 0.
B) Remove the predictor because it is not statistically significant at the 0.05 level
C) Conclude that the model is invalid
D) Increase the sample size to achieve significance
Answer: B
Explanation: With α = 0.05, a p-value of 0.08 fails to meet the significance criterion, suggesting the predictor may be dropped.

Question 53. The term “partial F-test” in multiple regression is used to:
A) Compare two nested models, testing whether a set of predictors improves the model significantly
B) Test the normality of residuals
C) Evaluate multicollinearity among predictors
D) Determine the best transformation for Y
Answer: A
Explanation: A partial F-test assesses whether adding (or removing) a group of variables significantly changes model fit.
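
Questions 50, 51, and 53 each rest on a one-line formula. This sketch assumes hypothetical residuals and summary values, and its function names are illustrative. Durbin-Watson is the sum of squared successive residual differences over the sum of squared residuals; adjusted R² is \(1 - (1 - R^2)\frac{n - 1}{n - k - 1}\) with k predictors; and the partial F-statistic is \(\frac{(SSE_{reduced} - SSE_{full})/q}{SSE_{full}/(n - p_{full})}\), where q is the number of dropped predictors and \(p_{full}\) counts the full model’s coefficients including the intercept.

```python
def durbin_watson(resid):
    """Sum of squared successive differences over sum of squared residuals (range 0 to 4)."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors (plus an intercept) and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def partial_f(sse_reduced, sse_full, q, n, p_full):
    """Partial F for dropping q predictors from a full model with p_full coefficients.

    Compare against an F distribution with (q, n - p_full) degrees of freedom.
    """
    return ((sse_reduced - sse_full) / q) / (sse_full / (n - p_full))

# Hypothetical residual series and summary values, for illustration only.
print(durbin_watson([1, 1, 1, -1, -1, -1]))   # runs of same-sign residuals: well below 2
print(durbin_watson([1, -1, 1, -1]))          # alternating signs: above 2
print(adjusted_r2(0.600, n=30, k=2), adjusted_r2(0.605, n=30, k=3))
print(partial_f(sse_reduced=10.0, sse_full=6.0, q=2, n=25, p_full=5))
```

The adjusted values illustrate Question 51: R² rises from 0.600 to 0.605 with a third predictor, but adjusted R² falls from about 0.570 to about 0.559, signalling that the extra predictor does not pay for its lost degree of freedom.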