Chapter 3

3. Logistic Regression

Logistic regression is a technique used when the dependent variable is dichotomous or binary, such that Yi = 1 or 0. The distribution of Yi is binomial, and we are interested in modeling the conditional probability that Yi = 1 for a given X = x, written π(x), as a function of the independent or explanatory variables. As with the other techniques, independent variables may be either continuous or categorical.

Why not just use linear regression? Because in the case of a binary response variable, the assumptions of linear regression are not valid:
- The relationship between X and Y is nonlinear.
- Error terms are not normally distributed.
- Error terms are heteroscedastic.

If you proceeded in spite of these violations, the result would be:
- Predicted values that are not possible (greater than a value of 1, smaller than a value of 0).
- Magnitudes of the effects of independent variables that may be greatly underestimated.

Relationships between π(x) and x are usually nonlinear rather than linear. A fixed change in x may have less impact when π is near 0 or 1 than when π is near the middle of its range. In practice, π(x) often either increases continuously or decreases continuously as x increases. The S-shaped curves displayed in Figure 3.1 are often realistic shapes for the relationship. The most important mathematical function with this shape has the formula

π(x) = e^(α + βx) / [1 + e^(α + βx)]

3.1 Logistic regression model

To begin, suppose there is a single explanatory variable X, which is quantitative. Remember that a logistic regression model with a single predictor is called simple logistic regression. For a binary response variable Y, recall that π(x) denotes the "success" probability at value x. This probability is the parameter for the binomial distribution. The logistic regression function is

π(x) = e^(α + βx) / [1 + e^(α + βx)]    (3.1)

The corresponding logistic regression model has linear form for the logit of this probability:

logit[π(x)] = log[π(x) / (1 − π(x))] = α + βx    (3.2)

In logistic regression, a logistic transformation of the odds (referred to as the logit) serves as the dependent variable. One can rearrange equation 3.2 to obtain the expression

π(x) / (1 − π(x)) = e^(α + βx)    (3.3)

where e is a constant equal to 2.718....

3.1.1 Linear Approximation Interpretations

The logistic regression formula (3.2) indicates that the logit increases by β for every 1-unit increase in x. Most of us do not think naturally on a logit (logarithm of the odds) scale, so we need to consider alternative interpretations.

The parameter β in equations (3.1) and (3.2) determines the rate of increase or decrease of the S-shaped curve for π(x). The sign of β indicates whether the curve ascends (β > 0) or descends (β < 0), and the rate of change increases as |β| increases. When β = 0, the right-hand side of equation (3.1) simplifies to a constant. Then π(x) is identical at all x, so the curve becomes a horizontal straight line, and the binary response Y is independent of X.

Figure 3.1 shows the S-shaped appearance of the model for π(x), as fitted for the example in the following subsection. Since the curve is not a straight line, the rate of change in π(x) per 1-unit increase in x depends on the value of x. A straight line drawn tangent to the curve at a particular x value, such as the one shown in Figure 3.1, describes the rate of change at that point. For logistic regression parameter β, that tangent line has slope equal to βπ(x)[1 − π(x)]. For instance, the line tangent to the curve at the x for which π(x) = 0.50 has slope β(0.50)(0.50) = 0.25β; by contrast, when π(x) = 0.90 or 0.10, it has slope β(0.90)(0.10) = 0.09β. The slope approaches 0 as the probability approaches 1.0 or 0. The magnitude of β determines how "steep" the curve is: you can think of it as determining the slopes of the tangent lines along the curve. The steepest slope occurs where π(x) = 0.5. The x value for which π(x) = 0.5 is sometimes referred to as the median effective level; it represents the level at which the outcome has a 50% chance. That x value relates to the logistic regression parameters by x = −α/β.

Figure 3.1. Linear approximation to logistic curve
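As a minimal numerical sketch of these interpretations, the following Python code uses hypothetical parameter values (α = −3, β = 0.5, chosen only for illustration, not estimates from any example in this chapter) to evaluate the logistic curve, the tangent-line slope βπ(x)[1 − π(x)], and the median effective level −α/β.

    import math

    # Hypothetical parameters for illustration only
    alpha, beta = -3.0, 0.5

    def pi(x):
        # Logistic curve (3.1): P(Y = 1 | x) = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))
        return math.exp(alpha + beta * x) / (1 + math.exp(alpha + beta * x))

    def slope(x):
        # Tangent-line (linear approximation) slope at x: beta * pi(x) * (1 - pi(x))
        p = pi(x)
        return beta * p * (1 - p)

    x_median = -alpha / beta        # median effective level, where pi(x) = 0.5
    print(pi(x_median))             # 0.5
    print(slope(x_median))          # steepest slope: 0.25 * beta = 0.125
    print(slope(0.0))               # much flatter where pi(x) is near 0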
There are many uses of logistic regression, such as:
1. To model the probabilities of certain conditions or states as a function of some explanatory variables, for example to identify "risk" factors for certain conditions (i.e., divorce, disease, adjustment, etc.). For instance, one might want to model whether or not a person has diabetes as a function of weight, plasma insulin, fasting plasma glucose, and a plasma glucose intolerance test.
2. To describe differences between individuals from separate groups as a function of some explanatory variables, also known as descriptive discriminant analysis. For example, one might want to model whether a student attended an academic program in high school as a function of achievement test scores, desired occupation, and SES.

3.2.1 Confidence Intervals for Effects

A large-sample Wald confidence interval for the parameter β in the logistic regression model, logit[π(x)] = α + βx, uses the estimate and its asymptotic standard error. A (1 − α)100% confidence interval estimate of β can be obtained by

β̂ ± z_(α/2)(SE)

where SE represents the asymptotic standard error of β̂. To interpret this confidence interval more readily, we can exponentiate the endpoints to determine the effect on the odds of a 1-unit increase in x. In our example, a 95% confidence interval for β is 0.111 ± 1.96(0.024) = (0.064, 0.158). To get a confidence interval for the effect of age on the odds, that is, for e^β, simply take the exponential of the endpoints of the confidence interval for β: (e^0.064, e^0.158) = (1.066, 1.171).

We can also get an interval for the linear approximation to the curve. Recall that βπ(1 − π) approximates the change in probability for a 1-unit change in x, so we can multiply the endpoints of the interval by π(1 − π) at some probability of interest. For example, in the academic program example, suppose we want to determine the increase in the probability of being in an academic program at π(x) = 0.25 for a 1-unit increase in achievement score. Then we multiply the endpoints of the confidence interval for β in that example by 0.25(1 − 0.25) = (0.25)(0.75) = 0.1875, obtaining (0.0211 × 0.1875, 0.0313 × 0.1875) = (0.00396, 0.00587). So the rate of increase in the probability of being in an academic program, for values of x near the achievement level at which π = 0.25, is somewhere between 0.00396 and 0.00587. It should be noted that, given the large range of achievement test scores, a 1-unit increase in score is not very noticeable, nor is such a small change likely given the scale.
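As a rough check of this arithmetic, the following Python sketch reproduces the Wald interval and its exponentiated endpoints from the quoted estimate (β̂ = 0.111, SE = 0.024), and applies the π(1 − π) scaling to the endpoints 0.0211 and 0.0313 used in the probability-scale illustration above; all numbers are taken as given in the text rather than re-estimated from data.

    import math

    beta_hat, se = 0.111, 0.024                     # estimate and SE quoted in the example above
    z = 1.96                                        # z_(alpha/2) for a 95% Wald interval

    lower, upper = beta_hat - z * se, beta_hat + z * se
    print(round(lower, 3), round(upper, 3))         # -> 0.064 0.158
    print(round(math.exp(lower), 3), round(math.exp(upper), 3))   # odds scale, about (1.066, 1.171)

    # Linear-approximation interval near pi(x) = 0.25: multiply endpoints by pi * (1 - pi)
    scale = 0.25 * (1 - 0.25)                       # 0.1875
    lo_p, hi_p = 0.0211 * scale, 0.0313 * scale     # endpoints quoted for the achievement-score example
    print(round(lo_p, 5), round(hi_p, 5))           # -> 0.00396 0.00587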
3.2.2 Significance Testing

To test hypotheses about β one can use one of the following methods:
1. Wald test
2. Likelihood ratio test

In the case of a simple logistic regression (i.e., only a single predictor), the test of overall fit and the test of the predictor address the same hypothesis: is the predictor useful in predicting the outcome? The Wald test is the usual test for the significance of a single predictor (is β = 0, or equivalently, is e^β = 1?). Thus, for simple logistic regression both the likelihood ratio test for the full model and the Wald test for the significance of the predictor test the same hypothesis. The likelihood ratio and Wald tests of the significance of a single predictor are said to be "asymptotically" equivalent, which means that their significance values converge as N grows large. With small samples, however, they are not likely to be equal and may sometimes lead to different statistical conclusions (i.e., about significance). The likelihood ratio test for a single predictor is usually recommended by logistic regression texts as the most powerful, although some authors have stated that neither the Wald nor the LR test is uniformly superior.

To test the hypothesis H0: β = 0 (the probability of success is independent of X) versus Ha: β ≠ 0, or the one-sided alternatives Ha: β > 0 or Ha: β < 0, we use the test statistic

z = (β̂ − β0) / SE ~ N(0, 1)

where β0 is the hypothesized value under the null (i.e., 0). Using our example, we would obtain

z = (0.111 − 0) / 0.024 = 4.625

We can also use the Wald statistic, which is simply the squared z-statistic with 1 d.f.:

X² = (β̂ / SE)² ~ χ²(1)

Likelihood ratio test statistic. The test statistic is LR = −2(L0 − L1), where L0 is the log of the maximum likelihood for the more parsimonious (less complex) model, logit[π(x)] = α, and L1 is the log of the maximum likelihood for the more complex model, logit[π(x)] = α + βx. If the null is true, then the likelihood ratio test statistic is approximately chi-square distributed with df = 1. Although the Wald test is adequate for large samples, the likelihood ratio test is a more powerful alternative to the Wald statistic. To conduct this test you must fit both models to obtain both log likelihoods and calculate the test statistic by hand. Suppose that we fit both of these models to our data and obtain L1 = −351.9207 and L0 = −415.6749. Then the likelihood ratio test statistic is −2(L0 − L1) = −2(−415.6749 − (−351.9207)) = 127.5084 with 1 df.

We can also look at confidence intervals for the true probabilities under the model; this makes use of the covariance matrix of the model parameter estimates.
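As a quick illustration, the following Python sketch reproduces the Wald and likelihood-ratio calculations above from the quoted numbers (β̂ = 0.111, SE = 0.024, L0 = −415.6749, L1 = −351.9207) and attaches chi-square p-values; it assumes scipy is available.

    from scipy import stats

    # Wald test of H0: beta = 0, using the estimate and SE quoted above
    beta_hat, se = 0.111, 0.024
    z = (beta_hat - 0) / se                 # about 4.625
    wald = z ** 2                           # Wald chi-square statistic, 1 df
    p_wald = stats.chi2.sf(wald, df=1)

    # Likelihood-ratio test, using the log likelihoods quoted above
    L0, L1 = -415.6749, -351.9207           # intercept-only model vs. model with x
    lr = -2 * (L0 - L1)                     # 127.5084, 1 df
    p_lr = stats.chi2.sf(lr, df=1)

    print(round(z, 3), round(wald, 2), p_wald)
    print(round(lr, 4), p_lr)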
3.3 Logistic regression with categorical predictors

Logistic regression, like ordinary regression, can have multiple explanatory variables. Some or all of those predictors can be categorical, rather than quantitative. We look at how to include categorical predictors, often called factors, in the model.

3.3.1 Indicator Variables Represent Categories of Predictors

Suppose a binary response Y has two binary predictors, X and Z. The data are then displayed in a 2 × 2 × 2 contingency table, such as we'll see in the example in the next subsection. Let x and z each take values 0 and 1 to represent the two categories of each explanatory variable. The model for P(Y = 1),

logit[P(Y = 1)] = α + β1 x + β2 z

has main effects for x and z. The variables x and z are called indicator variables: they indicate categories for the predictors. Indicator variables are also called dummy variables. For this coding, the following table shows the logit values at the four combinations of values of the two predictors:

x   z   logit[P(Y = 1)]
0   0   α
1   0   α + β1
0   1   α + β2
1   1   α + β1 + β2

At a fixed category of Z, the difference between the logits at x = 1 and x = 0 is β1. This difference between two logits equals the difference of log odds; equivalently, it equals the log of the odds ratio between X and Y at that category of Z. Thus, exp(β1) equals the conditional odds ratio between X and Y. Controlling for Z, the odds of "success" at x = 1 equal exp(β1) times the odds of success at x = 0. This conditional odds ratio is the same at each category of Z.
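To make the interpretation concrete, here is a small Python sketch with hypothetical coefficients (α = −1.0, β1 = 0.7, β2 = 0.4, chosen only for illustration, not estimates from any data set in this chapter) showing that the X–Y odds ratio computed at z = 0 and at z = 1 both equal exp(β1).

    import math

    # Hypothetical coefficients for logit[P(Y = 1)] = alpha + b1*x + b2*z (illustration only)
    alpha, b1, b2 = -1.0, 0.7, 0.4

    def odds(x, z):
        # Odds of success at indicator values x and z: exp(alpha + b1*x + b2*z)
        return math.exp(alpha + b1 * x + b2 * z)

    # The conditional odds ratio between X and Y is the same at each fixed level of Z
    for z in (0, 1):
        print(odds(1, z) / odds(0, z))      # both equal exp(b1), about 2.014
    print(math.exp(b1))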