logistic — Logistic regression, reporting odds ratios

Description

logistic fits a logistic regression model of depvar on indepvars, where depvar is a 0/1 variable (or, more precisely, a 0/non-0 variable). Without arguments, logistic redisplays the last logistic estimates.

logistic displays estimates as odds ratios; to view coefficients, type logit after running logistic. To obtain odds ratios for any covariate pattern relative to another, see [R] lincom.

Quick start

Report odds ratios from logistic regression of y on x1 and x2
    logistic y x1 x2

Add indicators for values of categorical variable a
    logistic y x1 x2 i.a

As above, and apply frequency weights defined by wvar
    logistic y x1 x2 i.a [fweight=wvar]

Show base level of a
    logistic y x1 x2 i.a, baselevels

Menu

Statistics > Binary outcomes > Logistic regression

Syntax

    logistic depvar indepvars [if] [in] [weight] [, options]

    options                   Description
    -------------------------------------------------------------------------
    Model
      noconstant              suppress constant term
      offset(varname)         include varname in model with coefficient
                              constrained to 1
      asis                    retain perfect predictor variables
      constraints(constraints)  apply specified linear constraints

    SE/Robust
      vce(vcetype)            vcetype may be oim, opg, robust,
                              cluster clustvar, bootstrap, or jackknife

    Reporting
      level(#)                set confidence level; default is level(95)
      coef                    report estimated coefficients
      nocnsreport             do not display constraints
      display_options         control columns and column formats, row
                              spacing, line width, display of omitted
                              variables and base and empty cells, and
                              factor-variable labeling

    Maximization
      maximize_options        control the maximization process; seldom used

      collinear               keep collinear variables
      coeflegend              display legend instead of statistics
    -------------------------------------------------------------------------

indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bayes, bootstrap, by, collect, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed; see [U] 11.1.10 Prefix commands. For more details, see [BAYES] bayes: logistic.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
collinear and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Options

Model

noconstant, offset(varname), constraints(constraints); see [R] Estimation options.

asis forces retention of perfect predictor variables and their associated perfectly predicted observations and may produce instabilities in maximization; see [R] probit.

Remarks and examples

Example 1

. logistic low age lwt i.race smoke ptl ht ui

Logistic regression                                    Number of obs =    189
                                                       LR chi2(8)    =  33.22
                                                       Prob > chi2   = 0.0001
Log likelihood = -100.724                              Pseudo R2     = 0.1416

         low | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .9732636   .0354759    -0.74   0.457     .9061578    1.045339
         lwt |   .9849634   .0068217    -2.19   0.029     .9716834    .9984249
        race |
      Black  |   3.534767   1.860737     2.40   0.016     1.259736    9.918406
      Other  |   2.368079   1.039949     1.96   0.050     1.001356    5.600207
       smoke |   2.517698    1.00916     2.30   0.021     1.147676    5.523162
         ptl |   1.719161   .5952579     1.56   0.118     .8721455    3.388787
          ht |   6.249602   4.322408     2.65   0.008     1.611152    24.24199
          ui |     2.1351   .9808153     1.65   0.099     .8677528      5.2534
       _cons |   1.586014   1.910496     0.38   0.702     .1496092     16.8134

Note: _cons estimates baseline odds.

The odds ratios are for a one-unit change in the variable.
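The odds-ratio and coefficient metrics are linked by exponentiation, and the reported standard error of an odds ratio comes from the delta method, se(OR) = OR × se(b). A quick Python sketch (not part of the Stata output; the numbers are the age row from the table above) checking the correspondence:

```python
import math

# age row from the odds-ratio table above
or_age, se_or_age = 0.9732636, 0.0354759

# the coefficient is the log of the odds ratio;
# by the delta method, se(OR) = OR * se(b), so se(b) = se(OR) / OR
b_age = math.log(or_age)           # about -0.0271
se_b_age = se_or_age / or_age

# the z statistic is the same in either metric
print(round(b_age / se_b_age, 2))  # -0.74, as reported in the table
```

Because z is the ratio b/se(b) in either metric, the test statistic and p-value are identical whether logistic or logit output is displayed.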
If we wanted the odds ratio for age to be in terms of 4-year intervals, we would type

. generate age4 = age/4
. logistic low age4 lwt i.race smoke ptl ht ui
(output omitted)

After logistic, we can type logit to see the model in terms of coefficients and standard errors:

. logit

Logistic regression                                    Number of obs =    189
                                                       LR chi2(8)    =  33.22
                                                       Prob > chi2   = 0.0001
Log likelihood = -100.724                              Pseudo R2     = 0.1416

         low | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        age4 |  -.1084012   .1458017    -0.74   0.457    -.3941673    .1773649
         lwt |  -.0151508   .0069259    -2.19   0.029    -.0287253   -.0015763
        race |
      Black  |   1.262647   .5264101     2.40   0.016     .2309024    2.294392
      Other  |   .8620792   .4391532     1.96   0.050     .0013548    1.722804
       smoke |   .9233448   .4008266     2.30   0.021      .137739    1.708951
         ptl |   .5418366    .346249     1.56   0.118     -.136799    1.220472
          ht |   1.832518   .6916292     2.65   0.008     .4769494    3.188086
          ui |   .7585135   .4593768     1.65   0.099    -.1418484    1.658875
       _cons |   .4612239    1.20459     0.38   0.702    -1.899729    2.822176

If we wanted to see the logistic output again, we would type logistic without arguments.

Example 2

We can specify the confidence interval for the odds ratios with the level() option, and we can do this either at estimation time or when replaying the model. For instance, to see our first model in example 1 with narrower, 90% confidence intervals, we might type

. logistic, level(90)

Logistic regression                                    Number of obs =    189
                                                       LR chi2(8)    =  33.22
                                                       Prob > chi2   = 0.0001
Log likelihood = -100.724                              Pseudo R2     = 0.1416

         low | Odds ratio   Std. err.      z    P>|z|     [90% conf. interval]
-------------+----------------------------------------------------------------
        age4 |   .8972675   .1308231    -0.74   0.457     .7059409    1.140448
         lwt |   .9849634   .0068217    -2.19   0.029     .9738063    .9962483
        race |
      Black  |   3.534767   1.860737     2.40   0.016     1.487028    8.402379
      Other  |   2.368079   1.039949     1.96   0.050     1.149971    4.876471
       smoke |   2.517698    1.00916     2.30   0.021     1.302185    4.867819
         ptl |   1.719161   .5952579     1.56   0.118     .9726876    3.038505
          ht |   6.249602   4.322408     2.65   0.008     2.003487    19.49478
          ui |     2.1351   .9808153     1.65   0.099      1.00291    4.545424
       _cons |   1.586014   1.910496     0.38   0.702     .2186791    11.50288

Note: _cons estimates baseline odds.

Robust estimate of variance

If you specify vce(robust), Stata reports the robust estimate of variance described in [U] 20.22 Obtaining robust variance estimates. Here is the model previously fit with the robust estimate of variance:

. logistic low age lwt i.race smoke ptl ht ui, vce(robust)

Logistic regression                                    Number of obs =    189
                                                       Wald chi2(8)  =  29.02
                                                       Prob > chi2   = 0.0003
Log pseudolikelihood = -100.724                        Pseudo R2     = 0.1416

             |               Robust
         low | Odds ratio   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .9732636   .0329376    -0.80   0.423     .9108015    1.040009
         lwt |   .9849634   .0070209    -2.13   0.034     .9712984    .9988206
        race |
      Black  |   3.534767   1.793616     2.49   0.013     1.307504    9.556051
      Other  |   2.368079   1.026563     1.99   0.047     1.012512    5.538501
       smoke |   2.517698   .9736417     2.39   0.017     1.179852    5.372537
         ptl |   1.719161   .7072902     1.32   0.188     .7675715    3.850476
          ht |   6.249602   4.102026     2.79   0.005     1.726445     22.6231
          ui |     2.1351   1.042775     1.55   0.120     .8197749    5.560858
       _cons |   1.586014   1.939482     0.38   0.706      .144345    17.42658

Note: _cons estimates baseline odds.

Also, you can specify vce(cluster clustvar) and then, within cluster, relax the assumption of independence. To illustrate this, we have made some fictional additions to the low-birthweight data.

Say that these data are not a random sample of mothers but instead are a random sample of mothers from a random sample of hospitals. In fact, that may be true; we do not know the history of these data.
Hospitals specialize, and it would not be too incorrect to say that some hospitals specialize in more difficult cases. We are going to show two extremes. In one, all hospitals are alike, but we are going to estimate under the possibility that they might differ. In the other, hospitals are strikingly different. In both cases, we assume that patients are drawn from 20 hospitals. In both examples, we will fit the same model, and we will type the same command to fit it. Below are the same data we have been using but with a new variable, hospid, that identifies from which of the 20 hospitals each patient was drawn (and which we have made up):

. use https://www.stata-press.com/data/r17/hospid1, clear
. logistic low age lwt i.race smoke ptl ht ui, vce(cluster hospid)

Logistic regression                                    Number of obs =    189
                                                       Wald chi2(8)  =  49.67
                                                       Prob > chi2   = 0.0000
Log pseudolikelihood = -100.724                        Pseudo R2     = 0.1416

                            (Std. err. adjusted for 20 clusters in hospid)

             |               Robust
         low | Odds ratio   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .9732636   .0397476    -0.66   0.507      .898396     1.05437
         lwt |   .9849634   .0057101    -2.61   0.009     .9738352    .9962187
        race |
      Black  |   3.534767   2.013285     2.22   0.027     1.157563    10.79386
      Other  |   2.368079   .8451325     2.42   0.016     1.176562    4.766257
       smoke |   2.517698   .8284259     2.81   0.005     1.321062     4.79826
         ptl |   1.719161   .6676221     1.40   0.163     .8030814    3.680219
          ht |   6.249602   4.066275     2.82   0.005      1.74591    22.37086
          ui |     2.1351   1.093144     1.48   0.138     .7827337    5.824014
       _cons |   1.586014   1.661913     0.44   0.660     .2034094    12.36639

Note: _cons estimates baseline odds.

The standard errors are similar to the standard errors we have previously obtained, whether we used the robust or conventional estimators. In this example, we invented the hospital IDs randomly.
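Whichever variance estimator is chosen, the point estimates are unchanged, and a fitted probability is the inverse-logit transform of the linear index. A small Python sketch of that mapping (the index values here are hypothetical; the baseline odds are the _cons estimate from the cluster table above):

```python
import math

def inv_logit(index):
    """Predicted probability of a positive outcome from the linear index."""
    return math.exp(index) / (1 + math.exp(index))

# hypothetical linear-index values
for idx in (-2.0, 0.0, 1.5):
    print(idx, round(inv_logit(idx), 4))      # 0.1192, 0.5, 0.8176

# _cons is reported as baseline odds, so the implied baseline
# probability is odds / (1 + odds)
base_odds = 1.586014
print(round(base_odds / (1 + base_odds), 3))  # 0.613
```

The same probabilities can be obtained in Stata with predict after estimation; the sketch only makes the odds-to-probability arithmetic explicit.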
Stored results

In addition to the above, the following is stored in r():

Matrices
    r(table)    matrix containing the coefficients with their standard
                errors, test statistics, p-values, and confidence intervals

Note that results stored in r() are updated when the command is replayed and will be replaced when any r-class command is run after the estimation command.

Methods and formulas

Define xj as the (row) vector of independent variables, augmented by 1, and b as the corresponding estimated parameter (column) vector. The logistic regression model is fit by logit; see [R] logit for details of estimation.

The odds ratio corresponding to the ith coefficient is ψi = exp(bi). The standard error of the odds ratio is sψi = ψi si, where si is the standard error of bi estimated by logit.

Define Ij = xj b as the predicted index of the jth observation. The predicted probability of a positive outcome is

    pj = exp(Ij) / {1 + exp(Ij)}

This command supports the Huber/White/sandwich estimator of the variance and its clustered version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly Maximum likelihood estimators and Methods and formulas.

logistic also supports estimation with survey data. For details on VCEs with survey data, see [SVY] Variance estimation.

Also see

[R] logistic postestimation — Postestimation tools for logistic
[R] brier — Brier score decomposition
[R] cloglog — Complementary log–log regression
[R] exlogistic — Exact logistic regression
[R] logit — Logistic regression, reporting coefficients
[R] npregress kernel — Nonparametric kernel regression
[R] npregress series — Nonparametric series regression
[R] roc — Receiver operating characteristic (ROC) analysis
[BAYES] bayes: logistic — Bayesian logistic regression, reporting odds ratios
[FMM] fmm: logit — Finite mixtures of logistic regression models
[LASSO] Lasso intro — Introduction to lasso
[MI] Estimation — Estimation commands for use with mi estimate
[SVY] svy estimation — Estimation commands for survey data
[XT] xtlogit — Fixed-effects, random-effects, and population-averaged logit models
[U] 20 Estimation and postestimation commands