Better predicted probabilities from linear probability models with applications to multiple imputation

Author

  • Paul Allison (Statistical Horizons LLC)

Abstract

Although logistic regression is the most popular method for regression analysis of binary outcomes, there are still many attractions to using least-squares regression to estimate a linear probability model. A major downside, however, is that predicted “probabilities” from a linear model are often greater than 1 or less than 0. That can be problematic for many real-world applications. As a solution, we propose to generate predicted probabilities based on a linear discriminant model, which Haggstrom (1983) showed could be obtained by rescaling coefficients from OLS regression. We offer a new Stata command, predict_ldm, that can be used after the regress command to generate predicted values that always fall within the (0,1) interval. We show that, for many applications, these values are very close to those produced by logistic regression. We also explore applications where there are substantial differences between logistic predictions and those produced by predict_ldm. Finally, we show that the linear discriminant method can be used to substantially improve multiple imputations of categorical data based on the multivariate normal model. We are currently developing a new mi impute command to implement this method.
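The rescaling the abstract attributes to Haggstrom (1983) can be illustrated with a short Python/NumPy sketch. It rests on one assumption: the two-group linear discriminant model is fitted by maximum likelihood, so every variance uses divisor n and MSE = RSS/n. Under that assumption, each OLS slope divided by MSE equals the corresponding discriminant coefficient. With ŷ the OLS fitted value and p the sample proportion of ones, the discriminant log-odds are then log(p/(1-p)) + (ŷ - 0.5)/MSE + (0.5 - p)/(p(1-p)). This formula is a derivation under the stated assumption and is not taken from the paper. It may differ from what predict_ldm actually computes, for example in degrees-of-freedom conventions. Both function names are hypothetical. The second function fits the discriminant model directly, as a cross-check.

```python
import numpy as np


def ldm_predicted_probabilities(X, y):
    """Turn an OLS linear probability model into LDA-based probabilities.

    Sketch assuming the maximum-likelihood form of the Haggstrom (1983)
    identity: every variance uses divisor n, so mse is RSS / n.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])          # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # OLS on the 0/1 outcome
    y_hat = design @ coef                              # linear-probability fits, may leave [0, 1]
    mse = np.sum((y - y_hat) ** 2) / n                 # residual variance, divisor n
    p = y.mean()                                       # sample proportion with y = 1
    # LDA log-odds written in terms of the OLS fitted values and mse;
    # equivalently, each OLS slope divided by mse is an LDA slope.
    eta = np.log(p / (1 - p)) + (y_hat - 0.5) / mse + (0.5 - p) / (p * (1 - p))
    return 1.0 / (1.0 + np.exp(-eta))                  # logistic map into (0, 1)


def lda_probabilities(X, y):
    """Direct two-group LDA (pooled covariance, divisor n), for cross-checking."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    X1, X0 = X[y == 1], X[y == 0]
    p = len(X1) / n
    mu1, mu0 = X1.mean(axis=0), X0.mean(axis=0)
    within = ((X1 - mu1).T @ (X1 - mu1) + (X0 - mu0).T @ (X0 - mu0)) / n
    beta = np.linalg.solve(within, mu1 - mu0)          # discriminant coefficients
    eta = np.log(p / (1 - p)) + (X - (mu1 + mu0) / 2) @ beta
    return 1.0 / (1.0 + np.exp(-eta))


# Usage on simulated data (illustrative only, not from the paper)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X @ np.array([1.5, -1.0]) + rng.logistic(size=500) > 0).astype(int)
p_ldm = ldm_predicted_probabilities(X, y)
p_lda = lda_probabilities(X, y)
print("max |LDM - LDA|:", np.max(np.abs(p_ldm - p_lda)))
print("range:", p_ldm.min(), p_ldm.max())
```

With every variance computed using divisor n, the two functions should agree up to floating-point error. Using a degrees-of-freedom-corrected MSE instead, like the one regress reports, would shift the probabilities slightly.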

Suggested Citation

  • Paul Allison, 2020. "Better predicted probabilities from linear probability models with applications to multiple imputation," 2020 Stata Conference 1, Stata Users Group.
  • Handle: RePEc:boc:scon20:1

    Download full text from publisher

    File URL: http://fmwww.bc.edu/repec/scon2020/us20_Allison.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Jeffrey M Wooldridge, 2010. "Econometric Analysis of Cross Section and Panel Data," MIT Press Books, The MIT Press, edition 2, volume 1, number 0262232588, December.
    2. Westin, Richard B., 1974. "Predictions from binary choice models," Journal of Econometrics, Elsevier, vol. 2(1), pages 1-16, May.
    3. Haggstrom, Gus W, 1983. "Logistic Regression and Discriminant Analysis by Ordinary Least Squares," Journal of Business & Economic Statistics, American Statistical Association, vol. 1(3), pages 229-238, July.
    4. Mroz, Thomas A, 1987. "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions," Econometrica, Econometric Society, vol. 55(4), pages 765-799, July.
    5. Ottar Hellevik, 2009. "Linear versus logistic regression when the dependent variable is a dichotomy," Quality & Quantity: International Journal of Methodology, Springer, vol. 43(1), pages 59-74, January.
    6. ., 2017. "Econometric analysis: loopholes and shortcomings," Chapters, in: Econometrics as a Con Art, chapter 5, pages 88-105, Edward Elgar Publishing.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Mohammad Reza Farzanegan & Hassan F. Gholipour, 2021. "Growing up in the Iran–Iraq war and preferences for strong defense," Review of Development Economics, Wiley Blackwell, vol. 25(4), pages 1945-1968, November.
    2. Marshall L. White & William J. Sabol, 2021. "Legal Financial Obligations and Probation: Findings from the 1995 Survey of Adults on Probation," Social Sciences, MDPI, vol. 10(12), pages 1-22, November.
    3. Didier, Nicolás, 2021. "Does the expansion of higher education reduce gender gaps in the labor market? Evidence from a natural experiment," International Journal of Educational Development, Elsevier, vol. 86(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Baron, Opher & Callen, Jeffrey L. & Segal, Dan, 2023. "Does the bullwhip matter economically? A cross-sectional firm-level analysis," International Journal of Production Economics, Elsevier, vol. 259(C).
    2. Patrick Kline & Christopher R. Walters, 2019. "On Heckits, LATE, and Numerical Equivalence," Econometrica, Econometric Society, vol. 87(2), pages 677-696, March.
    3. Bharati, Tushar & Jetter, Michael & Malik, Muhammad Nauman, 2024. "Types of communications technology and civil conflict," Journal of Development Economics, Elsevier, vol. 170(C).
    4. Riccardo Fiorito & Giulio Zanella, 2012. "The Anatomy of the Aggregate Labor Supply Elasticity," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 15(2), pages 171-187, April.
    5. Thomas Bassetti & Filippo Pavesi, 2017. "Electoral Contributions And The Cost Of Unpopularity," Economic Inquiry, Western Economic Association International, vol. 55(4), pages 1771-1791, October.
    6. Seonho Shin, 2021. "Were they a shock or an opportunity?: The heterogeneous impacts of the 9/11 attacks on refugees as job seekers—a nonlinear multi-level approach," Empirical Economics, Springer, vol. 61(5), pages 2827-2864, November.
    7. Denisard Alves & Walter Belluzzo, 2005. "Child Health and Infant Mortality in Brazil," Research Department Publications 3187, Inter-American Development Bank, Research Department.
    8. Guy Lacroix & Nadia Joubert & Bernard Fortin, 2004. "Offre de travail au noir en présence de la fiscalité et des contrôles fiscaux," Économie et Prévision, Programme National Persée, vol. 164(3), pages 145-163.
    9. Pierre LEVASSEUR, 2016. "The effects of bodyweight on wages in urban Mexico," Cahiers du GREThA (2007-2019) 2016-18, Groupe de Recherche en Economie Théorique et Appliquée (GREThA).
    10. David T. Frazier & Eric Renault & Lina Zhang & Xueyan Zhao, 2020. "Weak Identification in Discrete Choice Models," Papers 2011.06753, arXiv.org, revised Jan 2021.
    11. Jaime Andres Sarmiento Espinel & Edwin van Gameren, 2016. "A collective household labor supply model with children and non-participation: Theory and empirical application," Serie documentos de trabajo del Centro de Estudios Económicos 2016-11, El Colegio de México, Centro de Estudios Económicos.
    12. Masayuki Hirukawa & Di Liu & Irina Murtazashvili & Artem Prokhorov, 2023. "DS-HECK: double-lasso estimation of Heckman selection model," Empirical Economics, Springer, vol. 64(6), pages 3167-3195, June.
    13. Dhamija, Gaurav & Roychowdhury, Punarjit, 2018. "The impact of women's age at marriage on own and spousal labor market outcomes in India: causation or selection?," MPRA Paper 86686, University Library of Munich, Germany.
    14. Denisard Alves & Walter Belluzzo, 2005. "Salud y mortalidad infantil en Brasil," Research Department Publications 3188, Inter-American Development Bank, Research Department.
    15. Kieschnick, Robert & Moussawi, Rabih, 2018. "Firm age, corporate governance, and capital structure choices," Journal of Corporate Finance, Elsevier, vol. 48(C), pages 597-614.
    16. Kaiser, Ulrich & Kuhn, Johan M., 2020. "The value of publicly available, textual and non-textual information for startup performance prediction," Journal of Business Venturing Insights, Elsevier, vol. 14(C).
    17. Fernandes, Mario & Hilber, Simon & Sturm, Jan-Egbert & Walter, Andreas, 2023. "Closing the gender gap in academia? Evidence from an affirmative action program," Research Policy, Elsevier, vol. 52(9).
    18. Jörg Schwiebert, 2015. "Estimation and interpretation of a Heckman selection model with endogenous covariates," Empirical Economics, Springer, vol. 49(2), pages 675-703, September.
    19. Chunbei Wang & Le Wang, 2017. "Knot yet: minimum marriage age law, marriage delay, and earnings," Journal of Population Economics, Springer;European Society for Population Economics, vol. 30(3), pages 771-804, July.
    20. Carson, Richard T. & Eagle, Thomas C. & Islam, Towhidul & Louviere, Jordan J., 2022. "Volumetric choice experiments (VCEs)," Journal of choice modelling, Elsevier, vol. 42(C).

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:boc:scon20:1. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Christopher F Baum. General contact details of the provider: https://edirc.repec.org/data/stataea.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.