Printed from https://ideas.repec.org/a/gam/jijerp/v16y2019i23p4658-d289973.html

Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size

Author

Listed:
  • Hana Šinkovec

    (Institute of Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems (CEMSIIS), Spitalgasse 23, 1090 Vienna, Austria)

  • Angelika Geroldinger

    (Institute of Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems (CEMSIIS), Spitalgasse 23, 1090 Vienna, Austria)

  • Georg Heinze

    (Institute of Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems (CEMSIIS), Spitalgasse 23, 1090 Vienna, Austria)

Abstract

The parameters of logistic regression models are usually obtained by the method of maximum likelihood (ML). However, in analyses of small data sets or data sets with unbalanced outcomes or exposures, ML parameter estimates may not exist. This situation has been termed ‘separation’ as the two outcome groups are separated by the values of a covariate or a linear combination of covariates. To overcome the problem of non-existing ML parameter estimates, applying Firth’s correction (FC) was proposed. In practice, however, a principal investigator might be advised to ‘bring more data’ in order to solve a separation issue. We illustrate the problem by means of examples from colorectal cancer screening and ornithology. It is unclear if such an increasing sample size (ISS) strategy that keeps sampling new observations until separation is removed improves estimation compared to applying FC to the original data set. We performed an extensive simulation study where the main focus was to estimate the cost-adjusted relative efficiency of ML combined with ISS compared to FC. FC yielded reasonably small root mean squared errors and proved to be the more efficient estimator. Given our findings, we propose not to adapt the sample size when separation is encountered but to use FC as the default method of analysis whenever the number of observations or outcome events is critically low.
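To make the abstract's two central ideas concrete, the following is a minimal sketch of separation and of Firth's correction on a toy data set. It is not the authors' simulation code: the data, the function names (`is_completely_separated`, `log_likelihood`, `firth_logistic`) and the step cap `max_step` are assumptions made for this illustration. The sketch uses the standard form of Firth's modified score, X'(y - p + h(1/2 - p)), where h holds the diagonal of the hat matrix, and solves it by Newton-type iterations.

```python
import numpy as np

def is_completely_separated(x, y):
    """Single covariate: True if one threshold on x splits y=0 from y=1."""
    x0, x1 = x[y == 0], x[y == 1]
    return bool(x0.max() < x1.min() or x1.max() < x0.min())

def log_likelihood(X, y, beta):
    """Ordinary (unpenalized) logistic log-likelihood."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def firth_logistic(X, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Firth-corrected logistic regression via Newton-type iterations."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])              # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix: h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))      # Firth-modified score
        step = info_inv @ score
        biggest = np.max(np.abs(step))
        if biggest > max_step:                     # cap very large steps (assumed safeguard)
            step = step * (max_step / biggest)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data (hypothetical): every x < 0 has y = 0 and every x > 0 has y = 1.
x = np.array([-1.5, -0.5, 0.5, 1.5])
y = np.array([0.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])          # intercept + covariate

separated = is_completely_separated(x, y)

# Under separation the ML log-likelihood keeps rising as the slope grows,
# so no finite maximizer exists.
ll_small = log_likelihood(X, y, np.array([0.0, 1.0]))
ll_mid = log_likelihood(X, y, np.array([0.0, 10.0]))
ll_large = log_likelihood(X, y, np.array([0.0, 20.0]))

# Firth's correction nevertheless yields finite estimates.
beta_fc = firth_logistic(X, y)
print("separated:", separated)
print("Firth estimates (intercept, slope):", beta_fc)
```

Because the toy data are symmetric around zero, the Firth intercept stays at zero and only the slope is shrunk to a finite value. For real analyses an established implementation would be preferable to this sketch.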

Suggested Citation

  • Hana Šinkovec & Angelika Geroldinger & Georg Heinze, 2019. "Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size," IJERPH, MDPI, vol. 16(23), pages 1-12, November.
  • Handle: RePEc:gam:jijerp:v:16:y:2019:i:23:p:4658-:d:289973

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/16/23/4658/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/16/23/4658/
    Download Restriction: no

    References listed on IDEAS

    1. Rousseeuw, Peter J. & Christmann, Andreas, 2003. "Robustness against separation and outliers in logistic regression," Computational Statistics & Data Analysis, Elsevier, vol. 43(3), pages 315-332, July.
    2. King, Gary & Zeng, Langche, 2001. "Logistic Regression in Rare Events Data," Political Analysis, Cambridge University Press, vol. 9(2), pages 137-163, January.

    Citations

    Cited by:

    1. Domenica Matranga & Filippa Bono & Laura Maniscalco, 2021. "Statistical Advances in Epidemiology and Public Health," IJERPH, MDPI, vol. 18(7), pages 1-5, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Angel M. Morales & Patrick Tarwater & Indika Mallawaarachchi & Alok Kumar Dwivedi & Juan B. Figueroa-Casas, 2015. "Multinomial logistic regression approach for the evaluation of binary diagnostic test in medical research," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 16(2), pages 203-222, June.
    2. F. Gauthier & D. Germain & B. Hétu, 2017. "Logistic models as a forecasting tool for snow avalanches in a cold maritime climate: northern Gaspésie, Québec, Canada," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 89(1), pages 201-232, October.
    3. Douglas Cumming & Lars Hornuf & Moein Karami & Denis Schweizer, 2023. "Disentangling Crowdfunding from Fraudfunding," Journal of Business Ethics, Springer, vol. 182(4), pages 1103-1128, February.
    4. Eunae Yoo & Elliot Rabinovich & Bin Gu, 2020. "The Growth of Follower Networks on Social Media Platforms for Humanitarian Operations," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2696-2715, December.
    5. Cemal Eren Arbatlı & Quamrul H. Ashraf & Oded Galor & Marc Klemp, 2020. "Diversity and Conflict," Econometrica, Econometric Society, vol. 88(2), pages 727-797, March.
    6. Lo Turco, Alessia & Maggioni, Daniela, 2018. "Effects of Islamic religiosity on bilateral trust in trade: The case of Turkish exports," Journal of Comparative Economics, Elsevier, vol. 46(4), pages 947-965.
    7. Matija Kovacic & Claudio Zoli, 2021. "Ethnic distribution, effective power and conflict," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 57(2), pages 257-299, August.
    8. Blackman, Allen & Guerrero, Santiago, 2012. "What drives voluntary eco-certification in Mexico?," Journal of Comparative Economics, Elsevier, vol. 40(2), pages 256-268.
    9. Jacob Ausderan, 2018. "Reassessing the democratic advantage in interstate wars using k-adic datasets," Conflict Management and Peace Science, Peace Science Society (International), vol. 35(5), pages 451-473, September.
    10. Alessandra Iannamorelli & Stefano Nobili & Antonio Scalia & Luana Zaccaria, 2024. "Asymmetric Information and Corporate Lending: Evidence from SME Bond Markets," Review of Finance, European Finance Association, vol. 28(1), pages 163-201.
    11. Paul Poast, 2013. "Issue linkage and international cooperation: An empirical investigation," Conflict Management and Peace Science, Peace Science Society (International), vol. 30(3), pages 286-303, July.
    12. Yerko Rojas, 2017. "Evictions and short-term all-cause mortality: a 3-year follow-up study of a middle-aged Swedish population," International Journal of Public Health, Springer;Swiss School of Public Health (SSPH+), vol. 62(3), pages 343-351, April.
    13. Mehrez Ben Slama & Dhafer Saidane & Hassouna Fedhila, 2012. "How to identify targets in the M&A banking operations? Case of cross-border strategies in Europe by line of activity," Review of Quantitative Finance and Accounting, Springer, vol. 38(2), pages 209-240, February.
    14. Marcin Chlebus, 2014. "One-day prediction of state of turbulence for financial instrument based on models for binary dependent variable," Ekonomia journal, Faculty of Economic Sciences, University of Warsaw, vol. 37.
    15. Lorenzo Cassi & Anne Plunket, 2014. "Proximity, network formation and inventive performance: in search of the proximity paradox," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 53(2), pages 395-422, September.
    16. Trent Geisler & Herman Ray & Ying Xie, 2023. "Finding the Proverbial Needle: Improving Minority Class Identification Under Extreme Class Imbalance," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 192-212, April.
    17. Wegenast, Tim, 2013. "The Impact of Fuel Ownership on Intrastate Violence," GIGA Working Papers 225, GIGA German Institute of Global and Area Studies.
    18. Xinfu Xing & Chenglong Wu & Jinhui Li & Xueyou Li & Limin Zhang & Rongjie He, 2021. "Susceptibility assessment for rainfall-induced landslides using a revised logistic regression method," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 106(1), pages 97-117, March.
    19. Hwang, Seokyoun & Sarath, Bharat & Han, Seung-youb, 2022. "Auditor independence: The effect of auditors’ quality control efforts and corporate governance," Journal of International Accounting, Auditing and Taxation, Elsevier, vol. 47(C).
    20. Lahiri, Kajal & Yang, Liu, 2013. "Forecasting Binary Outcomes," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 1025-1106, Elsevier.
