Penalized logistic regression with low prevalence exposures beyond high dimensional settings

Authors

  • Sam Doerken
  • Marta Avalos
  • Emmanuel Lagarde
  • Martin Schumacher

Abstract

Estimating and selecting risk factors with extremely low prevalences of exposure for a binary outcome is challenging because classical techniques, notably logistic regression, often fail to provide meaningful results in such settings. Penalized regression methods are widely used in high-dimensional settings; we show that they are useful in low-dimensional settings as well. Specifically, we demonstrate that Firth correction, ridge, the lasso and boosting all improve estimation for low-prevalence risk factors. Although the methods themselves are well established, comparison studies are needed to assess their potential benefits in this context. We provide such a comparison using the dataset of a large unmatched case-control study from France (2005-2008) on the relationship between prescription medicines and road traffic accidents, together with an accompanying simulation study. Results show that the estimation of risk factors with prevalences below 0.1% can be drastically improved, in particular by Firth correction and boosting, and especially for ultra-low prevalences. When a moderate number of low-prevalence exposures is available, we recommend the use of penalized techniques.
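
To illustrate why a penalty helps with a rare exposure, the following is a minimal sketch of Firth-corrected logistic regression, written with NumPy only. It is not the authors' code, and the function name `firth_logistic`, the step cap of 5 and the toy 2x2 counts are assumptions made for this example. The toy data have an exposure prevalence of 0.2% in which every exposed subject is a case, so the ordinary maximum likelihood estimate of the log odds ratio is infinite, whereas the Firth estimate stays finite. For a single binary exposure with an intercept, the Firth estimate should coincide with the log odds ratio obtained after adding 0.5 to each cell of the 2x2 table, which the sketch uses as a check.

```python
import numpy as np

def firth_logistic(X, y, max_iter=200, tol=1e-8, max_step=5.0):
    """Firth-corrected logistic regression (illustrative sketch).

    X must already contain an intercept column; y is a 0/1 vector.
    Iterates Fisher-scoring steps on the Firth-modified score.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^{1/2} X (X'WX)^{-1} X' W^{1/2}
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: usual score plus the bias-reducing term
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        largest = np.max(np.abs(step))
        if largest > max_step:                   # cap very large steps (assumed safeguard)
            step *= max_step / largest
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical toy data: 4 exposed subjects (all cases) among 2000.
exposed = np.r_[np.ones(4), np.zeros(1996)]
y = np.r_[np.ones(4), np.ones(400), np.zeros(1596)]
X = np.column_stack([np.ones(2000), exposed])

beta = firth_logistic(X, y)

# Log odds ratio after adding 0.5 to each cell of the 2x2 table.
closed_form = np.log((4 + 0.5) * (1596 + 0.5) / ((0 + 0.5) * (400 + 0.5)))
print(beta[1], closed_form)
```

In practice an established implementation would normally be preferred over this sketch; the point here is only that the penalized score keeps the estimate finite where the unpenalized one diverges.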

Suggested Citation

  • Sam Doerken & Marta Avalos & Emmanuel Lagarde & Martin Schumacher, 2019. "Penalized logistic regression with low prevalence exposures beyond high dimensional settings," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-14, May.
  • Handle: RePEc:plo:pone00:0217057
    DOI: 10.1371/journal.pone.0217057

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0217057
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0217057&type=printable
    Download Restriction: no

