Printed from IDEAS: https://ideas.repec.org/a/plo/pcbi00/1008831.html

Accuracy in the prediction of disease epidemics when ensembling simple but highly correlated models

Authors
  • Denis A Shah
  • Erick D De Wolf
  • Pierce A Paul
  • Laurence V Madden

Abstract

Ensembling combines the predictions made by individual component base models with the goal of achieving a predictive accuracy that is better than that of any one of the constituent member models. Diversity among the base models in terms of predictions is a crucial criterion in ensembling. However, there are practical instances when the available base models produce highly correlated predictions, because they may have been developed within the same research group or may have been built from the same underlying algorithm. We investigated, via a case study on Fusarium head blight (FHB) on wheat in the U.S., whether ensembles of simple yet highly correlated models for predicting the risk of FHB epidemics, all generated from logistic regression, provided any benefit to predictive performance, despite relatively low levels of base model diversity. Three ensembling methods were explored: soft voting, weighted averaging of smaller subsets of the base models, and penalized regression as a stacking algorithm. Soft voting and weighted model averages were generally better at classification than the base models, though not universally so. The performances of stacked regressions were superior to those of the other two ensembling methods we analyzed in this study. Ensembling simple yet correlated models is computationally feasible and is therefore worth pursuing for models of epidemic risk.

Author summary: Ensembling takes a set of predictions from individual models and combines them such that the performance of the ensemble is ideally better than that of any one of the constituent models. Ensembling requires diversity among the individual models in terms of their predictions. However, models developed within the same research group may in fact be interrelated, and high levels of correlation among their predictions could theoretically negate any ensembling benefit.
We examined, using a case study on predicting epidemics of Fusarium head blight of wheat, whether ensembling could still be beneficial when the individual models were simple but highly correlated. Even in this situation ensembling led to improvements in prediction without a high computational cost and was therefore profitable even when the diversity in model predictions was low.
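A minimal sketch of the three ensembling methods named in the abstract (soft voting, weighted averaging of a subset of base models, and penalized-regression stacking), written in Python with only the standard library. This is not the authors' code: the function names, the toy probabilities, and the use of an L2 (ridge) penalty fit by plain gradient descent for the stacking meta-learner are assumptions for illustration, and the paper's penalized stacking regression may use a different penalty and fitting procedure. A Pearson correlation helper is included to show one way the (low) diversity among base-model predictions could be quantified.

```python
import math

def soft_vote(probs):
    """Soft voting: the unweighted mean of the base models' predicted probabilities."""
    return sum(probs) / len(probs)

def weighted_average(probs, weights):
    """Weighted model average; non-negative weights are normalised to sum to 1."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)

def pearson(x, y):
    """Pearson correlation between two models' predictions (a rough diversity check)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p, eps=1e-6):
    """Log-odds, with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def fit_ridge_stack(base_probs, y, lam=0.1, lr=0.1, n_iter=2000):
    """Assumed stacking meta-learner: logistic regression on the logits of the
    base-model probabilities, minimising mean log-loss + (lam / 2) * ||b||^2
    by batch gradient descent. The intercept b0 is not penalised."""
    X = [[logit(p) for p in row] for row in base_probs]
    n, k = len(X), len(X[0])
    b0, b = 0.0, [0.0] * k
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += err
            for j in range(k):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        b = [bj - lr * (gj / n + lam * bj) for bj, gj in zip(b, g)]
    return b0, b

def predict_stack(b0, b, probs):
    """Ensemble probability from the fitted meta-learner for one case."""
    return sigmoid(b0 + sum(bj * logit(p) for bj, p in zip(b, probs)))

if __name__ == "__main__":
    # Hypothetical predictions from three highly correlated base models (rows = cases).
    base_probs = [
        [0.80, 0.75, 0.85],
        [0.20, 0.25, 0.15],
        [0.70, 0.65, 0.60],
        [0.30, 0.40, 0.35],
        [0.90, 0.85, 0.88],
        [0.10, 0.15, 0.12],
    ]
    y = [1, 0, 1, 0, 1, 0]  # hypothetical outcomes: epidemic (1) / non-epidemic (0)
    m1 = [row[0] for row in base_probs]
    m2 = [row[1] for row in base_probs]
    print("corr(model 1, model 2):", round(pearson(m1, m2), 3))
    print("soft vote, case 1:", soft_vote(base_probs[0]))
    # Weighted average over a subset (the first two models) with hypothetical weights.
    print("weighted average, case 1:", weighted_average(base_probs[0][:2], [2.0, 1.0]))
    b0, b = fit_ridge_stack(base_probs, y)
    print("stacked, case 1:", round(predict_stack(b0, b, base_probs[0]), 3))
```

In practice a stacking meta-learner is normally fit on out-of-fold (cross-validated) base-model predictions rather than on in-sample predictions, so that it does not simply reward base models that overfit the training data. The penalty shrinks the meta-learner's coefficients, which helps keep them stable when the base-model predictions are strongly collinear.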

Suggested Citation

  • Denis A Shah & Erick D De Wolf & Pierce A Paul & Laurence V Madden, 2021. "Accuracy in the prediction of disease epidemics when ensembling simple but highly correlated models," PLOS Computational Biology, Public Library of Science, vol. 17(3), pages 1-23, March.
  • Handle: RePEc:plo:pcbi00:1008831
    DOI: 10.1371/journal.pcbi.1008831

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008831
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008831&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1008831?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Benítez-Peña, Sandra & Carrizosa, Emilio & Guerrero, Vanesa & Jiménez-Gamero, M. Dolores & Martín-Barragán, Belén & Molero-Río, Cristina & Ramírez-Cobo, Pepa & Romero Morales, Dolores & Sillero-Denami, 2021. "On sparse ensemble methods: An application to short-term predictions of the evolution of COVID-19," European Journal of Operational Research, Elsevier, vol. 295(2), pages 648-663.
    2. Manski, Charles F., 2023. "Probabilistic prediction for binary treatment choice: With focus on personalized medicine," Journal of Econometrics, Elsevier, vol. 234(2), pages 647-663.
    3. Weishampel, Anthony & Staicu, Ana-Maria & Rand, William, 2023. "Classification of social media users with generalized functional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    4. Nelson P. Rayl & Nitish R. Sinha, 2022. "Integrating Prediction and Attribution to Classify News," Finance and Economics Discussion Series 2022-042, Board of Governors of the Federal Reserve System (U.S.).
    5. Paolo Libenzio Brignoli & Alessandro Varacca & Cornelis Gardebroek & Paolo Sckokai, 2024. "Machine learning to predict grains futures prices," Agricultural Economics, International Association of Agricultural Economists, vol. 55(3), pages 479-497, May.
    6. M. Merz & R. Richman & T. Tsanakas & M. V. Wuthrich, 2021. "Interpreting Deep Learning Models with Marginal Attribution by Conditioning on Quantiles," Papers 2103.11706, arXiv.org.
    7. Rich, Jeppe & Myhrmann, Marcus Skyum & Mabit, Stefan Eriksen, 2023. "Our children cycle less - A Danish pseudo-panel analysis," Journal of Transport Geography, Elsevier, vol. 106(C).
    8. Chun Chieh Fan & Robert Loughnan & Carolina Makowski & Diliana Pecheva & Chi-Hua Chen & Donald J. Hagler & Wesley K. Thompson & Nadine Parker & Dennis van der Meer & Oleksandr Frei & Ole A. Andreassen, 2022. "Multivariate genome-wide association study on tissue-sensitive diffusion metrics highlights pathways that shape the human brain," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    9. Jack Jewson & David Rossell, 2022. "General Bayesian loss function selection and the use of improper models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 1640-1665, November.
    10. Anna Gottard & Giulia Vannucci & Leonardo Grilli & Carla Rampichini, 2023. "Mixed-effect models with trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(2), pages 431-461, June.
    11. COJOCARIU Irina-Cristina, 2023. "Analysis Of Sports Performances Using Machine Learning And Statistical Models - A General Analysis Of The Literature," Revista Economica, Lucian Blaga University of Sibiu, Faculty of Economic Sciences, vol. 75(2), pages 34-39, June.
    12. Ord, J. Keith, 2022. "The uncertainty track: Machine learning, statistical modeling, synthesis," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1526-1530.
    13. Victor Quintas-Martinez & Mohammad Taha Bahadori & Eduardo Santiago & Jeff Mu & Dominik Janzing & David Heckerman, 2024. "Multiply-Robust Causal Change Attribution," Papers 2404.08839, arXiv.org, revised Sep 2024.
    14. Jeff Dominitz & Charles F. Manski, 2024. "Comprehensive OOS Evaluation of Predictive Algorithms with Statistical Decision Theory," Papers 2403.11016, arXiv.org, revised May 2024.
    15. Junyi Lu & Sebastian Meyer, 2020. "Forecasting Flu Activity in the United States: Benchmarking an Endemic-Epidemic Beta Model," IJERPH, MDPI, vol. 17(4), pages 1-13, February.
    16. Irina Pilvere & Aleksejs Nipers & Agnese Krievina & Ilze Upite & Daniels Kotovs, 2022. "LASAM Model: An Important Tool in the Decision Support System for Policymakers and Farmers," Agriculture, MDPI, vol. 12(5), pages 1-26, May.
    17. Anna Florence & Andrew Revill & Stephen Hoad & Robert Rees & Mathew Williams, 2021. "The Effect of Antecedence on Empirical Model Forecasts of Crop Yield from Observations of Canopy Properties," Agriculture, MDPI, vol. 11(3), pages 1-16, March.
    18. Yan Hao & Ting Xu & Hongping Hu & Peng Wang & Yanping Bai, 2020. "Prediction and analysis of Corona Virus Disease 2019," PLOS ONE, Public Library of Science, vol. 15(10), pages 1-15, October.
    19. Ray, Evan L. & Brooks, Logan C. & Bien, Jacob & Biggerstaff, Matthew & Bosse, Nikos I. & Bracher, Johannes & Cramer, Estee Y. & Funk, Sebastian & Gerding, Aaron & Johansson, Michael A. & Rumack, Aaron, 2023. "Comparing trained and untrained probabilistic ensemble forecasts of COVID-19 cases and deaths in the United States," International Journal of Forecasting, Elsevier, vol. 39(3), pages 1366-1383.
    20. Sen Pei & Jeffrey Shaman, 2020. "Aggregating forecasts of multiple respiratory pathogens supports more accurate forecasting of influenza-like illness," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-19, October.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.