
Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data

Author

Listed:
  • Schratz, Patrick
  • Muenchow, Jannes
  • Iturritxa, Eugenia
  • Richter, Jakob
  • Brenning, Alexander

Abstract

Although applying machine-learning algorithms has become much simpler in recent years thanks to their well-documented integration in commonly used statistical programming languages (such as R or Python), ecological modeling still faces several practical challenges in obtaining unbiased performance estimates. One is the influence of spatial autocorrelation on both hyperparameter tuning and performance estimation. Grouped cross-validation strategies have been proposed in recent years in environmental as well as medical contexts to reduce bias in estimates of predictive performance. In this study we show the effects of spatial autocorrelation on hyperparameter tuning and performance estimation by comparing the predictive performance of several widely used machine-learning algorithms, such as boosted regression trees (BRT), k-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM), with that of traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones such as generalized additive models (GAM). Spatial and non-spatial cross-validation methods were used to evaluate model performance, with the aim of obtaining bias-reduced performance estimates. A detailed analysis of the sensitivity of hyperparameter tuning to the resampling method (spatial/non-spatial) was performed. As a case study, the spatial distribution of forest disease (Diplodia sapinea) in the Basque Country (Spain) was investigated using common environmental variables such as temperature, precipitation, soil and lithology as predictors. Random forest (mean Brier score estimate of 0.166) outperformed all other methods with regard to predictive accuracy. Though the sensitivity to hyperparameter tuning differed between the ML algorithms, in most cases there were no substantial differences between spatial and non-spatial partitioning for hyperparameter tuning. However, spatial hyperparameter tuning maintains consistency with spatial estimation of classifier performance and should be favored over non-spatial hyperparameter optimization. Large differences in estimated performance (up to 47%) between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) settings demonstrate the need to account for the influence of spatial autocorrelation. Overoptimistic performance estimates may lead to misguided actions in ecological decision making based on biased model predictions.
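
The contrast between spatial and non-spatial resampling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the spatial partitions were built, so the regular grid blocks used here are an assumption, as are all function names, the synthetic checkerboard data, and the toy KNN classifier that uses only the coordinates as predictors. The sketch computes a Brier score from nested cross-validation in which the inner (tuning) folds follow the same scheme as the outer (performance) folds.

```python
import random
from collections import defaultdict


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def random_folds(rows, n_folds=4, seed=0):
    """Non-spatial partitioning: shuffle indices and deal them into folds."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def spatial_block_folds(rows, n_cols=2, n_rows=2):
    """Spatial partitioning (assumed scheme): one fold per non-empty grid cell."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def cell(v, lo, hi, n):
        return 0 if hi == lo else min(int((v - lo) / (hi - lo) * n), n - 1)

    blocks = defaultdict(list)
    for i, (x, y, _) in enumerate(rows):
        blocks[(cell(x, x_lo, x_hi, n_cols), cell(y, y_lo, y_hi, n_rows))].append(i)
    return [blocks[key] for key in sorted(blocks)]


def knn_prob(train, x, y, k):
    """Toy KNN: share of positive labels among the k nearest training points."""
    nearest = sorted(train, key=lambda r: (r[0] - x) ** 2 + (r[1] - y) ** 2)[:k]
    return sum(r[2] for r in nearest) / len(nearest)


def cv_brier(rows, folds, k):
    """Brier score of KNN with fixed k, pooled over all held-out folds."""
    y_true, p_pred = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [r for i, r in enumerate(rows) if i not in held_out]
        for i in test_idx:
            x, y, label = rows[i]
            y_true.append(label)
            p_pred.append(knn_prob(train, x, y, k))
    return brier_score(y_true, p_pred)


def nested_cv_brier(rows, make_folds, k_grid=(1, 5, 15)):
    """Outer loop estimates performance; inner loop tunes k with the same fold scheme."""
    y_true, p_pred = [], []
    for test_idx in make_folds(rows):
        held_out = set(test_idx)
        train = [r for i, r in enumerate(rows) if i not in held_out]
        # Tune k on the training part only, so the outer test fold stays unseen.
        best_k = min(k_grid, key=lambda k: cv_brier(train, make_folds(train), k))
        for i in test_idx:
            x, y, label = rows[i]
            y_true.append(label)
            p_pred.append(knn_prob(train, x, y, best_k))
    return brier_score(y_true, p_pred)


def make_toy_data(n=200, seed=1):
    """Synthetic, spatially clustered labels (checkerboard patches with 10% noise)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = (int(x * 4) + int(y * 4)) % 2
        if rng.random() < 0.1:
            label = 1 - label
        rows.append((x, y, label))
    return rows


if __name__ == "__main__":
    data = make_toy_data()
    print("non-spatial CV Brier:", nested_cv_brier(data, random_folds))
    print("spatial CV Brier:    ", nested_cv_brier(data, spatial_block_folds))
```

Passing the same fold-building function to both loops mirrors the abstract's recommendation that tuning and performance estimation use a consistent partitioning. On real data, the grid blocks would typically be replaced by whatever grouping reflects the spatial structure of the samples.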

Suggested Citation

  • Schratz, Patrick & Muenchow, Jannes & Iturritxa, Eugenia & Richter, Jakob & Brenning, Alexander, 2019. "Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data," Ecological Modelling, Elsevier, vol. 406(C), pages 109-120.
  • Handle: RePEc:eee:ecomod:v:406:y:2019:i:c:p:109-120
    DOI: 10.1016/j.ecolmodel.2019.06.002

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0304380019302145
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Wright, Marvin N. & Ziegler, Andreas, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).
    2. Voyant, Cyril & Notton, Gilles & Kalogirou, Soteris & Nivet, Marie-Laure & Paoli, Christophe & Motte, Fabrice & Fouilloy, Alexis, 2017. "Machine learning methods for solar radiation forecasting: A review," Renewable Energy, Elsevier, vol. 105(C), pages 569-582.
    3. Jarnevich, Catherine S. & Talbert, Marian & Morisette, Jeffery & Aldridge, Cameron & Brown, Cynthia S. & Kumar, Sunil & Manier, Daniel & Talbert, Colin & Holcombe, Tracy, 2017. "Minimizing effects of methodological decisions on interpretation and prediction in species distribution studies: An example with background selection," Ecological Modelling, Elsevier, vol. 363(C), pages 48-56.
    4. Wieland, Ralf & Kerkow, Antje & Früh, Linus & Kampen, Helge & Walther, Doreen, 2017. "Automated feature selection for a machine learning approach toward modeling a mosquito distribution," Ecological Modelling, Elsevier, vol. 352(C), pages 108-112.
    5. Gneiting, Tilmann & Raftery, Adrian E., 2007. "Strictly Proper Scoring Rules, Prediction, and Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 359-378, March.
    6. Tomislav Hengl & Jorge Mendes de Jesus & Gerard B M Heuvelink & Maria Ruiperez Gonzalez & Milan Kilibarda & Aleksandar Blagotić & Wei Shangguan & Marvin N Wright & Xiaoyuan Geng & Bernhard Bauer-Marsc, 2017. "SoilGrids250m: Global gridded soil information based on machine learning," PLOS ONE, Public Library of Science, vol. 12(2), pages 1-40, February.
    7. Vorpahl, Peter & Elsenbeer, Helmut & Märker, Michael & Schröder, Boris, 2012. "How can statistical models help to determine driving factors of landslides?," Ecological Modelling, Elsevier, vol. 239(C), pages 27-39.
    8. Srivastava, Vivek & Griess, Verena C. & Padalia, Hitendra, 2018. "Mapping invasion potential using ensemble modelling. A case study on Yushania maling in the Darjeeling Himalayas," Ecological Modelling, Elsevier, vol. 385(C), pages 35-44.
    9. Racine, Jeff, 2000. "Consistent cross-validatory model-selection for dependent data: hv-block cross-validation," Journal of Econometrics, Elsevier, vol. 99(1), pages 39-61, November.
    10. Watanabe, Marcos D.B. & Ortega, Enrique, 2014. "Dynamic emergy accounting of water and carbon ecosystem services: A model to simulate the impacts of land-use change," Ecological Modelling, Elsevier, vol. 271(C), pages 113-131.
    11. Karatzoglou, Alexandros & Smola, Alexandros & Hornik, Kurt & Zeileis, Achim, 2004. "kernlab - An S4 Package for Kernel Methods in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 11(i09).
    12. Baasch, David M. & Tyre, Andrew J. & Millspaugh, Joshua J. & Hygnstrom, Scott E. & Vercauteren, Kurt C., 2010. "An evaluation of three statistical methods used to model resource selection," Ecological Modelling, Elsevier, vol. 221(4), pages 565-574.
    13. Halvorsen, Rune & Mazzoni, Sabrina & Dirksen, John Wirkola & Næsset, Erik & Gobakken, Terje & Ohlson, Mikael, 2016. "How important are choice of model selection method and spatial autocorrelation of presence data for distribution modelling by MaxEnt?," Ecological Modelling, Elsevier, vol. 328(C), pages 108-118.
    14. Loehle, Craig, 2018. "Disequilibrium and relaxation times for species responses to climate change," Ecological Modelling, Elsevier, vol. 384(C), pages 23-29.

    Citations


    Cited by:

    1. Katarzyna Kopczewska, 2022. "Spatial machine learning: new opportunities for regional science," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 68(3), pages 713-755, June.
    2. Meyer, Hanna & Reudenbach, Christoph & Wöllauer, Stephan & Nauss, Thomas, 2019. "Importance of spatial predictor variable selection in machine learning applications – Moving from data reproduction to spatial prediction," Ecological Modelling, Elsevier, vol. 411(C).
    3. Juergen Deppner & Marcelo Cajias, 2024. "Accounting for Spatial Autocorrelation in Algorithm-Driven Hedonic Models: A Spatial Cross-Validation Approach," The Journal of Real Estate Finance and Economics, Springer, vol. 68(2), pages 235-273, February.
    4. Vo Thanh, Hung & Zamanyad, Aiyoub & Safaei-Farouji, Majid & Ashraf, Umar & Hemeng, Zhang, 2022. "Application of hybrid artificial intelligent models to predict deliverability of underground natural gas storage sites," Renewable Energy, Elsevier, vol. 200(C), pages 169-184.
    5. Morakot Worachairungreung & Sarawut Ninsawat & Apichon Witayangkurn & Matthew N. Dailey, 2021. "Identification of Road Traffic Injury Risk Prone Area Using Environmental Factors by Machine Learning Classification in Nonthaburi, Thailand," Sustainability, MDPI, vol. 13(7), pages 1-25, April.
    6. Zhu Liang & Wei Liu & Weiping Peng & Lingwei Chen & Changming Wang, 2022. "Improved Shallow Landslide Susceptibility Prediction Based on Statistics and Ensemble Learning," Sustainability, MDPI, vol. 14(10), pages 1-21, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mariana Oliveira & Luís Torgo & Vítor Santos Costa, 2021. "Evaluation Procedures for Forecasting with Spatiotemporal Data," Mathematics, MDPI, vol. 9(6), pages 1-27, March.
    2. Yang, Dazhi & van der Meer, Dennis, 2021. "Post-processing in solar forecasting: Ten overarching thinking tools," Renewable and Sustainable Energy Reviews, Elsevier, vol. 140(C).
    3. Arezou Rahimi & Luis A. Vale-Silva & Maria Fälth Savitski & Jovan Tanevski & Julio Saez-Rodriguez, 2024. "DOT: a flexible multi-objective optimization framework for transferring features across single-cell and spatial omics," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    4. Samir K. Safi & Sheema Gul, 2024. "An Enhanced Tree Ensemble for Classification in the Presence of Extreme Class Imbalance," Mathematics, MDPI, vol. 12(20), pages 1-17, October.
    5. Fois, Mauro & Cuena-Lombraña, Alba & Fenu, Giuseppe & Bacchetta, Gianluigi, 2018. "Using species distribution models at local scale to guide the search of poorly known species: Review, methodological issues and future directions," Ecological Modelling, Elsevier, vol. 385(C), pages 124-132.
    6. Andree,Bo Pieter Johannes & Chamorro Elizondo,Andres Fernando & Kraay,Aart C. & Spencer,Phoebe Girouard & Wang,Dieter, 2020. "Predicting Food Crises," Policy Research Working Paper Series 9412, The World Bank.
    7. Patrick José Jeetze & Isabelle Weindl & Justin Andrew Johnson & Pasquale Borrelli & Panos Panagos & Edna J. Molina Bacca & Kristine Karstens & Florian Humpenöder & Jan Philipp Dietrich & Sara Minoli &, 2023. "Projected landscape-scale repercussions of global action for climate and biodiversity protection," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    8. Fitzpatrick, Trevor & Mues, Christophe, 2021. "How can lenders prosper? Comparing machine learning approaches to identify profitable peer-to-peer loan investments," European Journal of Operational Research, Elsevier, vol. 294(2), pages 711-722.
    9. Maia, Mateus & Murphy, Keefe & Parnell, Andrew C., 2024. "GP-BART: A novel Bayesian additive regression trees approach using Gaussian processes," Computational Statistics & Data Analysis, Elsevier, vol. 190(C).
    10. Konrad Bogner & Florian Pappenberger & Massimiliano Zappa, 2019. "Machine Learning Techniques for Predicting the Energy Consumption/Production and Its Uncertainties Driven by Meteorological Observations and Forecasts," Sustainability, MDPI, vol. 11(12), pages 1-22, June.
    11. Petropoulos, Fotios & Apiletti, Daniele & Assimakopoulos, Vassilios & Babai, Mohamed Zied & Barrow, Devon K. & Ben Taieb, Souhaib & Bergmeir, Christoph & Bessa, Ricardo J. & Bijak, Jakub & Boylan, Joh, 2022. "Forecasting: theory and practice," International Journal of Forecasting, Elsevier, vol. 38(3), pages 705-871.
      • Fotios Petropoulos & Daniele Apiletti & Vassilios Assimakopoulos & Mohamed Zied Babai & Devon K. Barrow & Souhaib Ben Taieb & Christoph Bergmeir & Ricardo J. Bessa & Jakub Bijak & John E. Boylan & Jet, 2020. "Forecasting: theory and practice," Papers 2012.03854, arXiv.org, revised Jan 2022.
    12. Anton M. Potapov & Carlos A. Guerra & Johan Hoogen & Anatoly Babenko & Bruno C. Bellini & Matty P. Berg & Steven L. Chown & Louis Deharveng & Ľubomír Kováč & Natalia A. Kuznetsova & Jean-François Pong, 2023. "Globally invariant metabolism but density-diversity mismatch in springtails," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    13. Daniel S. Maynard & Lalasia Bialic-Murphy & Constantin M. Zohner & Colin Averill & Johan Hoogen & Haozhi Ma & Lidong Mo & Gabriel Reuben Smith & Alicia T. R. Acosta & Isabelle Aubin & Erika Berenguer, 2022. "Global relationships in tree functional traits," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    14. Roland Langrock & Théo Michelot & Alexander Sohn & Thomas Kneib, 2015. "Semiparametric stochastic volatility modelling using penalized splines," Computational Statistics, Springer, vol. 30(2), pages 517-537, June.
    15. Erdener, Burcin Cakir & Feng, Cong & Doubleday, Kate & Florita, Anthony & Hodge, Bri-Mathias, 2022. "A review of behind-the-meter solar forecasting," Renewable and Sustainable Energy Reviews, Elsevier, vol. 160(C).
    16. Bommert, Andrea & Sun, Xudong & Bischl, Bernd & Rahnenführer, Jörg & Lang, Michel, 2020. "Benchmark for filter methods for feature selection in high-dimensional classification data," Computational Statistics & Data Analysis, Elsevier, vol. 143(C).
    17. Muniain, Peru & Ziel, Florian, 2020. "Probabilistic forecasting in day-ahead electricity markets: Simulating peak and off-peak prices," International Journal of Forecasting, Elsevier, vol. 36(4), pages 1193-1210.
    18. Backer, David & Billing, Trey, 2024. "Forecasting the prevalence of child acute malnutrition using environmental and conflict conditions as leading indicators," World Development, Elsevier, vol. 176(C).
    19. Azar, Pablo D. & Micali, Silvio, 2018. "Computational principal agent problems," Theoretical Economics, Econometric Society, vol. 13(2), May.
    20. Voyant, Cyril & Motte, Fabrice & Notton, Gilles & Fouilloy, Alexis & Nivet, Marie-Laure & Duchaud, Jean-Laurent, 2018. "Prediction intervals for global solar irradiation forecasting using regression trees methods," Renewable Energy, Elsevier, vol. 126(C), pages 332-340.
