Printed from https://ideas.repec.org/a/eee/ecomod/v217y2008i1p48-58.html

A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa

Author

Listed:
  • Freeman, Elizabeth A.
  • Moisen, Gretchen G.

Abstract

Modelling techniques used in binary classification problems often result in a predicted probability surface, which is then translated into a presence–absence classification map. However, this translation requires a (possibly subjective) choice of threshold above which the variable of interest is predicted to be present. The selection of this threshold value can have dramatic effects on model accuracy as well as the predicted prevalence for the variable (the overall proportion of locations where the variable is predicted to be present). The traditional default is to simply use a threshold of 0.5 as the cut-off, but this does not necessarily preserve the observed prevalence or result in the highest prediction accuracy, especially for data sets with very high or very low observed prevalence. Alternatively, the thresholds can be chosen to optimize map accuracy, as judged by various criteria. Here we examine the effect of 11 of these potential criteria on predicted prevalence, prediction accuracy, and the resulting map output. Comparisons are made using output from presence–absence models developed for 13 tree species in the northern mountains of Utah. We found that species with poor model quality or low prevalence were most sensitive to the choice of threshold. For these species, a 0.5 cut-off was unreliable, sometimes resulting in substantially lower kappa and underestimated prevalence, with possible detrimental effects on a management decision. If a management objective requires a map to portray unbiased estimates of species prevalence, then the best results were obtained from thresholds deliberately chosen so that the predicted prevalence equaled the observed prevalence, followed closely by thresholds chosen to maximize kappa. These were also the two criteria with the highest mean kappa from our independent test data. For particular management applications, the special cases of user-specified required accuracy may be most appropriate. Ultimately, maps will typically have multiple and somewhat conflicting management applications. Therefore, providing users with a continuous probability surface may be the most versatile and powerful method, allowing threshold choice to be matched with each map's intended use.
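
As an illustration of three of the criteria named in the abstract (the 0.5 default, matching predicted to observed prevalence, and maximizing kappa), the following is a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, the 0.01-step threshold grid, and the convention that a location is predicted present when its probability is greater than or equal to the threshold are all assumptions made for this example.

```python
def confusion_counts(observed, probabilities, threshold):
    """Return (tp, fp, fn, tn), predicting presence when prob >= threshold (assumed convention)."""
    tp = fp = fn = tn = 0
    for obs, prob in zip(observed, probabilities):
        predicted_present = prob >= threshold
        if predicted_present and obs:
            tp += 1
        elif predicted_present:
            fp += 1
        elif obs:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def kappa(observed, probabilities, threshold):
    """Cohen's kappa for the presence-absence map produced by one threshold."""
    tp, fp, fn, tn = confusion_counts(observed, probabilities, threshold)
    n = tp + fp + fn + tn
    agreement = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if chance == 1.0:
        return 0.0  # degenerate case: only one class is observed and predicted
    return (agreement - chance) / (1.0 - chance)


def predicted_prevalence(probabilities, threshold):
    """Proportion of locations predicted present at this threshold."""
    return sum(p >= threshold for p in probabilities) / len(probabilities)


def threshold_matching_prevalence(observed, probabilities, grid):
    """Grid threshold whose predicted prevalence is closest to the observed prevalence."""
    target = sum(observed) / len(observed)
    return min(grid, key=lambda t: abs(predicted_prevalence(probabilities, t) - target))


def threshold_maximizing_kappa(observed, probabilities, grid):
    """Grid threshold with the highest kappa (first one wins on ties)."""
    return max(grid, key=lambda t: kappa(observed, probabilities, t))


# Hypothetical low-prevalence data: 2 presences among 10 locations.
observed = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probabilities = [0.45, 0.30, 0.35, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.02]
grid = [i / 100 for i in range(1, 100)]  # candidate thresholds 0.01 .. 0.99

for label, t in [
    ("default 0.5", 0.5),
    ("match prevalence", threshold_matching_prevalence(observed, probabilities, grid)),
    ("maximize kappa", threshold_maximizing_kappa(observed, probabilities, grid)),
]:
    print(label, t, predicted_prevalence(probabilities, t), kappa(observed, probabilities, t))
```

In this toy data set no predicted probability reaches 0.5, so the default cut-off predicts the species absent everywhere. That is the kind of underestimated prevalence the abstract describes for low-prevalence species, while the two data-driven criteria select lower thresholds.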

Suggested Citation

  • Freeman, Elizabeth A. & Moisen, Gretchen G., 2008. "A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa," Ecological Modelling, Elsevier, vol. 217(1), pages 48-58.
  • Handle: RePEc:eee:ecomod:v:217:y:2008:i:1:p:48-58
    DOI: 10.1016/j.ecolmodel.2008.05.015



