
Evaluating effectiveness of down-sampling for stratified designs and unbalanced prevalence in Random Forest models of tree species distributions in Nevada

Authors

  • Freeman, Elizabeth A.
  • Moisen, Gretchen G.
  • Frescino, Tracey S.

Abstract

Random Forests is frequently used to model species distributions over large geographic areas. Complications arise when data used to train the models have been collected in stratified designs that involve different sampling intensity per stratum. The modeling process is further complicated if some of the target species are relatively rare on the landscape, leading to an unbalanced number of presences and absences in the training data. We explored means to accommodate unequal sampling intensity across strata as well as the unbalanced species prevalence in Random Forest models for tree and shrub species distributions in the state of Nevada. For the unequal sampling intensity issue, we tested three modeling strategies: fitting models using all the data, down-sampling the intensified stratum, and building separate models for each stratum. We explored unbalanced species prevalence by investigating the effects of down-sampling the more prevalent response (presence or absence), and by optimizing the cutoff thresholds for declaring a species present. When modeling species presence with stratified data that were collected with different sampling intensities per stratum, we found that neither down-sampling the intensified stratum nor fitting individual strata models improved model performance. We also found that balancing the number of presences and absences in a training data set by down-sampling did not improve predictive models of species distributions, and did not eliminate the need to optimize thresholds. We then applied our final choice of model to the full raster layers for Nevada to produce statewide species distribution maps.
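The abstract names three data-handling steps that sit around the Random Forest fit: down-sampling an intensified stratum, down-sampling the more prevalent response to balance presences and absences, and choosing a cutoff threshold for declaring a species present. The following is a minimal Python sketch of those steps, not the authors' code. It assumes each plot is a dict with hypothetical keys `present` (1/0) and `stratum`, uses simple random down-sampling, and assumes the predicted probabilities come from a Random Forest fitted elsewhere (not shown). The threshold search maximizes Cohen's kappa, which is only one of several criteria compared by Freeman and Moisen (2008). All function names are hypothetical and are not part of the PresenceAbsence package.

```python
import random


def downsample_majority(records, label_key="present", seed=0):
    """Balance prevalence: randomly drop plots of the more common response
    (presence = 1, absence = 0) until both classes have the same count."""
    rng = random.Random(seed)
    presences = [r for r in records if r[label_key] == 1]
    absences = [r for r in records if r[label_key] == 0]
    n = min(len(presences), len(absences))
    balanced = rng.sample(presences, n) + rng.sample(absences, n)
    rng.shuffle(balanced)
    return balanced


def downsample_stratum(records, stratum, keep_fraction, stratum_key="stratum", seed=0):
    """Reduce an intensified stratum: keep a random fraction of its plots
    and every plot from the other strata."""
    rng = random.Random(seed)
    inside = [r for r in records if r[stratum_key] == stratum]
    outside = [r for r in records if r[stratum_key] != stratum]
    k = round(len(inside) * keep_fraction)
    return outside + rng.sample(inside, k)


def cohens_kappa(observed, predicted):
    """Cohen's kappa for two 0/1 sequences of equal length."""
    n = len(observed)
    po = sum(1 for o, p in zip(observed, predicted) if o == p) / n  # observed agreement
    obs_pos = sum(observed) / n
    pred_pos = sum(predicted) / n
    pe = obs_pos * pred_pos + (1 - obs_pos) * (1 - pred_pos)  # chance agreement
    return 0.0 if pe == 1 else (po - pe) / (1 - pe)


def best_threshold(observed, probabilities, step=0.01):
    """Scan cutoffs in (0, 1); return the (cutoff, kappa) pair with the highest kappa."""
    best_t, best_k = 0.5, float("-inf")
    for i in range(1, int(round(1 / step))):
        t = i * step
        predicted = [1 if p >= t else 0 for p in probabilities]
        k = cohens_kappa(observed, predicted)
        if k > best_k:  # strict '>' keeps the lowest cutoff among ties
            best_t, best_k = t, k
    return best_t, best_k


# Usage on toy data: 3 presences and 7 absences; stratum "A" holds the 5 odd-indexed plots.
plots = [{"present": 1 if i < 3 else 0, "stratum": "A" if i % 2 else "B"} for i in range(10)]
balanced = downsample_majority(plots)          # 3 presences + 3 absences
reduced = downsample_stratum(plots, "A", 0.4)  # 5 plots outside "A" + 2 of the 5 inside it
cutoff, kappa = best_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Note that the abstract reports that balancing by down-sampling did not remove the need to optimize the threshold, which is why the sketch keeps the two steps separate rather than treating a balanced sample as a substitute for a fixed 0.5 cutoff.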

Suggested Citation

  • Freeman, Elizabeth A. & Moisen, Gretchen G. & Frescino, Tracey S., 2012. "Evaluating effectiveness of down-sampling for stratified designs and unbalanced prevalence in Random Forest models of tree species distributions in Nevada," Ecological Modelling, Elsevier, vol. 233(C), pages 1-10.
  • Handle: RePEc:eee:ecomod:v:233:y:2012:i:c:p:1-10
    DOI: 10.1016/j.ecolmodel.2012.03.007

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0304380012001147
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ecolmodel.2012.03.007?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Freeman, Elizabeth A. & Moisen, Gretchen G., 2008. "A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa," Ecological Modelling, Elsevier, vol. 217(1), pages 48-58.
    2. Freeman, Elizabeth A. & Moisen, Gretchen G., 2008. "PresenceAbsence: An R Package for Presence Absence Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 23(11).

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.

    Cited by:

    1. Zhang, Quanzhong & Wei, Haiyan & Liu, Jing & Zhao, Zefang & Ran, Qiao & Gu, Wei, 2021. "A Bayesian network with fuzzy mathematics for species habitat suitability analysis: A case with limited Angelica sinensis (Oliv.) Diels data," Ecological Modelling, Elsevier, vol. 450(C).
    2. Benkendorf, Donald J. & Schwartz, Samuel D. & Cutler, D. Richard & Hawkins, Charles P., 2023. "Correcting for the effects of class imbalance improves the performance of machine-learning based species distribution models," Ecological Modelling, Elsevier, vol. 483(C).
    3. Quanzhong Zhang & Haiyan Wei & Zefang Zhao & Jing Liu & Qiao Ran & Junhong Yu & Wei Gu, 2018. "Optimization of the Fuzzy Matter Element Method for Predicting Species Suitability Distribution Based on Environmental Data," Sustainability, MDPI, vol. 10(10), pages 1-16, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Václavík, Tomáš & Meentemeyer, Ross K., 2009. "Invasive species distribution modeling (iSDM): Are absence data and dispersal constraints needed to predict actual distributions?," Ecological Modelling, Elsevier, vol. 220(23), pages 3248-3258.
    2. Vorpahl, Peter & Elsenbeer, Helmut & Märker, Michael & Schröder, Boris, 2012. "How can statistical models help to determine driving factors of landslides?," Ecological Modelling, Elsevier, vol. 239(C), pages 27-39.
    3. Akpoti, Komlavi & Groen, Thomas & Dossou-Yovo, Elliott & Kabo-bah, Amos T. & Zwart, Sander J., 2022. "Climate change-induced reduction in agricultural land suitability of West-Africa's inland valley landscapes," Agricultural Systems, Elsevier, vol. 200(C).
    4. Sillero, Neftalí & Campos, João Carlos & Arenas-Castro, Salvador & Barbosa, A.Márcia, 2023. "A curated list of R packages for ecological niche modelling," Ecological Modelling, Elsevier, vol. 476(C).
    5. Sillero, Neftalí & Arenas-Castro, Salvador & Enriquez‐Urzelai, Urtzi & Vale, Cândida Gomes & Sousa-Guedes, Diana & Martínez-Freiría, Fernando & Real, Raimundo & Barbosa, A.Márcia, 2021. "Want to model a species niche? A step-by-step guideline on correlative ecological niche modelling," Ecological Modelling, Elsevier, vol. 456(C).
    6. Vu, Khoa & Vuong, Nguyen Dinh Tuan & Vu-Thanh, Tu-Anh & Nguyen, Anh Ngoc, 2022. "Income shock and food insecurity prediction Vietnam under the pandemic," World Development, Elsevier, vol. 153(C).
    7. Pliscoff, Patricio & Luebert, Federico & Hilger, Hartmut H. & Guisan, Antoine, 2014. "Effects of alternative sets of climatic predictors on species distribution models and associated estimates of extinction risk: A test with plants in an arid environment," Ecological Modelling, Elsevier, vol. 288(C), pages 166-177.
    8. Freeman, Elizabeth A. & Moisen, Gretchen G., 2008. "A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa," Ecological Modelling, Elsevier, vol. 217(1), pages 48-58.
    9. Amanda West & Sunil Kumar & Catherine Jarnevich, 2016. "Regional modeling of large wildfires under current and potential future climates in Colorado and Wyoming, USA," Climatic Change, Springer, vol. 134(4), pages 565-577, February.
    10. Ehara, Makoto & Matsuura, Toshiya & Gong, Hao & Sokh, Heng & Leng, Chivin & Choeung, Hong Narith & Sem, Rida & Nomura, Hisako & Tsuyama, Ikutaro & Matsui, Tetsuya & Hyakumura, Kimihiko, 2023. "Where do people vulnerable to deforestation live? Triaging forest conservation interventions for sustainable non-timber forest products," Land Use Policy, Elsevier, vol. 131(C).
    11. Watling, James I. & Romañach, Stephanie S. & Bucklin, David N. & Speroterra, Carolina & Brandt, Laura A. & Pearlstine, Leonard G. & Mazzotti, Frank J., 2012. "Do bioclimate variables improve performance of climate envelope models?," Ecological Modelling, Elsevier, vol. 246(C), pages 79-85.
    12. Salvador Arenas-Castro & João Gonçalves & Paulo Alves & Domingo Alcaraz-Segura & João P Honrado, 2018. "Assessing the multi-scale predictive ability of ecosystem functional attributes for species distribution modelling," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-31, June.
    13. Dean Fantazzini & Yufeng Xiao, 2023. "Detecting Pump-and-Dumps with Crypto-Assets: Dealing with Imbalanced Datasets and Insiders’ Anticipated Purchases," Econometrics, MDPI, vol. 11(3), pages 1-73, August.
    14. Liu, Fang & McShea, William J. & Li, Diqiang, 2017. "Correlating habitat suitability with landscape connectivity: A case study of Sichuan golden monkey in China," Ecological Modelling, Elsevier, vol. 353(C), pages 37-46.
    15. Toshiya Matsuura & Ken Sugimura & Asako Miyamoto & Nobuhiko Tanaka, 2013. "Knowledge-Based Estimation of Edible Fern Harvesting Sites in Mountainous Communities of Northeastern Japan," Sustainability, MDPI, vol. 6(1), pages 1-18, December.
    16. Pecchi, Matteo & Marchi, Maurizio & Burton, Vanessa & Giannetti, Francesca & Moriondo, Marco & Bernetti, Iacopo & Bindi, Marco & Chirici, Gherardo, 2019. "Species distribution modelling to support forest management. A literature review," Ecological Modelling, Elsevier, vol. 411(C).
    17. Mestre, Frederico & Pita, Ricardo & Paupério, Joana & Martins, Filipa M.S. & Alves, Paulo Célio & Mira, António & Beja, Pedro, 2015. "Combining distribution modelling and non-invasive genetics to improve range shift forecasting," Ecological Modelling, Elsevier, vol. 297(C), pages 171-179.
    18. Aziza Usmanova & Ahmed Aziz & Dilshodjon Rakhmonov & Walid Osamy, 2022. "Utilities of Artificial Intelligence in Poverty Prediction: A Review," Sustainability, MDPI, vol. 14(21), pages 1-39, October.
    19. Simon, Alois & Katzensteiner, Klaus & Wallentin, Gudrun, 2023. "The integration of hierarchical levels of scale in tree species distribution models of silver fir (Abies alba Mill.) and European beech (Fagus sylvatica L.) in mountain forests," Ecological Modelling, Elsevier, vol. 485(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:ecomod:v:233:y:2012:i:c:p:1-10. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.journals.elsevier.com/ecological-modelling

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.