IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0149089.html

Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness

Author

Listed:
  • Jin Li
  • Maggie Tran
  • Justy Siwabessy

Abstract

Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF), were compared on predictive accuracy. The effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency, and caution should be exercised when applying filter FS methods in selecting predictive models.

Suggested Citation

  • Jin Li & Maggie Tran & Justy Siwabessy, 2016. "Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness," PLOS ONE, Public Library of Science, vol. 11(2), pages 1-29, February.
  • Handle: RePEc:plo:pone00:0149089
    DOI: 10.1371/journal.pone.0149089

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0149089
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0149089&type=printable
    Download Restriction: no

    References listed on IDEAS

    1. David Stephens & Markus Diesing, 2014. "A Comparison of Supervised Classification Methods for the Prediction of Substrate Type Using Multibeam Acoustic and Legacy Grain-Size Data," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-14, April.
    2. Kursa, Miron B. & Rudnicki, Witold R., 2010. "Feature Selection with the Boruta Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 36(i11).
    3. Hapfelmeier, A. & Ulm, K., 2013. "A new variable selection approach using Random Forests," Computational Statistics & Data Analysis, Elsevier, vol. 60(C), pages 50-69.
    4. Marmion, Mathieu & Luoto, Miska & Heikkinen, Risto K. & Thuiller, Wilfried, 2009. "The performance of state-of-the-art modelling techniques depends on geographical distribution of species," Ecological Modelling, Elsevier, vol. 220(24), pages 3512-3520.

    Citations

    Cited by:

    1. Jamei, Mehdi & Maroufpoor, Saman & Aminpour, Younes & Karbasi, Masoud & Malik, Anurag & Karimi, Bakhtiar, 2022. "Developing hybrid data-intelligent method using Boruta-random forest optimizer for simulation of nitrate distribution pattern," Agricultural Water Management, Elsevier, vol. 270(C).
    2. Manuel S. González Canché, 2022. "Post-purchase Federal Financial Aid: How (in)Effective is the IRS’s Student Loan Interest Deduction (SLID) in Reaching Lower-Income Taxpayers and Students?," Research in Higher Education, Springer;Association for Institutional Research, vol. 63(6), pages 933-986, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Asma Shaheen & Javed Iqbal, 2018. "Spatial Distribution and Mobility Assessment of Carcinogenic Heavy Metals in Soil Profiles Using Geostatistics and Random Forest, Boruta Algorithm," Sustainability, MDPI, vol. 10(3), pages 1-20, March.
    2. Tong, Jianfeng & Liu, Zhenxing & Zhang, Yong & Zheng, Xiujuan & Jin, Junyang, 2023. "Improved multi-gate mixture-of-experts framework for multi-step prediction of gas load," Energy, Elsevier, vol. 282(C).
    3. Ramón Ferri-García & María del Mar Rueda, 2022. "Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys," Statistical Papers, Springer, vol. 63(6), pages 1829-1881, December.
    4. Liangyuan Hu & Lihua Li, 2022. "Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series," IJERPH, MDPI, vol. 19(23), pages 1-13, December.
    5. Yvan Devaux & Lu Zhang & Andrew I. Lumley & Kanita Karaduzovic-Hadziabdic & Vincent Mooser & Simon Rousseau & Muhammad Shoaib & Venkata Satagopam & Muhamed Adilovic & Prashant Kumar Srivastava & Costa, 2024. "Development of a long noncoding RNA-based machine learning model to predict COVID-19 in-hospital mortality," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    6. Ghosh, Indranil & Chaudhuri, Tamal Datta & Alfaro-Cortés, Esteban & Gámez, Matías & García, Noelia, 2022. "A hybrid approach to forecasting futures prices with simultaneous consideration of optimality in ensemble feature selection and advanced artificial intelligence," Technological Forecasting and Social Change, Elsevier, vol. 181(C).
    7. Weijun Wang & Dan Zhao & Liguo Fan & Yulong Jia, 2019. "Study on Icing Prediction of Power Transmission Lines Based on Ensemble Empirical Mode Decomposition and Feature Selection Optimized Extreme Learning Machine," Energies, MDPI, vol. 12(11), pages 1-21, June.
    8. Manuel J. García Rodríguez & Vicente Rodríguez Montequín & Francisco Ortega Fernández & Joaquín M. Villanueva Balsera, 2019. "Public Procurement Announcements in Spain: Regulations, Data Analysis, and Award Price Estimator Using Machine Learning," Complexity, Hindawi, vol. 2019, pages 1-20, November.
    9. Sangjin Kim & Jong-Min Kim, 2019. "Two-Stage Classification with SIS Using a New Filter Ranking Method in High Throughput Data," Mathematics, MDPI, vol. 7(6), pages 1-16, May.
    10. Arjan S. Gosal & Janine A. McMahon & Katharine M. Bowgen & Catherine H. Hoppe & Guy Ziv, 2021. "Identifying and Mapping Groups of Protected Area Visitors by Environmental Awareness," Land, MDPI, vol. 10(6), pages 1-14, May.
    11. Zhao-Yue Chen & Hervé Petetin & Raúl Fernando Méndez Turrubiates & Hicham Achebak & Carlos Pérez García-Pando & Joan Ballester, 2024. "Population exposure to multiple air pollutants and its compound episodes in Europe," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    12. Silke Janitza & Ender Celik & Anne-Laure Boulesteix, 2018. "A computationally fast variable importance test for random forests for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(4), pages 885-915, December.
    13. Schrader, Silja & Graham, Sonia & Campbell, Rebecca & Height, Kaitlyn & Hawkes, Gina, 2024. "Grower attitudes and practices toward area-wide management of cropping weeds in Australia," Land Use Policy, Elsevier, vol. 137(C).
    14. Bram Janssens & Matthias Bogaert & Mathijs Maton, 2023. "Predicting the next Pogačar: a data analytical approach to detect young professional cycling talents," Annals of Operations Research, Springer, vol. 325(1), pages 557-588, June.
    15. Cooray, Upul & Watt, Richard G. & Tsakos, Georgios & Heilmann, Anja & Hariyama, Masanori & Yamamoto, Takafumi & Kuruppuarachchige, Isuruni & Kondo, Katsunori & Osaka, Ken & Aida, Jun, 2021. "Importance of socioeconomic factors in predicting tooth loss among older adults in Japan: Evidence from a machine learning analysis," Social Science & Medicine, Elsevier, vol. 291(C).
    16. Simon Besnard & Nuno Carvalhais & M Altaf Arain & Andrew Black & Benjamin Brede & Nina Buchmann & Jiquan Chen & Jan G P W Clevers & Loïc P Dutrieux & Fabian Gans & Martin Herold & Martin Jung & Yoshik, 2019. "Memory effects of climate and vegetation affecting net ecosystem CO2 fluxes in global forests," PLOS ONE, Public Library of Science, vol. 14(2), pages 1-22, February.
    17. Francesco Sartor & Jonathan P. Moore & Hans-Peter Kubis, 2021. "Plasma Interleukin-10 and Cholesterol Levels May Inform about Interdependences between Fitness and Fatness in Healthy Individuals," IJERPH, MDPI, vol. 18(4), pages 1-19, February.
    18. Nawin Raj, 2022. "Prediction of Sea Level with Vertical Land Movement Correction Using Deep Learning," Mathematics, MDPI, vol. 10(23), pages 1-23, November.
    19. Sameer Al-Dahidi & Piero Baraldi & Miriam Fresc & Enrico Zio & Lorenzo Montelatici, 2024. "Feature Selection by Binary Differential Evolution for Predicting the Energy Production of a Wind Plant," Energies, MDPI, vol. 17(10), pages 1-19, May.
    20. Piotr Pomorski & Denise Gorse, 2023. "Improving Portfolio Performance Using a Novel Method for Predicting Financial Regimes," Papers 2310.04536, arXiv.org.