Evaluation of Tree-Based Voting Algorithms in Water Quality Classification Prediction

Author

Listed:
  • Lili Li

    (School of Economics, Qingdao University, Qingdao 266071, China)

  • Jianhui Wei

    (School of Economics, Qingdao University, Qingdao 266071, China)

Abstract

Accurately predicting surface water quality is crucial for the sustainable use of water resources and for environmental protection. Prediction must account for the range of factors that affect water quality, such as physical and chemical parameters. Tree models suit this task: their tree-like structure partitions the data on the most influential water quality features and yields explicit decision rules. A single decision tree, however, cannot fully capture the complex relationships between all influencing parameters and water quality. This study therefore proposes combining ensemble tree models with voting algorithms to predict water quality classes. The study uses data from five surface water monitoring sites in Qingdao, a small subset of China's many municipal water environment monitoring stations, and applies a single-factor determination method under stringent surface water standards. The soft voting algorithm achieved the highest accuracy, 99.91%, and the model coped with the imbalance among the original water quality categories, reaching a Matthews Correlation Coefficient (MCC) of 99.88%. By comparison, the conventional machine learning algorithms logistic regression and K-nearest neighbors achieved lower accuracies of 75.90% and 91.33%, respectively. Inspection of the misclassified data further indicated that the model had learned the water quality determination rules well. The trained model was also transferred directly to predict water quality at 13 monitoring stations in Beijing, where it performed robustly: ensemble hard voting achieved an accuracy of 97.73% and an MCC of 96.81%.
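To make the distinction between the two voting schemes and the role of the MCC concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the three hypothetical base models, and their probability outputs are all assumptions made for illustration. Hard voting takes the majority of the models' predicted labels, while soft voting averages their class probabilities before choosing a label. The MCC function uses the standard multiclass form based on the confusion counts; the abstract's percentage figures presumably correspond to this coefficient multiplied by 100.

```python
from collections import Counter
from math import sqrt


def hard_vote(prob_rows):
    """Majority vote over each model's most probable class.

    prob_rows holds one probability vector per base model for a single
    sample. Ties are broken in favour of the lowest class index.
    """
    labels = [max(range(len(p)), key=p.__getitem__) for p in prob_rows]
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


def soft_vote(prob_rows, weights=None):
    """Class with the highest (optionally weighted) mean probability."""
    weights = weights or [1.0] * len(prob_rows)
    total = sum(weights)
    n_classes = len(prob_rows[0])
    avg = [
        sum(w * p[k] for w, p in zip(weights, prob_rows)) / total
        for k in range(n_classes)
    ]
    return max(range(n_classes), key=avg.__getitem__)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews Correlation Coefficient in [-1, 1].

    Returns 0.0 when the denominator is zero, for example when every
    sample is predicted as the same class.
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k, p_k = Counter(y_true), Counter(y_pred)
    classes = set(t_k) | set(p_k)
    num = c * s - sum(t_k[k] * p_k[k] for k in classes)
    den = sqrt(
        (s * s - sum(p_k[k] ** 2 for k in classes))
        * (s * s - sum(t_k[k] ** 2 for k in classes))
    )
    return num / den if den else 0.0


# Hypothetical class probabilities from three tree-based models for one sample.
probs = [
    [0.45, 0.55, 0.00],  # model A leans towards class 1
    [0.40, 0.60, 0.00],  # model B leans towards class 1
    [0.95, 0.05, 0.00],  # model C is very confident in class 0
]
print(hard_vote(probs))  # 1: two of three models pick class 1
print(soft_vote(probs))  # 0: the averaged probability favours class 0

# On imbalanced labels, always predicting the majority class gives
# 75% accuracy here but an MCC of 0.
print(multiclass_mcc([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.0
```

The last example shows why the MCC is a useful companion to accuracy when the water quality categories are imbalanced: a classifier that ignores the minority class can still score a high accuracy, but not a high MCC.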
In many countries' water environment systems, each water quality category corresponds to a designated use, and the magnitudes of the influencing parameters are directly related to those categories; a critical parameter can even determine the category on its own. Tree models handle nonlinear relationships well and select important water quality features, so they can identify and exploit interactions between water quality parameters, which is especially important when multiple parameters together determine the category. There is therefore strong motivation to develop tree-based water quality prediction models.
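The idea that one critical parameter can fix the category is the core of a single-factor determination rule: each parameter is graded separately, and the sample takes the worst grade. The sketch below illustrates that rule in plain Python. The parameter names and numeric limits are illustrative assumptions, not values taken from the paper, and should be checked against the applicable surface water standard before any real use.

```python
# Illustrative upper limits for classes I..V (higher value = worse water).
# These numbers are assumptions for the example, not the paper's thresholds.
UPPER_LIMITS = {
    "permanganate_index": [2.0, 4.0, 6.0, 10.0, 15.0],
    "ammonia_nitrogen": [0.15, 0.5, 1.0, 1.5, 2.0],
}

# Illustrative lower limits for classes I..V (higher value = better water).
LOWER_LIMITS = {
    "dissolved_oxygen": [7.5, 6.0, 5.0, 3.0, 2.0],
}


def grade_upper(value, limits):
    """Best class (1..5) whose upper limit the value does not exceed, else 6."""
    for grade, limit in enumerate(limits, start=1):
        if value <= limit:
            return grade
    return len(limits) + 1  # worse than class V


def grade_lower(value, limits):
    """Best class (1..5) whose lower limit the value meets, else 6."""
    for grade, limit in enumerate(limits, start=1):
        if value >= limit:
            return grade
    return len(limits) + 1  # worse than class V


def single_factor_class(sample):
    """Overall class is the worst (largest) grade over all parameters."""
    grades = [grade_upper(sample[name], lim) for name, lim in UPPER_LIMITS.items()]
    grades += [grade_lower(sample[name], lim) for name, lim in LOWER_LIMITS.items()]
    return max(grades)


# Hypothetical sample: two parameters grade well, ammonia nitrogen grades as IV.
sample = {
    "permanganate_index": 3.0,  # grade 2
    "ammonia_nitrogen": 1.2,    # grade 4
    "dissolved_oxygen": 8.0,    # grade 1
}
print(single_factor_class(sample))  # 4
```

Because the label is a maximum over per-parameter threshold tests, it is a piecewise, rule-like function of the inputs, which is consistent with the abstract's argument that tree models are a natural fit for learning it.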

Suggested Citation

  • Lili Li & Jianhui Wei, 2024. "Evaluation of Tree-Based Voting Algorithms in Water Quality Classification Prediction," Sustainability, MDPI, vol. 16(23), pages 1-25, December.
  • Handle: RePEc:gam:jsusta:v:16:y:2024:i:23:p:10634-:d:1536642

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/16/23/10634/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/16/23/10634/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yumin Wang & Xian’e Zhang & Yifeng Wu, 2020. "Eutrophication Assessment Based on the Cloud Matter Element Model," IJERPH, MDPI, vol. 17(1), pages 1-19, January.
    2. Yumin Wang & Weijian Ran & Lei Wu & Yifeng Wu, 2019. "Assessment of River Water Quality Based on an Improved Fuzzy Matter-Element Model," IJERPH, MDPI, vol. 16(15), pages 1-11, August.
