
Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees

Authors
  • Alaa M. Elsayad

    (Department of Electrical Engineering, College of Engineering, Prince Sattam Bin Abdulaziz University, P.O. Box 54, Wadi Aldawaser 11991, Saudi Arabia
    Computers and Systems Department, Electronics Research Institute, Giza 12622, Egypt)

  • Ahmed M. Nassef

    (Department of Electrical Engineering, College of Engineering, Prince Sattam Bin Abdulaziz University, P.O. Box 54, Wadi Aldawaser 11991, Saudi Arabia
    Department of Computers and Automatic Control Engineering, Faculty of Engineering, Tanta University, Tanta 31733, Egypt)

  • Mujahed Al-Dhaifallah

    (Systems Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia)

  • Khaled A. Elsayad

    (Pharmacy Department, Cairo University Hospitals, Cairo University, Cairo 11662, Egypt)

Abstract

Substances that do not degrade over time harm the environment and endanger living organisms, so being able to predict the biodegradability of a substance without costly experiments is useful. Quantitative structure–activity relationship (QSAR) models have recently offered effective solutions to this problem. However, molecular descriptor datasets usually suffer from unbalanced class distributions, which adversely affect the efficiency and generalization of the derived models. Accordingly, this study validates the performance of balanced random trees (RTs) and boosted C5.0 decision trees (DTs) in constructing QSAR models that classify the ready biodegradation of substances, and assesses their ability to deal with unbalanced data. The balanced RTs algorithm builds individual trees from balanced bootstrap samples, while the boosted C5.0 DT model is built using cost-sensitive learning. We employed the two-dimensional molecular descriptor dataset that is publicly available through the University of California, Irvine (UCI) machine learning repository. The molecular descriptors were ranked according to their contributions to the balanced RTs classification process. The performance of the proposed models was compared with previously reported results. On the statistical measures used, the proposed models outperformed the support vector machine (SVM), K-nearest neighbors (KNN), and discriminant analysis (DA) classifiers. Classification performance was analyzed in terms of accuracy, sensitivity, specificity, precision, false positive rate, false negative rate, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUROC).
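
To make two ideas from the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a balanced bootstrap draws, with replacement, as many samples from each class as the smallest class contains (the paper's exact scheme may differ), and it computes the count-based measures listed above using their standard definitions. The function names, toy labels, and confusion-matrix counts are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return indices of a bootstrap sample with equal counts per class.

    Assumption: each class is drawn with replacement up to the size of
    the smallest class; the paper's exact sampling scheme may differ.
    """
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    n_per_class = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choices(indices, k=n_per_class))
    return sample


def binary_metrics(tp, fp, tn, fn):
    """Compute count-based measures from confusion-matrix entries."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical toy labels: 1 = readily biodegradable, 0 = not.
toy_labels = [1] * 3 + [0] * 7
sample = balanced_bootstrap(toy_labels, random.Random(0))
class_counts = Counter(toy_labels[i] for i in sample)

# Hypothetical confusion-matrix counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Each tree in a balanced ensemble would be trained on its own such sample, so the minority class is not swamped by the majority class. The ROC curve and AUROC are omitted here because they need ranked prediction scores rather than a single set of confusion-matrix counts.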

Suggested Citation

  • Alaa M. Elsayad & Ahmed M. Nassef & Mujahed Al-Dhaifallah & Khaled A. Elsayad, 2020. "Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees," IJERPH, MDPI, vol. 17(24), pages 1-20, December.
  • Handle: RePEc:gam:jijerp:v:17:y:2020:i:24:p:9322-:d:461317

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/17/24/9322/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/17/24/9322/
    Download Restriction: no


    Cited by:

    1. Alaa M. Elsayad & Medien Zeghid & Hassan Yousif Ahmed & Khaled A. Elsayad, 2023. "Exploration of Biodegradable Substances Using Machine Learning Techniques," Sustainability, MDPI, vol. 15(17), pages 1-22, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Haewon Byeon, 2020. "Is the Random Forest Algorithm Suitable for Predicting Parkinson’s Disease with Mild Cognitive Impairment out of Parkinson’s Disease with Normal Cognition?," IJERPH, MDPI, vol. 17(7), pages 1-14, April.
    2. Sadegh Fathi & Hassan Sajadzadeh & Faezeh Mohammadi Sheshkal & Farshid Aram & Gergo Pinter & Imre Felde & Amir Mosavi, 2020. "The Role of Urban Morphology Design on Enhancing Physical Activity and Public Health," IJERPH, MDPI, vol. 17(7), pages 1-29, March.
    3. Mehmet Efe Biresselioglu & Muhittin Hakan Demir, 2022. "Constructing a Decision Tree for Energy Policy Domain Based on Real-Life Data," Energies, MDPI, vol. 15(7), pages 1-15, March.
    4. Muhammad Salman Saeed & Mohd Wazir Mustafa & Nawaf N. Hamadneh & Nawa A. Alshammari & Usman Ullah Sheikh & Touqeer Ahmed Jumani & Saifulnizam Bin Abd Khalid & Ilyas Khan, 2020. "Detection of Non-Technical Losses in Power Utilities—A Comprehensive Systematic Review," Energies, MDPI, vol. 13(18), pages 1-25, September.
    5. Vanessa Gindri Vieira & Daniel Pinheiro Bernardon & Vinícius André Uberti & Rodrigo Marques de Figueiredo & Lucas Melo de Chiara & Juliano Andrade Silva, 2023. "Detection of Non-Technical Losses in Irrigant Consumers through Artificial Intelligence: A Pilot Study," Energies, MDPI, vol. 16(19), pages 1-17, September.
