IDEAS home Printed from https://ideas.repec.org/a/hin/jnlmpe/5749746.html

The Prediction of Diatom Abundance by Comparison of Various Machine Learning Methods

Author

Listed:
  • Yuna Shin
  • Heesuk Lee
  • Young-Joo Lee
  • Dae Keun Seo
  • Bomi Jeong
  • Seoksu Hong
  • Jaehoon Kim
  • Taekgeun Kim
  • Jae-Kyeong Lee
  • Tae-Young Heo

Abstract

This study takes two approaches to analyzing the occurrence of algae at Haman Weir on the Nakdong River: a traditional statistical method (logistic regression) and machine learning techniques (kNN, ANN, RF, Bagging, Boosting, and SVM). To compare the models, the study measured accuracy, specificity, sensitivity, and AUC, which are representative model evaluation measures. The ROC curve plots sensitivity against (1-specificity) as the classification threshold varies, and the AUC, the area under the ROC curve, summarizes sensitivity and specificity in a single value. This measure has two advantages over other evaluation tools. First, it is scale-invariant: it measures how well the model ranks its predictions rather than their absolute values. Second, it is classification-threshold-invariant: because the ROC curve collects the sensitivity and (1-specificity) pairs obtained across thresholds, the AUC does not depend on any single threshold. Because of these two advantages, we chose AUC as the final model evaluation measure. Variable selection was conducted using the Boruta algorithm, and models built with and without variable selection were compared to identify the better model. The Boruta algorithm identified PO4-P, DO, BOD, NH3-N, Susp, pH, TOC, Temp, TN, and TP as significant explanatory variables. Among the models without variable selection, RF had the highest accuracy and ANN had the highest AUC. Overall, ANN with variable selection showed the best performance among all models, with and without variable selection.
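The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the labels and scores are made-up toy data, and the function names (`confusion_metrics`, `roc_points`, `auc`) are hypothetical. It computes accuracy, sensitivity, and specificity at one threshold, builds the ROC curve by sweeping the threshold, and integrates it with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def confusion_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) at a single threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


def roc_points(y_true: Sequence[int],
               scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per distinct threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (an assumption for illustration, not taken from the study).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

# Sensitivity and specificity depend on the chosen threshold...
metrics_low = confusion_metrics(y_true, scores, threshold=0.35)
metrics_mid = confusion_metrics(y_true, scores, threshold=0.5)

# ...whereas the AUC summarizes every threshold at once.
auc_raw = auc(roc_points(y_true, scores))

# Scale invariance: a monotone rescaling of the scores keeps their ranking,
# so the AUC is unchanged.
auc_rescaled = auc(roc_points(y_true, [s ** 3 for s in scores]))

print(metrics_low, metrics_mid, auc_raw, auc_rescaled)
```

On this toy data the two thresholds give different sensitivity and specificity, while the AUC is the same for the raw and the rescaled scores, which is the behavior the abstract describes as threshold invariance and scale invariance.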

Suggested Citation

  • Yuna Shin & Heesuk Lee & Young-Joo Lee & Dae Keun Seo & Bomi Jeong & Seoksu Hong & Jaehoon Kim & Taekgeun Kim & Jae-Kyeong Lee & Tae-Young Heo, 2019. "The Prediction of Diatom Abundance by Comparison of Various Machine Learning Methods," Mathematical Problems in Engineering, Hindawi, vol. 2019, pages 1-13, May.
  • Handle: RePEc:hin:jnlmpe:5749746
    DOI: 10.1155/2019/5749746

    Download full text from publisher

    File URL: http://downloads.hindawi.com/journals/MPE/2019/5749746.pdf
    Download Restriction: no

    File URL: http://downloads.hindawi.com/journals/MPE/2019/5749746.xml
    Download Restriction: no

