IDEAS home Printed from https://ideas.repec.org/a/gam/jsusta/v14y2022i2p798-d722504.html

Prediction of Daily Mean PM 10 Concentrations Using Random Forest, CART Ensemble and Bagging Stacked by MARS

Authors

Listed:
  • Snezhana Gocheva-Ilieva

    (Department of Mathematical Analysis, Faculty of Mathematics and Informatics, Paisii Hilendarski University of Plovdiv, 4000 Plovdiv, Bulgaria)

  • Atanas Ivanov

    (Department of Mathematical Analysis, Faculty of Mathematics and Informatics, Paisii Hilendarski University of Plovdiv, 4000 Plovdiv, Bulgaria)

  • Maya Stoimenova-Minova

    (Department of Mathematical Analysis, Faculty of Mathematics and Informatics, Paisii Hilendarski University of Plovdiv, 4000 Plovdiv, Bulgaria)

Abstract

A novel framework for stacked regression based on machine learning was developed to predict the daily average concentrations of particulate matter (PM 10), one of Bulgaria’s primary health concerns. Measurements of nine meteorological parameters were used as independent variables. The goal was to carefully study a limited number of initial predictors and extract stochastic information from them to build an extended dataset that allowed the creation of highly efficient predictive models. Four base models using random forest, CART ensemble and bagging, and their rotation variants, were built and evaluated. The heterogeneity of these base models was achieved by introducing five types of diversity, including a new simplified selective ensemble algorithm. The predictions from the four base models were then used as predictors in multivariate adaptive regression splines (MARS) models. All models were statistically tested using out-of-bag estimates or 5-fold and 10-fold cross-validation. In addition, a variable importance analysis was conducted. The proposed framework was used for short-term forecasting of out-of-sample data for seven days. The stacked models outperformed all single base models, achieving an index of agreement IA = 0.986 and a coefficient of determination of about 95%.
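The abstract reports two goodness-of-fit measures, the index of agreement (IA) and the coefficient of determination, without giving their formulas. The sketch below shows one common way to compute them, assuming IA refers to Willmott's index of agreement, IA = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)², and that the coefficient of determination is the usual 1 - SSE/SST. The function names and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed definition of IA).

    Returns a value in [0, 1]; 1 means predictions match observations exactly.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    o_mean = fmean(observed)
    # Sum of squared prediction errors.
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # "Potential error": spread of both series around the observed mean.
    den = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2
              for o, p in zip(observed, predicted))
    if den == 0:
        # Every value equals the observed mean, so the error is also zero.
        return 1.0
    return 1.0 - num / den


def r_squared(observed, predicted):
    """Coefficient of determination computed as 1 - SSE / SST."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    o_mean = fmean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - o_mean) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - sse / sst


if __name__ == "__main__":
    # Hypothetical daily mean PM10 values, for illustration only.
    obs = [32.0, 41.5, 55.0, 28.0, 60.5]
    pred = [30.0, 43.0, 52.5, 31.0, 58.0]
    print("IA  =", round(index_of_agreement(obs, pred), 3))
    print("R^2 =", round(r_squared(obs, pred), 3))
```

Both measures equal 1 for a perfect fit, but IA is bounded below by 0, whereas R² computed this way can become negative for a model that fits worse than the observed mean.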

Suggested Citation

  • Snezhana Gocheva-Ilieva & Atanas Ivanov & Maya Stoimenova-Minova, 2022. "Prediction of Daily Mean PM 10 Concentrations Using Random Forest, CART Ensemble and Bagging Stacked by MARS," Sustainability, MDPI, vol. 14(2), pages 1-26, January.
  • Handle: RePEc:gam:jsusta:v:14:y:2022:i:2:p:798-:d:722504

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/14/2/798/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/14/2/798/
    Download Restriction: no

    References listed on IDEAS

    1. Flores, Benito E., 1989. "The utilization of the Wilcoxon test to compare forecasting methods: A note," International Journal of Forecasting, Elsevier, vol. 5(4), pages 529-535.

    Citations


    Cited by:

    1. Yadong Pei & Chiou-Jye Huang & Yamin Shen & Yuxuan Ma, 2022. "An Ensemble Model with Adaptive Variational Mode Decomposition and Multivariate Temporal Graph Neural Network for PM2.5 Concentration Forecasting," Sustainability, MDPI, vol. 14(20), pages 1-22, October.
    2. Syamsiyatul Muzayyanah & Cheng-Yih Hong & Rishan Adha & Su-Fen Yang, 2023. "The Non-Linear Relationship between Air Pollution, Labor Insurance and Productivity: Multivariate Adaptive Regression Splines Approach," Sustainability, MDPI, vol. 15(12), pages 1-20, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Md. Iftekharul Alam Efat & Petr Hajek & Mohammad Zoynul Abedin & Rahat Uddin Azad & Md. Al Jaber & Shuvra Aditya & Mohammad Kabir Hassan, 2024. "Deep-learning model using hybrid adaptive trend estimated series for modelling and forecasting sales," Annals of Operations Research, Springer, vol. 339(1), pages 297-328, August.
    2. Snezhana Gocheva-Ilieva & Antoaneta Yordanova & Hristina Kulina, 2022. "Predicting the 305-Day Milk Yield of Holstein-Friesian Cows Depending on the Conformation Traits and Farm Using Simplified Selective Ensembles," Mathematics, MDPI, vol. 10(8), pages 1-20, April.
    3. Franses, Philip Hans & Kleibergen, Frank, 1996. "Unit roots in the Nelson-Plosser data: Do they matter for forecasting?," International Journal of Forecasting, Elsevier, vol. 12(2), pages 283-288, June.
    4. Thomas Wenzel, 2001. "Hits-and-misses for the evaluation and combination of forecasts," Journal of Applied Statistics, Taylor & Francis Journals, vol. 28(6), pages 759-773.
    5. Benevento, Elisabetta & Aloini, Davide & Squicciarini, Nunzia, 2023. "Towards a real-time prediction of waiting times in emergency departments: A comparative analysis of machine learning techniques," International Journal of Forecasting, Elsevier, vol. 39(1), pages 192-208.
    6. De Gooijer, Jan G. & Hyndman, Rob J., 2006. "25 years of time series forecasting," International Journal of Forecasting, Elsevier, vol. 22(3), pages 443-473.
    7. Jan G. De Gooijer & Rob J. Hyndman, 2005. "25 Years of IIF Time Series Forecasting: A Selective Review," Monash Econometrics and Business Statistics Working Papers 12/05, Monash University, Department of Econometrics and Business Statistics.
    8. Wenzel, Thomas, 2000. "Hits-and-misses for the evaluation and combination of forecasts," Technical Reports 2000,26, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
