IDEAS home Printed from https://ideas.repec.org/a/gam/jijerp/v18y2021i13p7087-d587380.html

Design of a Spark Big Data Framework for PM 2.5 Air Pollution Forecasting

Author

Listed:
  • Dong-Her Shih

    (Department of Information Management, National Yunlin University of Science & Technology, Douliu 64002, Taiwan)

  • Thi Hien To

    (Faculty of Environment, University of Science, 227 Nguyen Van Cu Street, District 5, Ho Chi Minh City 700000, Vietnam;
    Vietnam National University, Linh Trung Ward, Thu Duc District, Ho Chi Minh City 700000, Vietnam)

  • Ly Sy Phu Nguyen

    (Faculty of Environment, University of Science, 227 Nguyen Van Cu Street, District 5, Ho Chi Minh City 700000, Vietnam;
    Vietnam National University, Linh Trung Ward, Thu Duc District, Ho Chi Minh City 700000, Vietnam)

  • Ting-Wei Wu

    (Department of Information Management, National Yunlin University of Science & Technology, Douliu 64002, Taiwan)

  • Wen-Ting You

    (Department of Information Management, National Yunlin University of Science & Technology, Douliu 64002, Taiwan)

Abstract

In recent years, rapid economic development has made air pollution extremely serious, with many negative effects on health, the environment and medical costs. PM 2.5 is one of the main components of air pollution, so knowing the PM 2.5 air quality in advance is necessary to protect health. Many air quality studies rely on the government’s official air quality monitoring stations, which cannot be widely deployed because of their high cost. Furthermore, government monitoring stations update only once an hour, which makes it hard to capture short-term PM 2.5 concentration peaks that arrive with little warning. Short-term data from many stations, however, are huge in volume and complex to calculate, analyze and predict. This study proposes a PM 2.5 instant prediction architecture based on the Spark big data framework to handle the huge data from the LASS community. Spark alleviates the high computational requirements of the original predictor, which makes it suitable for the problem considered. The proposed framework is divided into three modules. It collects real-time PM 2.5 data and performs ensemble learning through three machine learning algorithms (Linear Regression, Random Forest, Gradient Boosting Decision Tree) to predict the PM 2.5 concentration value in the next 30 to 180 min, with an accompanying visualization graph. The experimental results show that the proposed Spark big data ensemble prediction model performs best for the next 30-min prediction (R² up to 0.96), and that the ensemble model performs better than any single machine learning model. Taiwan has suffered from relatively poor air quality for a long time. Air pollutant monitoring data from the LASS community can provide broader monitoring coverage; however, the data are large and difficult to integrate or analyze. The proposed Spark big data framework system can provide short-term PM 2.5 forecasts and help decision-makers take proper action immediately.
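
The abstract does not say how the three models' outputs are combined or how the forecasting samples are built, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes sensor readings at a fixed interval (5 min here, a hypothetical value), builds lag-window samples for a chosen forecast horizon, combines per-model predictions by an unweighted average (one simple ensemble rule among several), and scores the result with R². All function names and numbers are illustrative; in a Spark deployment the individual predictions would come from distributed regressors rather than hand-written lists.

```python
from statistics import mean


def make_horizon_pairs(series, n_lags, steps_ahead):
    """Turn a univariate series into (lag window, future value) pairs.

    steps_ahead counts sampling intervals: with an assumed 5-min interval,
    a 30-min horizon corresponds to steps_ahead=6.
    """
    pairs = []
    last_start = len(series) - n_lags - steps_ahead
    for start in range(last_start + 1):
        window = series[start:start + n_lags]
        target = series[start + n_lags + steps_ahead - 1]
        pairs.append((window, target))
    return pairs


def average_ensemble(predictions_per_model):
    """Combine per-model prediction lists by unweighted averaging (assumed rule)."""
    return [mean(values) for values in zip(*predictions_per_model)]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up readings (not data from the paper), used only to exercise the helpers.
readings = [10, 12, 14, 16, 18, 20, 22, 24]
samples = make_horizon_pairs(readings, n_lags=3, steps_ahead=2)

# Hypothetical outputs of three already-trained models for two test points.
model_outputs = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
combined = average_ensemble(model_outputs)
score = r_squared([2.0, 3.5], combined)
```

Averaging is the simplest combination rule; a weighted average or a stacked meta-model would slot into `average_ensemble` without changing the windowing or scoring steps.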

Suggested Citation

  • Dong-Her Shih & Thi Hien To & Ly Sy Phu Nguyen & Ting-Wei Wu & Wen-Ting You, 2021. "Design of a Spark Big Data Framework for PM 2.5 Air Pollution Forecasting," IJERPH, MDPI, vol. 18(13), pages 1-22, July.
  • Handle: RePEc:gam:jijerp:v:18:y:2021:i:13:p:7087-:d:587380

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/18/13/7087/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/18/13/7087/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ni, Jinlan & Wei, Chu & Du, Limin, 2015. "Revealing the political decision toward Chinese carbon abatement: Based on equity and efficiency criteria," Energy Economics, Elsevier, vol. 51(C), pages 609-621.
    2. Ling Xiong & Shaozhou Qi, 2018. "Financial Development And Carbon Emissions In Chinese Provinces: A Spatial Panel Data Analysis," The Singapore Economic Review (SER), World Scientific Publishing Co. Pte. Ltd., vol. 63(02), pages 447-464, March.
    3. Xu, Bin & Lin, Boqiang, 2018. "Do we really understand the development of China's new energy industry?," Energy Economics, Elsevier, vol. 74(C), pages 733-745.
    4. Jiansheng You & Guohan Ding & Liyuan Zhang, 2022. "Heterogeneous Dynamic Correlation Research among Industrial Structure Distortion, Two-Way FDI and Carbon Emission Intensity in China," Sustainability, MDPI, vol. 14(15), pages 1-23, July.
    5. Du, Limin & Hanley, Aoife & Wei, Chu, 2015. "Estimating the Marginal Abatement Cost Curve of CO2 Emissions in China: Provincial Panel Data Analysis," Energy Economics, Elsevier, vol. 48(C), pages 217-229.
    6. Manimaran, P. & Narayana, A.C., 2018. "Multifractal detrended cross-correlation analysis on air pollutants of University of Hyderabad Campus, India," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 502(C), pages 228-235.
    7. Li, Jianglong & Lin, Boqiang, 2017. "Does energy and CO2 emissions performance of China benefit from regional integration?," Energy Policy, Elsevier, vol. 101(C), pages 366-378.
    8. Wu, Yinyin & Wang, Ping & Liu, Xin & Chen, Jiandong & Song, Malin, 2020. "Analysis of regional carbon allocation and carbon trading based on net primary productivity in China," China Economic Review, Elsevier, vol. 60(C).
    9. Moataz Elshimy & Khadiga M. El-Aasar, 2020. "Carbon footprint, renewable energy, non-renewable energy, and livestock: testing the environmental Kuznets curve hypothesis for the Arab world," Environment, Development and Sustainability: A Multidisciplinary Approach to the Theory and Practice of Sustainable Development, Springer, vol. 22(7), pages 6985-7012, October.
    10. QIN, Bo & WU, Jianfeng, 2015. "Does urban concentration mitigate CO2 emissions? Evidence from China 1998–2008," China Economic Review, Elsevier, vol. 35(C), pages 220-231.
    11. Llorca, Manuel & Rodriguez-Alvarez, Ana, 2024. "Economic, environmental, and energy equity convergence: Evidence of a multi-speed Europe?," Ecological Economics, Elsevier, vol. 219(C).
    12. Li, Xin & Li, Zheng & Su, Chi-Wei & Umar, Muhammad & Shao, Xuefeng, 2022. "Exploring the asymmetric impact of economic policy uncertainty on China's carbon emissions trading market price: Do different types of uncertainty matter?," Technological Forecasting and Social Change, Elsevier, vol. 178(C).
    13. Le Hoang Phong, 2019. "Globalization, Financial Development, and Environmental Degradation in the Presence of Environmental Kuznets Curve: Evidence from ASEAN-5 Countries," International Journal of Energy Economics and Policy, Econjournals, vol. 9(2), pages 40-50.
    14. Chulin Pan & Huayi Wang & Hongpeng Guo & Hong Pan, 2021. "How Do the Population Structure Changes of China Affect Carbon Emissions? An Empirical Study Based on Ridge Regression Analysis," Sustainability, MDPI, vol. 13(6), pages 1-16, March.
    15. Wang, Yuan & Zhang, Xiang & Kubota, Jumpei & Zhu, Xiaodong & Lu, Genfa, 2015. "A semi-parametric panel data analysis on the urbanization-carbon emissions nexus for OECD countries," Renewable and Sustainable Energy Reviews, Elsevier, vol. 48(C), pages 704-709.
    16. Omri, Anis, 2018. "Entrepreneurship, sectoral outputs and environmental improvement: International evidence," Technological Forecasting and Social Change, Elsevier, vol. 128(C), pages 46-55.
    17. João Tovar Jalles, 2019. "Polluting Emissions and GDP: Decoupling Evidence from Brazilian States," Working Papers REM 2019/0104, ISEG - Lisbon School of Economics and Management, REM, Universidade de Lisboa.
    18. Le Hoang Phong & Dang Thi Bach Van & Ho Hoang Gia Bao, 2018. "The Role of Globalization on CO2 Emission in Vietnam Incorporating Industrialization, Urbanization, GDP per Capita and Energy Use," International Journal of Energy Economics and Policy, Econjournals, vol. 8(6), pages 275-283.
    19. Chao-Qun Ma & Jiang-Long Liu & Yi-Shuai Ren & Yong Jiang, 2019. "The Impact of Economic Growth, FDI and Energy Intensity on China’s Manufacturing Industry’s CO 2 Emissions: An Empirical Study Based on the Fixed-Effect Panel Quantile Regression Model," Energies, MDPI, vol. 12(24), pages 1-16, December.
    20. Yufeng Wang & Shijun Zhang & Luyao Zhang, 2023. "The Impact of Location-Based Tax Incentives and Carbon Emission Intensity: Evidence from China’s Western Development Strategy," IJERPH, MDPI, vol. 20(3), pages 1-23, February.
