IDEAS home Printed from https://ideas.repec.org/a/gam/jsusta/v13y2021i19p10856-d646836.html

Applying Text Mining, Clustering Analysis, and Latent Dirichlet Allocation Techniques for Topic Classification of Environmental Education Journals

Author

Listed:
  • I-Cheng Chang

    (Department of Environmental Engineering, National Ilan University, Yilan 260, Taiwan)

  • Tai-Kuei Yu

    (Department of Business Administration, National Quemoy University, Kinmen 892, Taiwan)

  • Yu-Jie Chang

    (Department of Earth and Life Science, University of Taipei, Taipei 100, Taiwan)

  • Tai-Yi Yu

    (Department of Risk Management and Insurance, Ming Chuan University, Taipei 111, Taiwan)

Abstract

In response to the growth of big data, this study applied artificial intelligence to extract knowledge and to identify a feasible process for adding innovative value to environmental education. Intelligent agents and natural language processing (NLP) are two key areas leading current trends in artificial intelligence; this research adopted NLP to analyze the research topics of environmental education journals indexed in the Web of Science (WoS) database during 2011–2020 and to interpret the categories and characteristics of the abstracts of environmental education papers. The corpus was drawn from the abstracts and keywords of journal papers and was analyzed with text mining, cluster analysis, latent Dirichlet allocation (LDA), and co-word analysis. Domain experts determined and reviewed the classification of feature words, and the associated TF-IDF weights were calculated for the subsequent cluster analysis, which combined hierarchical clustering and K-means analysis. Hierarchical clustering and LDA set the number of categories at seven, and K-means cluster analysis classified the documents into these seven categories. The study used co-word analysis to check the suitability of the K-means classification, analyzed the terms with high TF-IDF weights in each K-means group, and examined the terms for different topics with the LDA technique. A comparison of the results showed that most categories identified by the K-means and LDA methods were the same and shared similar words; however, two categories differed slightly. The involvement of field experts helped ensure the consistency and correctness of the classified topics and documents.
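
The TF-IDF weighting and K-means steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the four toy documents, the function names (tfidf, kmeans, top_terms), the IDF variant log(N/df) + 1, the L2 normalisation, and the choice to seed centroids with the first k documents are all assumptions made for illustration. The hierarchical clustering, LDA, co-word analysis, and expert review described in the abstract are omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised docs."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))        # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}  # assumed IDF variant
    vectors = []
    for d in docs:
        tf = Counter(d)
        v = [tf[t] / len(d) * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's algorithm; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_terms(vocab, vectors, labels, cluster, n):
    """Return the n terms with the highest mean TF-IDF weight in one cluster."""
    members = [v for v, l in zip(vectors, labels) if l == cluster]
    means = [sum(col) / len(members) for col in zip(*members)]
    ranked = sorted(zip(vocab, means), key=lambda p: (-p[1], p[0]))
    return [t for t, _ in ranked[:n]]


# Hypothetical toy corpus standing in for paper abstracts.
raw = [
    "climate change education curriculum",
    "outdoor learning nature experience",
    "climate change awareness students",
    "nature outdoor experience children",
]
docs = [text.split() for text in raw]
vocab, vectors = tfidf(docs)
labels = kmeans(vectors, k=2)
for c in range(2):
    print(c, top_terms(vocab, vectors, labels, c, 3))
```

Ranking terms by their mean weight within each cluster mirrors, in a simplified form, the abstract's step of inspecting high TF-IDF terms for each K-means group. A real corpus would also need stop-word removal and a more careful centroid initialisation.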

Suggested Citation

  • I-Cheng Chang & Tai-Kuei Yu & Yu-Jie Chang & Tai-Yi Yu, 2021. "Applying Text Mining, Clustering Analysis, and Latent Dirichlet Allocation Techniques for Topic Classification of Environmental Education Journals," Sustainability, MDPI, vol. 13(19), pages 1-20, September.
  • Handle: RePEc:gam:jsusta:v:13:y:2021:i:19:p:10856-:d:646836

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/13/19/10856/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/13/19/10856/
    Download Restriction: no


    Citations



    Cited by:

    1. Mini Zhu & Gang Wang & Chaoping Li & Hongjun Wang & Bin Zhang, 2023. "Artificial Intelligence Classification Model for Modern Chinese Poetry in Education," Sustainability, MDPI, vol. 15(6), pages 1-19, March.
    2. Vrdoljak Ivana, 2023. "Lifelong Education in Economics, Business and Management Research: Literature Review," Business Systems Research, Sciendo, vol. 14(1), pages 153-172, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sung Kim & Derek Hansen & Richard Helps, 2018. "Computing research in the academy: insights from theses and dissertations," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(1), pages 135-158, January.
    2. So-Hui Park & Dong-Gu Lee & Jin-Sung Park & Jun-Woo Kim, 2021. "A Survey of Research on Data Analytics-Based Legal Tech," Sustainability, MDPI, vol. 13(14), pages 1-24, July.
    3. Yi-Ming Wei & Jin-Wei Wang & Tianqi Chen & Bi-Ying Yu & Hua Liao, 2018. "Frontiers of Low-Carbon Technologies: Results from Bibliographic Coupling with Sliding Window," CEEP-BIT Working Papers 116, Center for Energy and Environmental Policy Research (CEEP), Beijing Institute of Technology.
    4. Shu Yan & Lizi Pan & Yan Lu & Juan Chen & Ting Zhang & Dongzi Xu & Zhaolian Ouyang, 2023. "Towards Sustainable Drug Supply in China: A Bibliometric Analysis of Drug Reform Policies," Sustainability, MDPI, vol. 15(13), pages 1-20, June.
    5. Jiaxin Zhang & Zhilin Yu & Yunqin Li & Xueqiang Wang, 2023. "Uncovering Bias in Objective Mapping and Subjective Perception of Urban Building Functionality: A Machine Learning Approach to Urban Spatial Perception," Land, MDPI, vol. 12(7), pages 1-20, June.
    6. Yuewen Yang & Dongyan Wang & Zhuoran Yan & Shuwen Zhang, 2021. "Delineating Urban Functional Zones Using U-Net Deep Learning: Case Study of Kuancheng District, Changchun, China," Land, MDPI, vol. 10(11), pages 1-21, November.
    7. Raymundo das Neves Machado & Benjamín Vargas-Quesada & Jacqueline Leta, 2016. "Intellectual structure in stem cell research: exploring Brazilian scientific articles from 2001 to 2010," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(2), pages 525-537, February.
    8. Yang, Siluo & Han, Ruizhen & Wolfram, Dietmar & Zhao, Yuehua, 2016. "Visualizing the intellectual structure of information science (2006–2015): Introducing author keyword coupling analysis," Journal of Informetrics, Elsevier, vol. 10(1), pages 132-150.
    9. Oliveira, Renata Lúcia Magalhães de & Dablanc, Laetitia & Schorung, Matthieu, 2022. "Changes in warehouse spatial patterns and rental prices: Are they related? Exploring the case of US metropolitan areas," Journal of Transport Geography, Elsevier, vol. 104(C).
    10. Tinôco, Daniel & Genier, Hugo Leonardo André & da Silveira, Wendel Batista, 2021. "Technology valuation of cellulosic ethanol production by Kluyveromyces marxianus CCT 7735 from sweet sorghum bagasse at elevated temperatures," Renewable Energy, Elsevier, vol. 173(C), pages 188-196.
    11. Jan M. Gerken & Martin G. Moehrle, 2012. "A new instrument for technology monitoring: novelty in patents measured by semantic patent analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(3), pages 645-670, June.
    12. Shi Shen & Ke Shi & Junwang Huang & Changxiu Cheng & Min Zhao, 2023. "Global online social response to a natural disaster and its influencing factors: a case study of Typhoon Haiyan," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-15, December.
    13. Rosa Maria Arnaldo Valdés & Serhat Burmaoglu & Vincenzo Tucci & Luiz Manuel Braga da Costa Campos & Lucia Mattera & Víctor Fernando Gomez Comendador, 2019. "Flight Path 2050 and ACARE Goals for Maintaining and Extending Industrial Leadership in Aviation: A Map of the Aviation Technology Space," Sustainability, MDPI, vol. 11(7), pages 1-24, April.
    14. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    15. Gabriel Marcuzzo Canto Cavalheiro & Mariana Brandao Cavalheiro, 2024. "Cluster Analysis of the Internationalization of Unicorns from Latin America Based on Trademark Registrations," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 15(1), pages 1650-1665, March.
    16. Zhichao Ba & Yujie Cao & Jin Mao & Gang Li, 2019. "A hierarchical approach to analyzing knowledge integration between two fields—a case study on medical informatics and computer science," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1455-1486, June.
    17. De Paulo, A.F. & Porto, G.S., 2023. "Unveiling the cooperation dynamics in the photovoltaic technologies’ development," Renewable and Sustainable Energy Reviews, Elsevier, vol. 187(C).
    18. Milojević, Staša & Sugimoto, Cassidy R. & Larivière, Vincent & Thelwall, Mike & Ding, Ying, 2014. "The role of handbooks in knowledge creation and diffusion: A case of science and technology studies," Journal of Informetrics, Elsevier, vol. 8(3), pages 693-709.
    19. Xiaofeng Jia & Tao Dai & Xinbiao Guo, 2014. "Comprehensive exploration of urban health by bibliometric analysis: 35 years and 11,299 articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(3), pages 881-894, June.
    20. Luciano Barcellos-Paula & Anna María Gil-Lafuente & Aline Castro-Rezende, 2023. "Algorithm Applied to SDG13: A Case Study of Ibero-American Countries," Mathematics, MDPI, vol. 11(2), pages 1-20, January.
