
An Improved Corpus-Based NLP Method for Facilitating Keyword Extraction: An Example of the COVID-19 Vaccine Hesitancy Corpus

Author

  • Liang-Ching Chen

    (Department of Foreign Languages, R.O.C. Military Academy, Kaohsiung 830, Taiwan
    Institute of Education, National Sun Yat-sen University, Kaohsiung 804, Taiwan)

Abstract

In the current COVID-19 post-pandemic era, COVID-19 vaccine hesitancy is hindering the herd immunity that widespread vaccination is meant to generate. It is critical to identify the factors that may cause COVID-19 vaccine hesitancy so that the relevant authorities can propose appropriate interventions to mitigate it. Keyword extraction, a sub-field of natural language processing (NLP) applications, plays a vital role in modern medical informatics. When traditional corpus-based NLP methods are used for keyword extraction, they consider only a word's log-likelihood value to determine whether it is a keyword, which raises concerns about the efficiency and accuracy of the technique. Specifically, the method is unable to (1) optimize the keyword list through a machine-based approach, (2) effectively evaluate a keyword's importance level, and (3) integrate the variables to conduct data clustering. To address these issues, this study integrated a machine-based word removal technique, the i10-index, and the importance–performance analysis (IPA) technique to develop an improved corpus-based NLP method for facilitating keyword extraction. The 200 most-cited Science Citation Index (SCI) research articles discussing COVID-19 vaccine hesitancy were adopted as the target corpus for verification. The results showed that the keywords of Quadrant I (n = 98) reached the highest lexical coverage (9.81%), indicating that the proposed method successfully identified and extracted the most important keywords from the target corpus and thus achieved more domain-oriented and accurate keyword extraction results.
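The abstract names the building blocks of the method: log-likelihood keyness, the i10-index, IPA quadrant clustering, and lexical coverage as the evaluation metric. The Python sketch that follows illustrates these ideas under stated assumptions; it is not the author's implementation. The log-likelihood function uses the standard two-corpus keyness statistic common in corpus linguistics. The keyword-level i10 analogue (`i10_like`, a hypothetical name) is an assumption: it counts the documents in which a word occurs at least 10 times, by analogy with the bibliometric i10-index (papers with at least 10 citations). The choice of IPA axes (log-likelihood and the i10-like count) and the quadrant numbering (Quadrant I = both scores at or above the mean) are also assumptions.

```python
import math
from statistics import mean


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Two-corpus log-likelihood keyness.

    a: word frequency in the target corpus
    b: word frequency in the reference corpus
    c: total tokens in the target corpus
    d: total tokens in the reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def i10_like(doc_counts: list[int], threshold: int = 10) -> int:
    """HYPOTHETICAL keyword analogue of the i10-index (an assumption, not the
    paper's definition): number of documents where the word occurs at least
    `threshold` times."""
    return sum(1 for n in doc_counts if n >= threshold)


def ipa_quadrants(scores: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Assign each word to an IPA quadrant using the mean of each axis as the
    crosshair. x and y are ASSUMED axes (e.g. keyness and the i10-like count).
    Quadrants follow Cartesian numbering: I = high x, high y."""
    mx = mean(x for x, _ in scores.values())
    my = mean(y for _, y in scores.values())
    result = {}
    for word, (x, y) in scores.items():
        if x >= mx and y >= my:
            result[word] = "I"
        elif x < mx and y >= my:
            result[word] = "II"
        elif x < mx:
            result[word] = "III"
        else:
            result[word] = "IV"
    return result


def lexical_coverage(keywords: list[str], target_freqs: dict[str, int],
                     total_tokens: int) -> float:
    """Percentage of target-corpus tokens accounted for by the keyword list."""
    covered = sum(target_freqs.get(w, 0) for w in keywords)
    return 100 * covered / total_tokens
```

In this sketch, the keywords in Quadrant I would be the candidates whose combined token frequency is then reported as lexical coverage. The mean-based crosshair is one common IPA convention; a median or a fixed scale midpoint would be equally plausible.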

Suggested Citation

  • Liang-Ching Chen, 2023. "An Improved Corpus-Based NLP Method for Facilitating Keyword Extraction: An Example of the COVID-19 Vaccine Hesitancy Corpus," Sustainability, MDPI, vol. 15(4), pages 1-19, February.
  • Handle: RePEc:gam:jsusta:v:15:y:2023:i:4:p:3402-:d:1066965

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/15/4/3402/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/15/4/3402/
    Download Restriction: no

    References listed on IDEAS

    1. Jung-Fa Tsai & Chin-Po Wang & Kuei-Lun Chang & Yi-Chung Hu, 2021. "Selecting Bloggers for Hotels via an Innovative Mixed MCDM Model," Mathematics, MDPI, vol. 9(13), pages 1-15, July.
    2. Zhong-Lei Wang & Hou-Cai Shen & Jian Zuo, 2019. "Risks in Prefabricated Buildings in China: Importance-Performance Analysis Approach," Sustainability, MDPI, vol. 11(12), pages 1-13, June.
    3. Marcin Kozak & Lutz Bornmann, 2012. "A New Family of Cumulative Indexes for Measuring Scientific Performance," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-4, October.
    4. Phadermrod, Boonyarat & Crowder, Richard M. & Wills, Gary B., 2019. "Importance-Performance Analysis based SWOT analysis," International Journal of Information Management, Elsevier, vol. 44(C), pages 194-203.
    5. Yi-Fang Luo & Heng-Yu Shen & Shu-Ching Yang & Liang-Ching Chen, 2021. "The Relationships among Anxiety, Subjective Well-Being, Media Consumption, and Safety-Seeking Behaviors during the COVID-19 Epidemic," IJERPH, MDPI, vol. 18(24), pages 1-12, December.
    6. Ida Rašovská & Marketa Kubickova & Kateřina Ryglová, 2021. "Importance–performance analysis approach to destination management," Tourism Economics, vol. 27(4), pages 777-794, June.
    7. Yaoping Zhong & Wenzhong Zhu & Yingying Zhou, 2020. "CSR Image Construction of Chinese Construction Enterprises in Africa Based on Data Mining and Corpus Analysis," Mathematical Problems in Engineering, Hindawi, vol. 2020, pages 1-14, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zheng Yuan & Baohua Wen & Cheng He & Jin Zhou & Zhonghua Zhou & Feng Xu, 2022. "Application of Multi-Criteria Decision-Making Analysis to Rural Spatial Sustainability Evaluation: A Systematic Review," IJERPH, MDPI, vol. 19(11), pages 1-31, May.
    2. Fei-Hsin Huang & Hann Nguyen, 2022. "Selecting Optimal Cultural Tourism for Indigenous Tribes by Fuzzy MCDM," Mathematics, MDPI, vol. 10(17), pages 1-12, August.
    3. Esmailpour, Javad & Aghabayk, Kayvan & Abrari Vajari, Mohammad & De Gruyter, Chris, 2020. "Importance – Performance Analysis (IPA) of bus service attributes: A case study in a developing country," Transportation Research Part A: Policy and Practice, Elsevier, vol. 142(C), pages 129-150.
    4. Uroš Kramar & Dejan Dragan & Darja Topolšek, 2019. "The Holistic Approach to Urban Mobility Planning with a Modified Focus Group, SWOT, and Fuzzy Analytical Hierarchical Process," Sustainability, MDPI, vol. 11(23), pages 1-29, November.
    5. Debasish Roy, 2021. "An Exploration in SWOT Function and Formulation of SWOT Index (SWOTIN): A Cross-Country Empirical Study (2006 – 2015)," Journal of Applied Management and Investments, Department of Business Administration and Corporate Security, International Humanitarian University, vol. 10(2), pages 53-66, December.
    6. Carla M. A. Pinto & Jorge Mendonça & Lurdes Babo & Francisco J. G. Silva & José L. R. Fernandes, 2022. "Analyzing the Implementation of Lean Methodologies and Practices in the Portuguese Industry: A Survey," Sustainability, MDPI, vol. 14(3), pages 1-24, February.
    7. Macias, A. & Kandidayeni, M. & Boulon, L. & Trovão, J.P., 2021. "Fuel cell-supercapacitor topologies benchmark for a three-wheel electric vehicle powertrain," Energy, Elsevier, vol. 224(C).
    8. Rykała Wojciech & Dąbrowska Dominika, 2020. "Risk assessment for groundwater in the region of municipal landfill systems in Tychy-Urbanowice (Southern Poland)," Environmental & Socio-economic Studies, Sciendo, vol. 8(1), pages 9-17, March.
    9. Prince Donkor Ameyaw & Walter Timo de Vries, 2021. "Toward Smart Land Management: Land Acquisition and the Associated Challenges in Ghana. A Look into a Blockchain Digital Land Registry for Prospects," Land, MDPI, vol. 10(3), pages 1-22, March.
    10. Yuan Yuan & Tianhui You & Tian’ai Xu & Xun Yu, 2022. "Customer-Oriented Strategic Planning for Hotel Competitiveness Improvement Based on Online Reviews," Sustainability, MDPI, vol. 14(22), pages 1-30, November.
    11. Poponi, Stefano & Piovesan, Gianluca & Fulco, Irene & Vessella, Federico, 2022. "Geolocation of mountain businesses: Identifying and characterizing clusters by altitude in the Central Apennines," Land Use Policy, Elsevier, vol. 120(C).
    12. Ridoan Karim & Firdaus Muhammad-Sukki & Mina Hemmati & Md Shah Newaz & Haroon Farooq & Mohd Nabil Muhtazaruddin & Muhammad Zulkipli & Jorge Alfredo Ardila-Rey, 2020. "RETRACTED: Paving towards Strategic Investment Decision: A SWOT Analysis of Renewable Energy in Bangladesh," Sustainability, MDPI, vol. 12(24), pages 1-30, December.
    13. Al-Hussein M. H. Al-Aidrous & Nasir Shafiq & Yasser Yahya Al-Ashmori & Al-Baraa Abdulrahman Al-Mekhlafi & Abdullah O. Baarimah, 2022. "Essential Factors Enhancing Industrialized Building Implementation in Malaysian Residential Projects," Sustainability, MDPI, vol. 14(18), pages 1-18, September.
    14. Ruihui Pu & Deimante Teresiene & Ina Pieczulis & Jie Kong & Xiao-Guang Yue, 2021. "The Interaction between Banking Sector and Financial Technology Companies: Qualitative Assessment—A Case of Lithuania," Risks, MDPI, vol. 9(1), pages 1-22, January.
    15. Lorena Bašan & Jelena Kapeš & Ana Kamenečki, 2021. "Tourist Satisfaction as a Driver of Destination Marketing Improvements: The Case of the Opatija Riviera," Tržište/Market, Faculty of Economics and Business, University of Zagreb, vol. 33(1), pages 93-112.
    16. Boonlert Jitmaneeroj, 2024. "Value relevance of multifaceted corporate social performance: how do country-specific factors matter?," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-20, December.
    17. Zikang Hao & Mengmeng Zhang & Kerui Liu & Xiaodan Zhang & Haoran Jia & Ping Chen, 2022. "Where Is the Way Forward for New Media Empowering Public Health? Development Strategy Options Based on SWOT-AHP Model," IJERPH, MDPI, vol. 19(19), pages 1-19, October.
    18. Ayaz Ahmad Khan & Rongrong Yu & Tingting Liu & Ning Gu & James Walsh, 2023. "Volumetric Modular Construction Risks: A Comprehensive Review and Digital-Technology-Coupled Circular Mitigation Strategies," Sustainability, MDPI, vol. 15(8), pages 1-34, April.
    19. Budsaratragoon, Pornanong & Jitmaneeroj, Boonlert, 2020. "A critique on the Corruption Perceptions Index: An interdisciplinary approach," Socio-Economic Planning Sciences, Elsevier, vol. 70(C).
    20. Sandro Serpa & Carlos Miguel Ferreira & Maria José Sá, 2020. "The Potential of Organisations’ SWOT Diagnostic Assessment," Academic Journal of Interdisciplinary Studies, Richtmann Publishing Ltd, vol. 9, July.
