IDEAS home Printed from https://ideas.repec.org/a/ovi/oviste/vxxiiiy2023i1p537-544.html

Understanding Customers' Opinion using Web Scraping and Natural Language Processing

Author

Listed:
  • Alin-Gabriel Vaduva

    (The Bucharest University of Economic Studies, Department of Economic Informatics and Cybernetics, Romania)

  • Simona-Vasilica Oprea

    (The Bucharest University of Economic Studies, Department of Economic Informatics and Cybernetics, Romania)

  • Dragos-Catalin Barbu

    (The Bucharest University of Economic Studies, Department of Economic Informatics and Cybernetics, Romania)

Abstract

The web offers large volumes of unstructured data that cannot be processed further unless they are extracted and organized into local variables or databases. In this paper, we extract data from the Internet using web scraping and analyse it with Natural Language Processing (NLP). Our purpose is to understand customers' opinions by extracting reviews and investigating them in Python. The positive or negative sentiment of the reviews, together with a word cloud, offers additional tools to understand the customers, predict their behaviour and pinpoint the problems signalled in the reviews. TextBlob and BERTweet are applied to analyse the reviews. To make the outcomes easier to interpret, the classifications generated by the BERTweet model are compared with those provided by the TextBlob API, a widely used Python library for performing various NLP tasks. Furthermore, the reviews are pre-processed to remove line breaks, punctuation characters etc., and an n-gram analysis is performed to better understand the positive and negative reviews. The frequency of the reviews reveals the concrete problems faced by customers visiting the hotel in various seasons. This helps decision makers take measures to improve the quality of the hotel services.
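
The pre-processing and n-gram steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `clean_review` and `ngram_counts` and the sample reviews are hypothetical, only the standard library is used, and the sentiment models named in the abstract (TextBlob, BERTweet) are not shown.

```python
import re
import string
from collections import Counter


def clean_review(text):
    """Lowercase a review, replace line breaks and ASCII punctuation
    with spaces, and collapse repeated whitespace."""
    text = text.lower().replace("\n", " ").replace("\r", " ")
    # Map every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    return re.sub(r"\s+", " ", text).strip()


def ngram_counts(reviews, n=2):
    """Count word n-grams across a list of reviews (hypothetical helper)."""
    counts = Counter()
    for review in reviews:
        tokens = clean_review(review).split()
        # zip over n shifted copies of the token list yields the n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Made-up sample reviews, for illustration only.
sample_reviews = [
    "The room was clean,\nbut the breakfast was cold.",
    "Breakfast was cold; the staff was friendly!",
]

bigrams = ngram_counts(sample_reviews, n=2)
print(bigrams.most_common(3))
```

Counting n-grams separately for the reviews labelled positive and negative would then show which phrases recur in each group, which is the kind of insight the abstract describes.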

Suggested Citation

  • Alin-Gabriel Vaduva & Simona-Vasilica Oprea & Dragos-Catalin Barbu, 2023. "Understanding Customers' Opinion using Web Scraping and Natural Language Processing," Ovidius University Annals, Economic Sciences Series, Ovidius University of Constantza, Faculty of Economic Sciences, vol. 0(1), pages 537-544, August.
  • Handle: RePEc:ovi:oviste:v:xxiii:y:2023:i:1:p:537-544

    Download full text from publisher

    File URL: https://stec.univ-ovidius.ro/html/anale/RO/2023-i1/Section%203/38.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Yue Kang & Zhao Cai & Chee-Wee Tan & Qian Huang & Hefu Liu, 2020. "Natural language processing (NLP) in management research: A literature review," Journal of Management Analytics, Taylor & Francis Journals, vol. 7(2), pages 139-172, April.
    2. Venkatesh Shankar & Sohil Parsana, 2022. "An overview and empirical comparison of natural language processing (NLP) models and an introduction to and empirical application of autoencoder models in marketing," Journal of the Academy of Marketing Science, Springer, vol. 50(6), pages 1324-1350, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arpan Kumar Kar & P. S. Varsha & Shivakami Rajan, 2023. "Unravelling the Impact of Generative Artificial Intelligence (GAI) in Industrial Applications: A Review of Scientific and Grey Literature," Global Journal of Flexible Systems Management, Springer;Global Institute of Flexible Systems Management, vol. 24(4), pages 659-689, December.
    2. Lu, Qinli & Chesbrough, Henry, 2022. "Measuring open innovation practices through topic modelling: Revisiting their impact on firm financial performance," Technovation, Elsevier, vol. 114(C).
    3. Zimei Liu & Kefan Xie & Ling Li & Yong Chen, 2020. "A paradigm of safety management in Industry 4.0," Systems Research and Behavioral Science, Wiley Blackwell, vol. 37(4), pages 632-645, July.
    4. Jing Li & Daniel Shapiro & Anastasia Ufimtseva, 2024. "Regulating inbound foreign direct investment in a world of hegemonic rivalry: the evolution and diffusion of US policy," Journal of International Business Policy, Palgrave Macmillan, vol. 7(2), pages 147-165, June.
    5. Segun Akinola & Arnesh Telukdarie, 2023. "Sustainable Digital Transformation in Healthcare: Advancing a Digital Vascular Health Innovation Solution," Sustainability, MDPI, vol. 15(13), pages 1-23, July.
    6. Tian, Yu-Xin & Zhang, Chuan, 2023. "An end-to-end deep learning model for solving data-driven newsvendor problem with accessibility to textual review data," International Journal of Production Economics, Elsevier, vol. 265(C).
    7. Mohammad Alamgir Hossain & Md. Maruf Hossan Chowdhury & Ilias O. Pappas & Bhimaraya Metri & Laurie Hughes & Yogesh K. Dwivedi, 2023. "Fake news on Facebook and their impact on supply chain disruption during COVID-19," Annals of Operations Research, Springer, vol. 327(2), pages 683-711, August.
    8. Indu Khurana & Daniel J. Lee, 2023. "Gender bias in high stakes pitching: an NLP approach," Small Business Economics, Springer, vol. 60(2), pages 485-502, February.
    9. Borchert, Philipp & Coussement, Kristof & De Caigny, Arno & De Weerdt, Jochen, 2023. "Extending business failure prediction models with textual website content using deep learning," European Journal of Operational Research, Elsevier, vol. 306(1), pages 348-357.
    10. Schauerte, Nico & Becker, Maren & Imschloss, Monika & Wichmann, Julian R.K. & Reinartz, Werner J., 2023. "The managerial relevance of marketing science: Properties and genesis," International Journal of Research in Marketing, Elsevier, vol. 40(4), pages 801-822.
    11. Tao Shu & Zhiyi Wang & Huading Jia & Wenjin Zhao & Jixian Zhou & Tao Peng, 2022. "Consumers’ Opinions towards Public Health Effects of Online Games: An Empirical Study Based on Social Media Comments in China," IJERPH, MDPI, vol. 19(19), pages 1-19, October.
    12. Chen, Shiuann-Shuoh & Choubey, Bhaskar & Singh, Vinay, 2021. "A neural network based price sensitive recommender model to predict customer choices based on price effect," Journal of Retailing and Consumer Services, Elsevier, vol. 61(C).
    13. Weifeng Jia & Shuo Wang & Yongping Xie & Zifeng Chen & Kaixin Gong, 2022. "Disruptive technology identification of intelligent logistics robots in AIoT industry: Based on attributes and functions analysis," Systems Research and Behavioral Science, Wiley Blackwell, vol. 39(3), pages 557-568, May.
    14. Adrian LUPASC, 2023. "The Potential of Natural Language Technology in Transforming Educational Processes," Economics and Applied Informatics, "Dunarea de Jos" University of Galati, Faculty of Economics and Business Administration, issue 3, pages 142-147.
    15. Wei Zhang & Linhui Sun & Xinping Wang & Anbo Wu, 2022. "The influence of AI word‐of‐mouth system on consumers' purchase behaviour: The mediating effect of risk perception," Systems Research and Behavioral Science, Wiley Blackwell, vol. 39(3), pages 516-530, May.
    16. Jing Ge & Feng Wang & Hongxia Sun & Liuliu Fu & Mingwei Sun, 2020. "Research on the maturity of big data management capability of intelligent manufacturing enterprise," Systems Research and Behavioral Science, Wiley Blackwell, vol. 37(4), pages 646-662, July.
    17. U. M. Fernandes Dimlo & V. Rupesh & Yeligeti Raju, 2024. "The dynamics of natural language processing and text mining under emerging artificial intelligence techniques," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 15(9), pages 4512-4526, September.
    18. Just, Julian, 2024. "Natural language processing for innovation search – Reviewing an emerging non-human innovation intermediary," Technovation, Elsevier, vol. 129(C).
    19. Shanshan Wu & Long Cheng & Changcheng Huang & Yaoyao Chen, 2024. "The impact of open innovation on firms’ performance in bad times: evidence from COVID-19 pandemic," Eurasian Business Review, Springer;Eurasia Business and Economics Society, vol. 14(3), pages 657-694, September.
    20. Kirk Plangger & Dhruv Grewal & Ko Ruyter & Catherine Tucker, 2022. "The future of digital technologies in marketing: A conceptual framework and an overview," Journal of the Academy of Marketing Science, Springer, vol. 50(6), pages 1125-1134, November.

    More about this item

    Keywords

    web scraping; booking; customers' opinions; natural language processing

    JEL classification:

    • Z13 - Other Special Topics - - Cultural Economics - - - Economic Sociology; Economic Anthropology; Language; Social and Economic Stratification
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access

