
Incorporating topic membership in review rating prediction from unstructured data: a gradient boosting approach

Authors

  • Nan Yang (University of East Anglia)
  • Nikolaos Korfiatis (University of East Anglia)
  • Dimitris Zissis (University of East Anglia)
  • Konstantina Spanaki (Audencia Business School)

Abstract

Rating prediction is a crucial element of business analytics as it enables decision-makers to assess service performance based on expressive customer feedback. Enhancing rating score predictions and demand forecasting by incorporating performance features from verbatim text fields, particularly in service quality measurement and customer satisfaction modelling, is a key objective in various areas of analytics. A range of methods has been identified in the literature for improving the predictability of customer feedback, including simple bag-of-words-based approaches and advanced supervised machine learning models, which are designed to work with response variables such as Likert-based rating scores. This paper presents a dynamic model that incorporates values from topic membership, an outcome variable from Latent Dirichlet Allocation, with sentiment analysis in an Extreme Gradient Boosting (XGBoost) model used for rating prediction. The results show that, by incorporating features from simple unsupervised machine learning approaches (LDA-based), an 86% prediction accuracy (AUC-based) can be achieved on objective rating values. At the same time, a combination of polarity and single-topic membership can yield an even higher accuracy when compared with sentiment text detection tasks both at the document and sentence levels. This study carries significant practical implications since sentiment analysis tasks often require dictionary coverage and domain-specific adjustments depending on the task at hand. To further investigate this result, we used Shapley Additive Values to determine the additive predictability of topic membership values in combination with sentiment-based methods using a dataset of customer reviews from food delivery services.
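
The abstract reports prediction accuracy in terms of AUC. As a minimal sketch of what that metric measures, and not the authors' code, the following computes AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The function name `auc` and the toy labels and scores are hypothetical and are not taken from the paper's data.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = high rating, 0 = low rating,
# scores = predicted probability of a high rating from some classifier.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(auc(labels, scores))
```

In this toy case, eight of the nine positive-negative pairs are ranked correctly, so the value is 8/9. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.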

Suggested Citation

  • Nan Yang & Nikolaos Korfiatis & Dimitris Zissis & Konstantina Spanaki, 2024. "Incorporating topic membership in review rating prediction from unstructured data: a gradient boosting approach," Annals of Operations Research, Springer, vol. 339(1), pages 631-662, August.
  • Handle: RePEc:spr:annopr:v:339:y:2024:i:1:d:10.1007_s10479-023-05336-z
    DOI: 10.1007/s10479-023-05336-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10479-023-05336-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10479-023-05336-z?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kolomoyets, Yuliya & Dickinger, Astrid, 2023. "Understanding value perceptions and propositions: A machine learning approach," Journal of Business Research, Elsevier, vol. 154(C).
    2. Zajadacz Alina & Minkwitz Aleksandra, 2020. "Using Social Media Data to Plan for Tourism," Quaestiones Geographicae, Sciendo, vol. 39(3), pages 125-138, September.
    3. Ahani, Ali & Nilashi, Mehrbakhsh & Yadegaridehkordi, Elaheh & Sanzogni, Louis & Tarik, A. Rashid & Knox, Kathy & Samad, Sarminah & Ibrahim, Othman, 2019. "Revealing customers’ satisfaction and preferences through online review analysis: The case of Canary Islands hotels," Journal of Retailing and Consumer Services, Elsevier, vol. 51(C), pages 331-343.
    4. Xue, Lan & Leung, Xi Y. & Ma, Shihan (David), 2022. "What makes a good “guest”: Evidence from Airbnb hosts' reviews," Annals of Tourism Research, Elsevier, vol. 95(C).
    5. Jin Li & Yulan Zhang & Jianping Li & Jiangze Du, 2023. "The Role of Sentiment Tendency in Affecting Review Helpfulness for Durable Products: Nonlinearity and Complementarity," Information Systems Frontiers, Springer, vol. 25(4), pages 1459-1477, August.
    6. Woohyuk Kim & Sung-Bum Kim & Eunhye Park, 2021. "Mapping Tourists’ Destination (Dis)Satisfaction Attributes with User-Generated Content," Sustainability, MDPI, vol. 13(22), pages 1-16, November.
    7. Mariani, Marcello M. & Borghi, Matteo & Laker, Benjamin, 2023. "Do submission devices influence online review ratings differently across different types of platforms? A big data analysis," Technological Forecasting and Social Change, Elsevier, vol. 189(C).
    8. Enrique Bigne & Carla Ruiz & Carmen Perez-Cabañero & Antonio Cuenca, 2023. "Are customer star ratings and sentiments aligned? A deep learning study of the customer service experience in tourism destinations," Service Business, Springer;Pan-Pacific Business Association, vol. 17(1), pages 281-314, March.
    9. Nohel Zaman & David M. Goldberg & Richard J. Gruss & Alan S. Abrahams & Siriporn Srisawas & Peter Ractham & Michelle M.H. Şeref, 2022. "Cross-Category Defect Discovery from Online Reviews: Supplementing Sentiment with Category-Specific Semantics," Information Systems Frontiers, Springer, vol. 24(4), pages 1265-1285, August.
    10. Lu, Lin & Xu, Pei & Wang, Yen-Yao & Wang, Yu, 2023. "Measuring service quality with text analytics: Considering both importance and performance of consumer opinions on social and non-social online platforms," Journal of Business Research, Elsevier, vol. 169(C).
    11. Feifei Wang & Yang Yang & Geoffrey K. F. Tso & Yang Li, 2019. "Analysis of launch strategy in cross-border e-Commerce market via topic modeling of consumer reviews," Electronic Commerce Research, Springer, vol. 19(4), pages 863-884, December.
    12. Zhang, Min & Sun, Lin & Wang, G. Alan & Li, Yuzhuo & He, Shuguang, 2022. "Using neutral sentiment reviews to improve customer requirement identification and product design strategies," International Journal of Production Economics, Elsevier, vol. 254(C).
    13. Mengqiang Pan & Nao Li & Xiankai Huang, 2022. "Asymmetrical impact of service attribute performance on consumer satisfaction: an asymmetric impact-attention-performance analysis," Information Technology & Tourism, Springer, vol. 24(2), pages 221-243, June.
    14. Zuo, Wenming & Bai, Weijing & Zhu, Wenfeng & He, Xinming & Qiu, Xinxin, 2022. "Changes in service quality of sharing accommodation: Evidence from airbnb," Technology in Society, Elsevier, vol. 71(C).
    15. Md Shamim Hossain & Mst Farjana Rahman, 2023. "Customer Sentiment Analysis and Prediction of Insurance Products’ Reviews Using Machine Learning Approaches," FIIB Business Review, vol. 12(4), pages 386-402, December.
    16. Gül Yazıcı & Tuğçe Ozansoy Çadırcı, 2024. "Creating meaningful insights from customer reviews: a methodological comparison of topic modeling algorithms and their use in marketing research," Journal of Marketing Analytics, Palgrave Macmillan, vol. 12(4), pages 865-887, December.
    17. Han, Chunjia & Yang, Mu, 2021. "Revealing Airbnb user concerns on different room types," Annals of Tourism Research, Elsevier, vol. 89(C).
    18. Susan (Sixue) Jia, 2021. "Analyzing Restaurant Customers’ Evolution of Dining Patterns and Satisfaction during COVID-19 for Sustainable Business Insights," Sustainability, MDPI, vol. 13(9), pages 1-15, April.
    19. Zaman, Mustafeed & Vo-Thanh, Tan & Nguyen, Chi T.K. & Hasan, Rajibul & Akter, Shahriar & Mariani, Marcello & Hikkerova, Lubica, 2023. "Motives for posting fake reviews: Evidence from a cross-cultural comparison," Journal of Business Research, Elsevier, vol. 154(C).
    20. Alzate, Miriam & Arce-Urriza, Marta & Cebollada, Javier, 2022. "Mining the text of online consumer reviews to analyze brand image and brand positioning," Journal of Retailing and Consumer Services, Elsevier, vol. 67(C).
