Printed from https://ideas.repec.org/a/gam/jmathe/v10y2022i18p3260-d909490.html

Identification of Review Helpfulness Using Novel Textual and Language-Context Features

Authors

  • Muhammad Shehrayar Khan

    (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)

  • Atif Rizwan

    (Department of Computer Engineering, Jeju National University, Jeju-si 63243, Korea)

  • Muhammad Shahzad Faisal

    (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)

  • Tahir Ahmad

    (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)

  • Muhammad Saleem Khan

    (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)

  • Ghada Atteia

    (Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia)

Abstract

With the growing number of users on social media websites such as IMDb, a movie website, and the rise of publicly available data, opinion mining is more accessible than ever. In the research field of language understanding, categorizing movie reviews can be challenging because human language is complex and contains connotation words, whose intended meaning differs from their literal meaning. When a word is represented, the context in which it is used changes its semantics. This research investigates categorizing movie reviews with high F-Measure scores using Word2Vec, and inspects three groups of proposed features. First, psychological features are extracted from the reviews: positive emotion, negative emotion, anger, sadness, clout (confidence level) and dictionary words. Second, readability features are extracted: the Automated Readability Index (ARI), the Coleman-Liau Index (CLI) and Word Count (WC) are calculated to measure each review's understandability score, and their impact on review classification performance is measured. Lastly, linguistic features (adjectives and adverbs) are extracted from the reviews. The Word2Vec model is trained on a collection of 50,000 movie reviews. A self-trained Word2Vec model is used for the contextualized embedding of words into vectors with 50, 100, 150 and 300 dimensions, while the pretrained Word2Vec model converts words into vectors with 150 and 300 dimensions. Traditional and advanced machine-learning (ML) algorithms are applied and evaluated using accuracy, precision, recall and F-Measure. The results indicate that a Support Vector Machine (SVM) using self-trained Word2Vec achieved an 86% F-Measure, and that an SVM using psychological, linguistic and readability features concatenated with Word2Vec features achieved an 87.93% F-Measure.

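The abstract reports that concatenating psychological, linguistic and readability features with Word2Vec features improved the SVM's F-Measure, but it does not state how word vectors are aggregated into one vector per review. A minimal sketch, assuming mean pooling over in-vocabulary tokens (a common choice, not confirmed by the abstract): the `toy_vectors` dictionary stands in for a trained Word2Vec lookup table, and the function names and handcrafted values are hypothetical placeholders.

```python
from typing import Dict, List, Sequence


def mean_embedding(tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def review_vector(tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int,
                  handcrafted: Sequence[float]) -> List[float]:
    """Concatenate the averaged word embedding with handcrafted feature values
    (e.g. psychological, linguistic and readability scores)."""
    return mean_embedding(tokens, vectors, dim) + [float(x) for x in handcrafted]


# Hypothetical 2-dimensional embeddings; real Word2Vec vectors would have 50 to 300 dimensions.
toy_vectors = {"great": [1.0, 0.0], "boring": [0.0, 1.0]}
# Hypothetical handcrafted values: positive emotion, negative emotion, ARI.
x = review_vector(["great", "boring", "plot"], toy_vectors, dim=2, handcrafted=[0.8, 0.1, -3.5])
print(x)  # [0.5, 0.5, 0.8, 0.1, -3.5]
```

The out-of-vocabulary token "plot" is simply skipped. Because SVMs are sensitive to feature scale, the handcrafted columns would typically be standardized before training; the abstract does not say whether the authors did this.
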
Suggested Citation

  • Muhammad Shehrayar Khan & Atif Rizwan & Muhammad Shahzad Faisal & Tahir Ahmad & Muhammad Saleem Khan & Ghada Atteia, 2022. "Identification of Review Helpfulness Using Novel Textual and Language-Context Features," Mathematics, MDPI, vol. 10(18), pages 1-20, September.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:18:p:3260-:d:909490

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/18/3260/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/18/3260/
    Download Restriction: no

    References listed on IDEAS

    1. Fink, Lior & Rosenfeld, Liron & Ravid, Gilad, 2018. "Longer online reviews are not necessarily better," International Journal of Information Management, Elsevier, vol. 39(C), pages 30-37.
    2. Christophe Croux & Catherine Dehon, 2010. "Influence functions of the Spearman and Kendall correlation measures," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 19(4), pages 497-515, November.
    3. Gupta, Shivam & Kar, Arpan Kumar & Baabdullah, Abdullah & Al-Khowaiter, Wassan A.A., 2018. "Big data with cognitive computing: A review for the future," International Journal of Information Management, Elsevier, vol. 42(C), pages 78-89.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. de Camargo Fiorini, Paula & Roman Pais Seles, Bruno Michel & Chiappetta Jabbour, Charbel Jose & Barberio Mariano, Enzo & de Sousa Jabbour, Ana Beatriz Lopes, 2018. "Management theory and big data literature: From a review to a research agenda," International Journal of Information Management, Elsevier, vol. 43(C), pages 112-129.
    2. Pablo Aragonés‐Beltrán & Mª. Carmen González‐Cruz & Astrid León‐Camargo & Rosario Viñoles‐Cebolla, 2023. "Assessment of regional development needs according to criteria based on the Sustainable Development Goals in the Meta Region (Colombia)," Sustainable Development, John Wiley & Sons, Ltd., vol. 31(2), pages 1101-1121, April.
    3. Lutfi, Abdalwali & Alrawad, Mahmaod & Alsyouf, Adi & Almaiah, Mohammed Amin & Al-Khasawneh, Ahmad & Al-Khasawneh, Akif Lutfi & Alshira'h, Ahmad Farhan & Alshirah, Malek Hamed & Saad, Mohamed & Ibrahim, 2023. "Drivers and impact of big data analytic adoption in the retail industry: A quantitative investigation applying structural equation modeling," Journal of Retailing and Consumer Services, Elsevier, vol. 70(C).
    4. Pelau Corina & Barbul Maria, 2021. "Consumers’ perception on the use of cognitive computing," Proceedings of the International Conference on Business Excellence, Sciendo, vol. 15(1), pages 639-649, December.
    5. Taiga Saito & Shivam Gupta, 2022. "Big Data Applications with Theoretical Models and Social Media in Financial Management," CIRJE F-Series CIRJE-F-1205, CIRJE, Faculty of Economics, University of Tokyo.
    6. Taiga Saito & Shivam Gupta, 2022. "Big data applications with theoretical models and social media in financial management," CARF F-Series CARF-F-550, Center for Advanced Research in Finance, Faculty of Economics, The University of Tokyo.
    7. Liang Wu & Lin Guan & Feng Li & Qi Zhao & Yingjun Zhuo & Peng Chen & Yaotang Lv, 2018. "Optimal Dynamic Reactive Power Reserve for Wind Farms Addressing Short-Term Voltage Issues Caused by Wind Turbines Tripping," Energies, MDPI, vol. 11(7), pages 1-15, July.
    8. Umut Asan & Ayberk Soyer, 2022. "A Weighted Bonferroni-OWA Operator Based Cumulative Belief Degree Approach to Personnel Selection Based on Automated Video Interview Assessment Data," Mathematics, MDPI, vol. 10(9), pages 1-33, May.
    9. Michael Pfarrhofer, 2024. "Forecasts with Bayesian vector autoregressions under real time conditions," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(3), pages 771-801, April.
    10. Alvarez, Agustín & Boente, Graciela & Kudraszow, Nadia, 2019. "Robust sieve estimators for functional canonical correlation analysis," Journal of Multivariate Analysis, Elsevier, vol. 170(C), pages 46-62.
    11. Yi Feng & Yunqiang Yin & Dujuan Wang & Lalitha Dhamotharan & Joshua Ignatius & Ajay Kumar, 2023. "Diabetic patient review helpfulness: unpacking online drug treatment reviews by text analytics and design science approach," Annals of Operations Research, Springer, vol. 328(1), pages 387-418, September.
    12. Tarsitano Agostino & Lombardo Rosetta, 2013. "A Coefficient of Correlation Based on Ratios of Ranks and Anti-ranks," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 233(2), pages 206-224, April.
    13. Wied, Dominik & Dehling, Herold & van Kampen, Maarten & Vogel, Daniel, 2014. "A fluctuation test for constant Spearman’s rho with nuisance-free limit distribution," Computational Statistics & Data Analysis, Elsevier, vol. 76(C), pages 723-736.
    14. Chae, Bongsug (Kevin), 2019. "A General framework for studying the evolution of the digital innovation ecosystem: The case of big data," International Journal of Information Management, Elsevier, vol. 45(C), pages 83-94.
    15. Yu Cao & Liyan Huang & Nur Mardhiyah Aziz & Syahrul Nizam Kamaruzzaman, 2022. "Building Information Modelling (BIM) Capabilities in the Design and Planning of Rural Settlements in China: A Systematic Review," Land, MDPI, vol. 11(10), pages 1-34, October.
    16. Arthur Lehner & Christoph Erlacher & Matthias Schlögl & Jacob Wegerer & Thomas Blaschke & Klaus Steinnocher, 2018. "Can ISO-Defined Urban Sustainability Indicators Be Derived from Remote Sensing: An Expert Weighting Approach," Sustainability, MDPI, vol. 10(4), pages 1-31, April.
    17. Vanderford Courtney & Sang Yongli & Dang Xin, 2020. "Two symmetric and computationally efficient Gini correlations," Dependence Modeling, De Gruyter, vol. 8(1), pages 373-395, January.
    18. Markus Jäntti & Eva M. Sierminska & Philippe Van Kerm, 2015. "Modeling the Joint Distribution of Income and Wealth," Research on Economic Inequality, in: Measurement of Poverty, Deprivation, and Economic Mobility, volume 23, pages 301-327, Emerald Group Publishing Limited.
    19. repec:cte:wsrepe:es142416 is not listed on IDEAS
    20. Juan Daniel Hernandez & Fernando Jaramillo & Hubert Kempf & Fabien Moizeau & Thomas Vendryes, 2023. "Limited Commitment, Social Control and Risk-Sharing Coalitions in Village Economies," Documents de recherche 23-03, Centre d'Études des Politiques Économiques (EPEE), Université d'Evry Val d'Essonne.
    21. Hassani, Abdeslam & Mosconi, Elaine, 2022. "Social media analytics, competitive intelligence, and dynamic capabilities in manufacturing SMEs," Technological Forecasting and Social Change, Elsevier, vol. 175(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:10:y:2022:i:18:p:3260-:d:909490. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the MDPI Indexing Manager. General contact details of the provider: https://www.mdpi.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.