
Identification of Review Helpfulness Using Novel Textual and Language-Context Features

Author

Listed:
  • Muhammad Shehrayar Khan

    (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)

  • Atif Rizwan

    (Department of Computer Engineering, Jeju National University, Jeju-si 63243, Korea)

  • Muhammad Shahzad Faisal

    (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)

  • Tahir Ahmad

    (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)

  • Muhammad Saleem Khan

    (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)

  • Ghada Atteia

    (Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia)

Abstract

With the increase in users of social media websites such as IMDb, a movie website, and the rise of publicly available data, opinion mining is more accessible than ever. In language understanding research, categorizing movie reviews can be challenging because human language is complex and includes connotation words, whose intended meaning differs from their literal meaning. The context in which a word is used also changes its semantics, which matters when the word is represented as a vector. This research investigates categorizing movie reviews with good F-Measure scores using Word2Vec, and inspects three aspects of the proposed features. First, psychological features are extracted from reviews: positive emotion, negative emotion, anger, sadness, clout (confidence level) and dictionary words. Second, readability features are extracted: the Automated Readability Index (ARI), the Coleman-Liau Index (CLI) and Word Count (WC) are calculated to measure each review's understandability, and their impact on review classification performance is measured. Lastly, linguistic features are extracted from reviews: adjectives and adverbs. The Word2Vec model is trained on a collection of 50,000 movie reviews. This self-trained Word2Vec model is used for the contextualized embedding of words into vectors with 50, 100, 150 and 300 dimensions. The pretrained Word2Vec model converts words into vectors with 150 and 300 dimensions. Traditional and advanced machine-learning (ML) algorithms are applied and evaluated on four performance measures: accuracy, precision, recall and F-Measure. The results indicate that a Support Vector Machine (SVM) using self-trained Word2Vec achieved an 86% F-Measure, and that an SVM using psychological, linguistic and readability features concatenated with Word2Vec features achieved an 87.93% F-Measure.
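The readability features named in the abstract can be illustrated with a minimal Python sketch. It uses the standard published formulas for ARI and CLI; the regex tokenization, the function names `readability_features` and `concat_features`, and the order of the appended features are assumptions made for illustration, not the authors' implementation.

```python
import re


def readability_features(text: str) -> dict:
    """Compute ARI, CLI and word count for one review.

    Tokenization is a simple regex (an assumption of this sketch);
    the paper's own preprocessing may differ.
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_sentences = max(len(sentences), 1)  # avoid division by zero
    if n_words == 0:
        return {"ari": 0.0, "cli": 0.0, "wc": 0}

    # ARI counts letters and digits; CLI counts letters only.
    chars = sum(ch.isalnum() for w in words for ch in w)
    letters = sum(ch.isalpha() for w in words for ch in w)

    # Automated Readability Index
    ari = 4.71 * chars / n_words + 0.5 * n_words / n_sentences - 21.43

    # Coleman-Liau Index: L = letters per 100 words, S = sentences per 100 words
    L = 100.0 * letters / n_words
    S = 100.0 * n_sentences / n_words
    cli = 0.0588 * L - 0.296 * S - 15.8

    return {"ari": ari, "cli": cli, "wc": n_words}


def concat_features(embedding, feats: dict) -> list:
    """Append readability features to a review embedding (hypothetical layout)."""
    return list(embedding) + [feats["ari"], feats["cli"], float(feats["wc"])]


if __name__ == "__main__":
    feats = readability_features("The movie was great. I loved it!")
    print(feats)
    # A 2-dimensional placeholder stands in for a Word2Vec review vector.
    print(concat_features([0.1, 0.2], feats))
```

In a full pipeline the placeholder embedding would be replaced by a review-level Word2Vec vector, and the concatenated vector would be passed to a classifier such as an SVM.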

Suggested Citation

  • Muhammad Shehrayar Khan & Atif Rizwan & Muhammad Shahzad Faisal & Tahir Ahmad & Muhammad Saleem Khan & Ghada Atteia, 2022. "Identification of Review Helpfulness Using Novel Textual and Language-Context Features," Mathematics, MDPI, vol. 10(18), pages 1-20, September.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:18:p:3260-:d:909490

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/18/3260/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/18/3260/
    Download Restriction: no
