
Investigation of Predictive Power of Sentiment Analysis Model Developed Using Different Word Embedding Techniques

In: Data-Driven Decision Making

Authors
  • Sudhanshu Kumar Guru

    (Micron Technology)

  • Lov Kumar

    (NIT Kurukshetra)

Abstract

In text mining, sentiment analysis is a powerful technique for detecting the overall emotion or sentiment behind a large body of text. It helps gauge opinion about a product, topic, policy, etc., from thousands of online reviews, tweets, social media comments, hashtags, and similar sources. In Software Engineering (SE), the technique is also being explored as a way to observe developers' opinions about new APIs, code libraries, or even bugs, as expressed on question-and-answer websites such as StackOverflow.com or in bug-tracking tools such as Jira. Several popular tools already perform sentiment analysis on SE texts, including SentiStrength, EmoTxt, and VADER (NLTK); most of them rely on a word dictionary that assigns positive or negative scores to individual words. This paper presents an empirical analysis of various word embedding techniques on SE texts collected from three sources: StackOverflow.com, Jira, and app reviews. Because machine learning (ML) algorithms take numeric vectors as input, SE text must first be converted into vectors of numbers. Six word embedding techniques (Count Vectorization, TF-IDF, Word2Vec CBOW, Word2Vec Skip-gram, GloVe, and Word2Vec pretrained on the Google News corpus) are used to convert the input texts into vectors; comparing the results shows that Word2Vec pretrained on the Google News corpus and GloVe perform almost identically and better than the other techniques. Three feature selection/reduction techniques are also compared: Significant Feature (SF) Selection, Significant Predictor Feature (SPF) Selection, and Principal Component Analysis (PCA); SPF and SF are found to produce very close results. Finally, eight ML techniques are used to build sentiment analysis models, and an empirical analysis is performed to identify the best ML method in terms of accuracy and cost.
Through this study, we aim to determine which combination of word embedding technique, feature reduction technique, and ML model is best suited to sentiment analysis of SE-related text.
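
The abstract notes that SE text must be turned into numeric vectors before any ML model can use it. As a rough illustration of the two simplest techniques it lists, Count Vectorization and TF-IDF, here is a minimal standard-library Python sketch. It is not the authors' implementation: the whitespace tokenizer, the example sentences, and the unsmoothed ln(N/df) weighting are assumptions made for illustration (libraries such as scikit-learn's TfidfVectorizer default to a smoothed, L2-normalized variant), and the chapter's own preprocessing is not described here.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of every distinct token in the corpus; fixes the vector layout.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def count_vectorize(docs, vocab):
    # Count Vectorization: one slot per vocabulary word, holding its raw count.
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[word] for word in vocab])
    return vectors


def tfidf_vectorize(docs, vocab):
    # Unsmoothed TF-IDF (an assumed variant): raw count * ln(N / df),
    # where df is the number of documents containing the word.
    # vocab must be built from these same docs, otherwise df can be 0.
    n_docs = len(docs)
    counts = count_vectorize(docs, vocab)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]


if __name__ == "__main__":
    # Made-up SE-style snippets, not data from the chapter.
    docs = [
        "this api is great",
        "this bug is annoying",
        "great fix for this bug",
    ]
    vocab = build_vocabulary(docs)
    print(vocab)
    for row in tfidf_vectorize(docs, vocab):
        print([round(x, 3) for x in row])
```

In this toy corpus, "this" appears in every snippet, so its TF-IDF weight is ln(3/3) = 0 in every vector, while a rarer word such as "api" receives ln 3. Word2Vec and GloVe, by contrast, represent each word as a dense vector learned from the contexts in which words appear, which this sketch does not attempt.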

Suggested Citation

  • Sudhanshu Kumar Guru & Lov Kumar, 2024. "Investigation of Predictive Power of Sentiment Analysis Model Developed Using Different Word Embedding Techniques," Springer Books, in: Jeanne Poulose & Vinod Sharma & Chandan Maheshkar (ed.), Data-Driven Decision Making, chapter 0, pages 27-58, Springer.
  • Handle: RePEc:spr:sprchp:978-981-97-2902-9_2
    DOI: 10.1007/978-981-97-2902-9_2

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.