
Knowing what you get when seeking semantic similarity: exploring classic NLP method biases

In: Handbook of Social Computing

Authors

  • Johanne Saint-Charles
  • Pierre Mongeau
  • Louis Renaud-Desjardins

Abstract

Various Natural Language Processing (NLP) methods are called upon to establish similarity between texts in the context of socio-semantic studies. This chapter addresses the methodological diversity in the field by asking to what extent classical NLP methods converge in their identification of similarity between various texts. We compare the results of four NLP methods that are well known and often used in the social sciences and humanities (Jaccard, LDA, LSA and TF–IDF) on corpora with different characteristics. Results show that these methods have specific biases and cannot be substituted for one another. Our observations invite social sciences and humanities scholars to consider new criteria for the selection of an NLP method suited to their research objectives.
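
As a rough illustration of why two of the methods named in the abstract need not agree, the sketch below scores the same pair of toy texts with Jaccard similarity and with cosine similarity over TF–IDF vectors. It is not the authors' procedure: the whitespace tokenizer, the smoothed IDF formula, the function names and the example texts are all assumptions made for this illustration, and LDA and LSA are left out because they require a fitted model.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def jaccard(a, b):
    # Jaccard similarity on vocabulary sets; term frequency is ignored.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def tfidf_vectors(docs):
    # One sparse TF-IDF vector (dict term -> weight) per document.
    # IDF is smoothed as log((1 + n) / (1 + df)) + 1, one common variant.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical toy texts: same vocabulary, different term frequencies.
    docs = ["carbon tax", "carbon carbon carbon tax"]
    vecs = tfidf_vectors(docs)
    print("Jaccard:", jaccard(docs[0], docs[1]))
    print("TF-IDF cosine:", round(cosine(vecs[0], vecs[1]), 3))
```

On these two texts Jaccard returns 1.0, because the vocabularies are identical, while the TF–IDF cosine is about 0.894, because the term frequencies differ. This gives one small, concrete sense in which such measures respond to different properties of a text.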

Suggested Citation

  • Johanne Saint-Charles & Pierre Mongeau & Louis Renaud-Desjardins, 2024. "Knowing what you get when seeking semantic similarity: exploring classic NLP method biases," Chapters, in: Peter A. Gloor & Francesca Grippa & Andrea Fronzetti Colladon & Aleksandra Przegalinska (eds.), Handbook of Social Computing, chapter 3, pages 27-46, Edward Elgar Publishing.
  • Handle: RePEc:elg:eechap:21469_3

    Download full text from publisher

    File URL: https://www.elgaronline.com/doi/10.4337/9781803921259.00009
    Download Restriction: no
