Printed from https://ideas.repec.org/p/ete/ecoomp/633963.html

Paragraph-based intra- and inter- document similarity using neural vector paragraph embeddings

Author

Listed:
  • Bart Thijs

Abstract

Science mapping using document networks is based on the assumption that scientific papers are indivisible units with unique links to neighbour documents. Research on proximity in co-citation analysis and the study of lexical properties of sections and citation contexts indicate that this assumption is questionable. Moreover, the meaning of words and co-words depends on the context in which they appear. This study proposes the use of a neural network architecture for word and paragraph embeddings (Doc2Vec) to measure similarity among these smaller units of analysis. It is shown that paragraphs in the ‘Introduction’ and ‘Discussion’ sections are more similar to the abstract, and that the similarity among paragraphs is related, though not linearly, to the distance between them. The ‘Methodology’ section is the least similar to the other sections. Abstracts of citing-cited document pairs are more similar than those of random pairs, and the context in which a reference appears is most similar to the abstract of the cited document. This novel approach, with its higher granularity, can be used for bibliometric-aided retrieval and to assist in measuring interdisciplinarity through the application of network-based centrality measures.
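
To make the idea of paragraph-level similarity concrete, the following is a minimal sketch of how paragraph embeddings might be compared with an abstract embedding once they exist. It assumes cosine similarity as the measure, which the abstract does not state, and it uses invented three-dimensional toy vectors rather than real Doc2Vec output. The names `cosine_similarity`, `rank_by_similarity`, and the paragraph labels are hypothetical, and the numbers do not reproduce any result from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_similarity(abstract_vec, paragraph_vecs):
    """Return (label, similarity) pairs, most similar to the abstract first."""
    scores = [
        (label, cosine_similarity(abstract_vec, vec))
        for label, vec in paragraph_vecs.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for illustration; real paragraph embeddings would
# come from a trained model and have many more dimensions.
abstract = [1.0, 0.0, 0.0]
paragraphs = {
    "introduction_p1": [2.0, 0.0, 0.0],
    "methodology_p1": [0.0, 1.0, 0.0],
    "discussion_p1": [1.0, 1.0, 0.0],
}

ranking = rank_by_similarity(abstract, paragraphs)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

Cosine similarity ignores vector length, so a paragraph vector pointing in the same direction as the abstract vector scores 1 regardless of its magnitude. Other measures could be substituted by swapping the function passed through `rank_by_similarity`.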

Suggested Citation

  • Bart Thijs, 2019. "Paragraph-based intra- and inter- document similarity using neural vector paragraph embeddings," Working Papers of ECOOM - Centre for Research and Development Monitoring 633963, KU Leuven, Faculty of Economics and Business (FEB), ECOOM - Centre for Research and Development Monitoring.
  • Handle: RePEc:ete:ecoomp:633963
    Note: paper number MSI_1901

    Download full text from publisher

    File URL: https://lirias.kuleuven.be/retrieve/531525
    File Function: Published version
    Download Restriction: no