
Evaluating End-to-End Entity Linking on Domain-Specific Knowledge Bases: Learning about Ancient Technologies from Museum Collections

Author

Listed:
  • Sebastian Cadavid-Sanchez

    (ECON - Département d'économie (Sciences Po) - Sciences Po - Sciences Po - CNRS - Centre National de la Recherche Scientifique)

  • Khalil Kacem

    (ECON - Département d'économie (Sciences Po) - Sciences Po - Sciences Po - CNRS - Centre National de la Recherche Scientifique)

  • Rafael Aparecido Martins Frade

    (ECON - Département d'économie (Sciences Po) - Sciences Po - Sciences Po - CNRS - Centre National de la Recherche Scientifique)

  • Johannes Boehm

    (ECON - Département d'économie (Sciences Po) - Sciences Po - Sciences Po - CNRS - Centre National de la Recherche Scientifique)

  • Thomas Chaney

    (USC - University of Southern California, ECON - Département d'économie (Sciences Po) - Sciences Po - Sciences Po - CNRS - Centre National de la Recherche Scientifique)

  • Danial Lashkari

    (BC - Boston College)

  • Daniel Simig

Abstract

To study social, economic, and historical questions, researchers in the social sciences and humanities have started to use increasingly large unstructured textual datasets. While recent advances in NLP provide many tools to efficiently process such data, most existing approaches rely on generic solutions whose performance and suitability for domain-specific tasks are not well understood. This work presents an attempt to bridge this domain gap by exploring the use of modern Entity Linking (EL) approaches for the enrichment of museum collection data. We collect a dataset comprising more than 1,700 texts annotated with 7,510 mention-entity pairs, evaluate some off-the-shelf solutions in detail using this dataset, and finally fine-tune a recent end-to-end EL model on this data. We show that our fine-tuned model significantly outperforms other approaches currently available in this domain and present a proof-of-concept use case of this model. We release our dataset and our best model.

Suggested Citation

  • Sebastian Cadavid-Sanchez & Khalil Kacem & Rafael Aparecido Martins Frade & Johannes Boehm & Thomas Chaney & Danial Lashkari & Daniel Simig, 2023. "Evaluating End-to-End Entity Linking on Domain-Specific Knowledge Bases: Learning about Ancient Technologies from Museum Collections," SciencePo Working papers Main hal-04911147, HAL.
  • Handle: RePEc:hal:spmain:hal-04911147
    DOI: 10.48550/arXiv.2305.14588
    Note: View the original document on HAL open archive server: https://sciencespo.hal.science/hal-04911147v1

    Download full text from publisher

    File URL: https://sciencespo.hal.science/hal-04911147v1/document
    Download Restriction: no

    File URL: https://libkey.io/10.48550/arXiv.2305.14588?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


