2SCE-4SL: a 2-stage causality extraction framework for scientific literature

Author

Listed:
  • Yujie Zhang

    (Nanjing University)

  • Rujiang Bai

    (Shandong University of Technology)

  • Ling Kong

    (Nanjing Agricultural University)

  • Xiaoyue Wang

    (Shandong University of Technology)

Abstract

Extracting causality from scientific literature is a crucial task that underpins many downstream knowledge-driven applications. To this end, this paper presents a novel causality extraction framework for scientific literature, called 2-Stage Causality Extraction for Scientific Literature (2SCE-4SL). The framework consists of two stages. In stage 1, terms and causal trigger words are identified from causal sentences in the literature, and noisy causal triplets are then collocated. In stage 2, we propose a Transformer-based Denoising AutoEncoder to represent the causal sentences. This approach is used to learn the causal dependency and contextual information of sentences, incorporating causal trigger word tagging and noise elimination, as well as injecting domain-specific knowledge. By combining the causality structure from stage 1 with the causality representation from stage 2, the true causal triplets are identified among the noisy causal triplets. We conducted experiments on an open-access scientific literature dataset, comparing performance across different disciplines, training data volumes, and document lengths, and with and without the causality representation. We found that the average precision of 2SCE-4SL was 0.8146 and the average F1 was 0.8308, with the best performance achieved on full-text data. We also verified the effectiveness of the causality representation in stage 2, demonstrating that the architecture can capture the causal dependency of sentences and achieve good performance on two related tasks. Overall, detailed comparative and ablation experiments revealed that 2SCE-4SL requires only a small amount of annotated data to achieve better performance and domain adaptability in scientific literature.
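
The stage 1 idea of collocating noisy causal triplets can be pictured with a minimal Python sketch. This is an illustration only, not the authors' implementation: the abstract does not say how candidates are formed, so the exhaustive pairing below, the `Triplet` type, the `collocate_noisy_triplets` function, and the example sentence are all hypothetical assumptions.

```python
from itertools import permutations
from typing import List, NamedTuple


class Triplet(NamedTuple):
    """A candidate (cause, trigger, effect) triplet; hypothetical structure."""
    cause: str
    trigger: str
    effect: str


def collocate_noisy_triplets(terms: List[str], triggers: List[str]) -> List[Triplet]:
    """Pair every ordered (cause, effect) combination of terms with every trigger.

    Assumption: candidates are generated exhaustively, so most of them are
    noise that a later stage would have to filter out.
    """
    return [
        Triplet(cause, trigger, effect)
        for trigger in triggers
        for cause, effect in permutations(terms, 2)
    ]


# Hypothetical causal sentence: "Smoking causes lung cancer in heavy drinkers."
terms = ["smoking", "lung cancer", "heavy drinkers"]
triggers = ["causes"]
candidates = collocate_noisy_triplets(terms, triggers)

for triplet in candidates:
    print(triplet)
```

With three terms and one trigger word, this produces six candidates, of which only ("smoking", "causes", "lung cancer") is a true causal triplet. In the framework described above, separating such true triplets from the noisy ones is the job of the stage 2 causality representation.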

Suggested Citation

  • Yujie Zhang & Rujiang Bai & Ling Kong & Xiaoyue Wang, 2024. "2SCE-4SL: a 2-stage causality extraction framework for scientific literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(11), pages 7175-7195, November.
  • Handle: RePEc:spr:scient:v:129:y:2024:i:11:d:10.1007_s11192-023-04817-z
    DOI: 10.1007/s11192-023-04817-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-023-04817-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

