2SCE-4SL: a 2-stage causality extraction framework for scientific literature

Author

Listed:
  • Yujie Zhang

    (Nanjing University)

  • Rujiang Bai

    (Shandong University of Technology)

  • Ling Kong

    (Nanjing Agricultural University)

  • Xiaoyue Wang

    (Shandong University of Technology)

Abstract

Extracting causality from scientific literature is a crucial task that underpins many downstream knowledge-driven applications. To this end, this paper presents a novel causality extraction framework for scientific literature, called 2-Stage Causality Extraction for Scientific Literature (2SCE-4SL). The framework consists of two stages. In stage 1, terms and causal trigger words are identified from causal sentences in the literature, and noisy causal triplets are then collocated. In stage 2, we propose a Transformer-based Denoising AutoEncoder to represent the causal sentences. This approach learns the causal dependency and contextual information of sentences, incorporating causal trigger word tagging and noise elimination and injecting domain-specific knowledge. By combining the causality structure from stage 1 with the causality representation from stage 2, the true causal triplets are identified among the noisy causal triplets. We conducted experiments on an open-access scientific literature dataset, comparing performance across disciplines, training data volumes, and document lengths, and with and without the causality representation. We found that the average precision of 2SCE-4SL was 0.8146 and the average F1 was 0.8308, with the best performance achieved on full-text data. We also verified the effectiveness of the causality representation in stage 2, demonstrating that the architecture can capture the causal dependency of sentences and achieve good performance on two related tasks. Overall, detailed comparative and ablation experiments revealed that 2SCE-4SL requires only a small amount of annotated data to achieve better performance and domain adaptability in scientific literature.
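
The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the example terms, and the scoring function are all hypothetical. Stage 1 is approximated here by pairing every ordered combination of terms with every trigger word to form noisy candidate triplets. Stage 2 is replaced by a placeholder score standing in for the learned causality representation, which the abstract describes only at a high level.

```python
from itertools import permutations


def collocate_noisy_triplets(terms, triggers):
    """Stage 1 (toy): pair each ordered (cause, effect) term pair with each trigger.

    Every ordered pair is kept, so most candidates are noise by construction.
    """
    return [
        (cause, trigger, effect)
        for trigger in triggers
        for cause, effect in permutations(terms, 2)
    ]


def keep_plausible(candidates, score, threshold=0.5):
    """Stage 2 (placeholder): keep candidates whose score reaches the threshold.

    `score` is a hypothetical stand-in for a learned model; it is an assumption,
    not the Denoising AutoEncoder described in the abstract.
    """
    return [t for t in candidates if score(t) >= threshold]


# Hypothetical terms and trigger word, assumed to come from one causal sentence.
terms = ["smoking", "lung cancer", "age"]
triggers = ["causes"]
candidates = collocate_noisy_triplets(terms, triggers)


def toy_score(triplet):
    # Hand-written rule used only to make the example runnable.
    return 1.0 if triplet == ("smoking", "causes", "lung cancer") else 0.0


kept = keep_plausible(candidates, toy_score)
print(len(candidates), kept)
```

With three terms and one trigger, the collocation step yields six ordered candidates, of which the placeholder score keeps one. The point of the sketch is only the shape of the pipeline: over-generate candidates first, then filter them with a separate scoring step.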

Suggested Citation

  • Yujie Zhang & Rujiang Bai & Ling Kong & Xiaoyue Wang, 2024. "2SCE-4SL: a 2-stage causality extraction framework for scientific literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(11), pages 7175-7195, November.
  • Handle: RePEc:spr:scient:v:129:y:2024:i:11:d:10.1007_s11192-023-04817-z
    DOI: 10.1007/s11192-023-04817-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-023-04817-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-023-04817-z?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Xiaoling Sun & Kun Ding, 2018. "Identifying and tracking scientific and technological knowledge memes from citation networks of publications and patents," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1735-1748, September.
    2. Lutz Bornmann & Rüdiger Mutz, 2015. "Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 66(11), pages 2215-2222, November.
    3. Vahe Tshitoyan & John Dagdelen & Leigh Weston & Alexander Dunn & Ziqin Rong & Olga Kononova & Kristin A. Persson & Gerbrand Ceder & Anubhav Jain, 2019. "Unsupervised word embeddings capture latent knowledge from materials science literature," Nature, Nature, vol. 571(7763), pages 95-98, July.
    4. Yuzhuo Wang & Chengzhi Zhang & Kai Li, 2022. "A review on method entities in the academic literature: extraction, evaluation, and application," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2479-2520, May.

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.

    Cited by:

    1. Chengzhi Zhang & Philipp Mayr & Wei Lu & Yi Zhang, 2024. "An editorial note on extraction and evaluation of knowledge entities from scientific documents," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(11), pages 7169-7174, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Martín de Diego, Isaac & González-Fernández, César & Fernández-Isabel, Alberto & Fernández, Rubén R. & Cabezas, Javier, 2021. "System for evaluating the reliability and novelty of medical scientific papers," Journal of Informetrics, Elsevier, vol. 15(4).
    2. Mohammed Azmi Al-Betar & Ammar Kamal Abasi & Ghazi Al-Naymat & Kamran Arshad & Sharif Naser Makhadmeh, 2023. "Optimization of scientific publications clustering with ensemble approach for topic extraction," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2819-2877, May.
    3. Jia-Min Lu & Hui-Feng Wang & Qi-Hang Guo & Jian-Wei Wang & Tong-Tong Li & Ke-Xin Chen & Meng-Ting Zhang & Jian-Bo Chen & Qian-Nuan Shi & Yi Huang & Shao-Wen Shi & Guang-Yong Chen & Jian-Zhang Pan & Zh, 2024. "Roboticized AI-assisted microfluidic photocatalytic synthesis and screening up to 10,000 reactions per day," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    4. Ananthan Nambiar & Tobias Rubel & James McCaull & Jon deVries & Mark Bedau, 2021. "Dropping diversity of products of large US firms: Models and measures," Papers 2110.08367, arXiv.org.
    5. Ramona Weinrich, 2019. "Opportunities for the Adoption of Health-Based Sustainable Dietary Patterns: A Review on Consumer Research of Meat Substitutes," Sustainability, MDPI, vol. 11(15), pages 1-15, July.
    6. Jason Youn & Navneet Rai & Ilias Tagkopoulos, 2022. "Knowledge integration and decision support for accelerated discovery of antibiotic resistance genes," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    7. Piers Steel & Sjoerd Beugelsdijk & Herman Aguinis, 2021. "The anatomy of an award-winning meta-analysis: Recommendations for authors, reviewers, and readers of meta-analytic reviews," Journal of International Business Studies, Palgrave Macmillan;Academy of International Business, vol. 52(1), pages 23-44, February.
    8. Dunaiski, Marcel & Geldenhuys, Jaco & Visser, Willem, 2019. "On the interplay between normalisation, bias, and performance of paper impact metrics," Journal of Informetrics, Elsevier, vol. 13(1), pages 270-290.
    9. Augusteijn, Hilde Elisabeth Maria & van Aert, Robbie Cornelis Maria & van Assen, Marcel A. L. M., 2021. "Posterior Probabilities of Effect Sizes and Heterogeneity in Meta-Analysis: An Intuitive Approach of Dealing with Publication Bias," OSF Preprints avkgj, Center for Open Science.
    10. Ruhua Huang & Yuting Huang & Fan Qi & Leyi Shi & Baiyang Li & Wei Yu, 2022. "Exploring the characteristics of special issues: distribution, topicality, and citation impact," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5233-5256, September.
    11. Neal R. Haddaway & Max W. Callaghan & Alexandra M. Collins & William F. Lamb & Jan C. Minx & James Thomas & Denny John, 2020. "On the use of computer‐assistance to facilitate systematic mapping," Campbell Systematic Reviews, John Wiley & Sons, vol. 16(4), December.
    12. Wu, Lingfei & Kittur, Aniket & Youn, Hyejin & Milojević, Staša & Leahey, Erin & Fiore, Stephen M. & Ahn, Yong-Yeol, 2022. "Metrics and mechanisms: Measuring the unmeasurable in the science of science," Journal of Informetrics, Elsevier, vol. 16(2).
    13. Vincent Raoult, 2020. "How Many Papers Should Scientists Be Reviewing? An Analysis Using Verified Peer Review Reports," Publications, MDPI, vol. 8(1), pages 1-9, January.
    14. Eloy López-Meneses & Esteban Vázquez-Cano & Mariana-Daniela González-Zamar & Emilio Abad-Segura, 2020. "Socioeconomic Effects in Cyberbullying: Global Research Trends in the Educational Context," IJERPH, MDPI, vol. 17(12), pages 1-31, June.
    15. Gordana Ispirova & Tome Eftimov & Barbara Koroušić Seljak, 2020. "P-NUT: Predicting NUTrient Content from Short Text Descriptions," Mathematics, MDPI, vol. 8(10), pages 1-21, October.
    16. Dongin Nam & Jiwon Kim & Jeeyoung Yoon & Chaemin Song & Seongdeok Kim & Min Song, 2024. "Examining knowledge entities and its relationships based on citation sentences using a multi-anchor bipartite network," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(11), pages 7197-7228, November.
    17. Jinseok Kim & Jinmo Kim & Jason Owen-Smith, 2019. "Generating automatically labeled data for author name disambiguation: an iterative clustering method," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 253-280, January.
    18. Lu Huang & Xiang Chen & Yi Zhang & Changtian Wang & Xiaoli Cao & Jiarun Liu, 2022. "Identification of topic evolution: network analytics with piecewise linear representation and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5353-5383, September.
    19. Soo Jeung Lee & Christian Schneijderberg & Yangson Kim & Isabel Steinhardt, 2021. "Have Academics’ Citation Patterns Changed in Response to the Rise of World University Rankings? A Test Using First-Citation Speeds," Sustainability, MDPI, vol. 13(17), pages 1-19, August.
    20. Lutz Bornmann & Robin Haunschild & Rüdiger Mutz, 2021. "Growth rates of modern science: a latent piecewise growth curve approach to model publication numbers from established and new literature databases," Palgrave Communications, Palgrave Macmillan, vol. 8(1), pages 1-15, December.
