
Cited text spans identification with an improved balanced ensemble model

Authors
  • Pancheng Wang

    (National University of Defense Technology)

  • Shasha Li

    (National University of Defense Technology)

  • Haifang Zhou

    (National University of Defense Technology)

  • Jintao Tang

    (National University of Defense Technology)

  • Ting Wang

    (National University of Defense Technology)

Abstract

Scientific summarization aims to provide a condensed summary of the important contributions of scientific papers. This problem has been explored extensively, and recent work has turned to using cited text spans to generate summaries. Cited text spans are the passages in the cited paper that most accurately reflect a citation. They can be viewed as important aspects of the cited paper, annotated by the academic community. Identifying cited text spans is therefore essential to this alternative form of scientific summarization. In this paper, we explore three potential improvements to our previous work, a two-layer ensemble model for the cited text spans identification problem. We first treat cited text spans identification as an imbalanced classification problem and compare preprocessing methods for handling the imbalanced dataset. We then propose the RANdom Sampling Aggregating (RANSA) algorithm to train the classifiers in the first ensemble layer. Finally, we apply an improved stacking framework, Hybrid-Stacking, to combine the models of the first layer. Our new ensemble model overcomes the flaws of the previous work and shows improved performance on cited text spans identification.
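The abstract describes RANSA only at a high level. As a rough illustration of the general idea it names (training each first-layer classifier on a class-balanced random sample and aggregating the results), the following is a minimal Python sketch of random undersampling with soft-vote aggregation, in the spirit of UnderBagging/EasyEnsemble. It is not the authors' RANSA algorithm and does not cover Hybrid-Stacking. The names `balanced_subsets`, `NearestCentroid` and `BalancedAggregatingEnsemble`, the toy nearest-centroid base learner, the convention that label 1 marks a cited-text-span sentence (the minority class), and the synthetic data are all hypothetical.

```python
import math
import random
from statistics import mean


def balanced_subsets(X, y, n_subsets, seed=0):
    """Hypothetical helper: pair every minority example with an equal-sized
    random draw (without replacement) from the majority class, once per member."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        yield [X[i] for i in idx], [y[i] for i in idx]


class NearestCentroid:
    """Toy base learner, a hypothetical stand-in for real first-layer classifiers."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, lab in zip(X, y) if lab == label]
            # Column-wise mean of the class's rows gives the class centroid.
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def score(self, x):
        """Score in [0, 1]; values near 1 mean x is closer to the class-1 centroid."""
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)


class BalancedAggregatingEnsemble:
    """Hypothetical sketch: one base learner per balanced random subset,
    combined by averaging their scores (soft voting)."""

    def __init__(self, n_members=10, seed=0):
        self.n_members = n_members
        self.seed = seed

    def fit(self, X, y):
        self.members = [
            NearestCentroid().fit(Xs, ys)
            for Xs, ys in balanced_subsets(X, y, self.n_members, self.seed)
        ]
        return self

    def predict_score(self, x):
        return mean(m.score(x) for m in self.members)

    def predict(self, x, threshold=0.5):
        return int(self.predict_score(x) >= threshold)


if __name__ == "__main__":
    # Synthetic, heavily imbalanced data: 200 majority vs. 10 minority points.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(10)]
    y = [0] * 200 + [1] * 10
    model = BalancedAggregatingEnsemble(n_members=15).fit(X, y)
    print(model.predict([4.0, 4.0]), model.predict([0.0, 0.0]))
```

Each member discards most of the majority class, but because every member draws a different majority sample, the ensemble as a whole still sees much of it; averaging member scores is one common aggregation choice, and a stacking layer could replace it.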

Suggested Citation

  • Pancheng Wang & Shasha Li & Haifang Zhou & Jintao Tang & Ting Wang, 2019. "Cited text spans identification with an improved balanced ensemble model," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(3), pages 1111-1145, September.
  • Handle: RePEc:spr:scient:v:120:y:2019:i:3:d:10.1007_s11192-019-03167-z
    DOI: 10.1007/s11192-019-03167-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-019-03167-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Because access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Qing Cheng & Xin Lu & Zhong Liu & Jincai Huang, 2015. "Mining research trends with anomaly detection models: the case of social computing research," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(2), pages 453-469, May.
    2. Shutian Ma & Jin Xu & Chengzhi Zhang, 2018. "Automatic identification of cited text spans: a multi-classifier approach over imbalanced dataset," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1303-1330, August.
    3. Unknown, 2016. "Proceedings Of Abstracts," 152nd Seminar, August 30 - September 1, 2016, Novi Sad, Serbia 244068, European Association of Agricultural Economists.
    4. Qiang Yang & Xindong Wu, 2006. "10 Challenging Problems In Data Mining Research," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 5(04), pages 597-604.
    5. Dragomir R. Radev & Mark Thomas Joseph & Bryan Gibson & Pradeep Muthukrishnan, 2016. "A bibliometric and network analysis of the field of computational linguistics," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(3), pages 683-706, March.
    6. Aaron Elkiss & Siwei Shen & Anthony Fader & Güneş Erkan & David States & Dragomir Radev, 2008. "Blind men and elephants: What do citation summaries tell us about a research article?," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 59(1), pages 51-62, January.

    Citations

    Citations are extracted by the CitEc Project; you can subscribe to its RSS feed for this item.

    Cited by:

    1. Li Zhang & Ming Liu & Bo Wang & Bo Lang & Peng Yang, 2021. "Discovering communities based on mention distance," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 1945-1967, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
    2. Kokil Jaidka & Christopher S. G. Khoo & Jin-Cheon Na, 2019. "Characterizing human summarization strategies for text reuse and transformation in literature review writing," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1563-1582, December.
    3. Samaneh Karimi & Luis Moraes & Avisha Das & Azadeh Shakery & Rakesh Verma, 2018. "Citance-based retrieval and summarization using IR and machine learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1331-1366, August.
    4. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    5. Rey-Long Liu, 2017. "A new bibliographic coupling measure with descriptive capability," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(2), pages 915-935, February.
    6. Ana Gouveia & Sílvia Santos & Inês Gonçalves, 2017. "The short-term impact of structural reforms on productivity growth: beyond direct effects," GEE Papers 0065, Gabinete de Estratégia e Estudos, Ministério da Economia, revised Feb 2017.
    7. Masaki Eto, 2013. "Evaluations of context-based co-citation searching," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(2), pages 651-673, February.
    8. DE CNUDDE, Sofie & MARTENS, David & EVGENIOU, Theodoros & PROVOST, Foster, 2017. "A benchmarking study of classification techniques for behavioral data," Working Papers 2017005, University of Antwerp, Faculty of Business and Economics.
    9. Wen Gao & Xinhong Hei & Yichuan Wang, 2023. "The Data Privacy Protection Method for Hyperledger Fabric Based on Trustzone," Mathematics, MDPI, vol. 11(6), pages 1-16, March.
    10. Kai Lu & Alireza Khani & Baoming Han, 2018. "A Trip Purpose-Based Data-Driven Alighting Station Choice Model Using Transit Smart Card Data," Complexity, Hindawi, vol. 2018, pages 1-14, August.
    11. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    12. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.
    13. Dan Andrews & Filippos Petroulakis, 2017. "Breaking the Shackles: Zombie Firms, Weak Banks and Depressed Restructuring in Europe," OECD Economics Department Working Papers 1433, OECD Publishing.
    14. Annarelli, Alessandro & Battistella, Cinzia & Nonino, Fabio & Parida, Vinit & Pessot, Elena, 2021. "Literature review on digitalization capabilities: Co-citation analysis of antecedents, conceptualization and consequences," Technological Forecasting and Social Change, Elsevier, vol. 166(C).
    15. Kiran Sharma, 2021. "Team size and retracted citations reveal the patterns of retractions from 1981 to 2020," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(10), pages 8363-8374, October.
    16. Liao, Jui-Jung & Shih, Ching-Hui & Chen, Tai-Feng & Hsu, Ming-Fu, 2014. "An ensemble-based model for two-class imbalanced financial problem," Economic Modelling, Elsevier, vol. 37(C), pages 175-183.
    17. OKADA Yoshimi & NAITO Yusuke & NAGAOKA Sadao, 2016. "Contribution of Patent Examination to Making the Patent Scope Consistent with the Invention: Evidence from Japan," Discussion papers 16092, Research Institute of Economy, Trade and Industry (RIETI).
    18. Mariam Camarero & Jesús Peiró-Palomino & Cecilio Tamarit, 2017. "External imbalances and growth," Working Papers 2017/02, Economics Department, Universitat Jaume I, Castellón (Spain).
    19. Beril T. Arik & Engin Arik, 2017. "“Second Language Writing” Publications in Web of Science: A Bibliometric Analysis," Publications, MDPI, vol. 5(1), pages 1-12, March.
    20. Steff De Visscher & Markus Eberhardt & Gerdie Everaert, 2017. "Measuring productivity and absorptive capacity evolution," Discussion Papers 2017-11, University of Nottingham, GEP.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:scient:v:120:y:2019:i:3:d:10.1007_s11192-019-03167-z. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.