
Deep context of citations using machine-learning models in scholarly full-text articles

Authors

  • Saeed-Ul Hassan (Information Technology University)
  • Mubashir Imran (Information Technology University)
  • Sehrish Iqbal (Information Technology University)
  • Naif Radi Aljohani (King Abdulaziz University)
  • Raheel Nawaz (Manchester Metropolitan University)

Abstract

Information retrieval systems for scholarly literature rely heavily not only on text matching but on semantic- and context-based features. Readers nowadays are deeply interested in how important an article is, its purpose and how influential it is in follow-up research work. Numerous techniques to tap the power of machine learning and artificial intelligence have been developed to enhance retrieval of the most influential scientific literature. In this paper, we compare and improve on four existing state-of-the-art techniques designed to identify influential citations. We consider 450 citations from the Association for Computational Linguistics corpus, classified by experts as either important or unimportant, and further extract 64 features based on the methodology of four state-of-the-art techniques. We apply the Extra-Trees classifier to select 29 best features and apply the Random Forest and Support Vector Machine classifiers to all selected techniques. Using the Random Forest classifier, our supervised model improves on the state-of-the-art method by 11.25%, with 89% Precision-Recall area under the curve. Finally, we present our deep-learning model, the Long Short-Term Memory network, that uses all 64 features to distinguish important and unimportant citations with 92.57% accuracy.
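The abstract reports model quality as the area under the Precision-Recall curve. As a minimal sketch of how such a figure can be computed, the Python below implements average precision, one common step-wise estimate of that area. It is an illustration only, not the authors' evaluation code: the function name `average_precision`, the toy labels and the toy scores are all hypothetical, and the sketch assumes no tied scores.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: sequence of 0/1 values, where 1 marks an "important" citation.
    scores: classifier scores, higher meaning more likely important.
    Assumption: scores contain no ties (ties are not handled specially).
    """
    # Rank citations from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(label for _, label in ranked)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    true_positives = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_at_rank = true_positives / rank
            # Each positive found raises recall by 1 / total_positives,
            # so it contributes a strip of that width to the area.
            area += precision_at_rank / total_positives
    return area


# Hypothetical toy data: 1 = important citation, 0 = unimportant.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
print(round(toy_ap, 4))
```

A classifier that ranks every important citation above every unimportant one scores 1.0 under this measure, and each unimportant citation ranked above an important one lowers the value.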

Suggested Citation

  • Saeed-Ul Hassan & Mubashir Imran & Sehrish Iqbal & Naif Radi Aljohani & Raheel Nawaz, 2018. "Deep context of citations using machine-learning models in scholarly full-text articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1645-1662, December.
  • Handle: RePEc:spr:scient:v:117:y:2018:i:3:d:10.1007_s11192-018-2944-y
    DOI: 10.1007/s11192-018-2944-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-018-2944-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Alexandru T. Balaban, 2012. "Positive and negative aspects of citation indices and journal impact factors," Scientometrics, Springer;Akadémiai Kiadó, vol. 92(2), pages 241-247, August.
    2. J. E. Hirsch, 2010. "An index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship," Scientometrics, Springer;Akadémiai Kiadó, vol. 85(3), pages 741-754, December.
    3. Saeed-Ul Hassan & Iqra Safder & Anam Akram & Faisal Kamiran, 2018. "A novel machine-learning approach to measuring scientific knowledge flows using citation context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 973-996, August.
    4. Laura Auria & Rouslan A. Moro, 2008. "Support Vector Machines (SVM) as a Technique for Solvency Analysis," Discussion Papers of DIW Berlin 811, DIW Berlin, German Institute for Economic Research.
    5. Laurens De Vocht & Selver Softic & Ruben Verborgh & Erik Mannens & Martin Ebner, 2017. "Social Semantic Search: A Case Study on Web 2.0 for Science," International Journal on Semantic Web and Information Systems (IJSWIS), IGI Global, vol. 13(4), pages 155-180, October.
    6. Ying Ding & Guo Zhang & Tamy Chambers & Min Song & Xiaolong Wang & Chengxiang Zhai, 2014. "Content-based citation analysis: The next generation of citation analysis," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(9), pages 1820-1833, September.
    7. Leo Egghe, 2006. "Theory and practise of the g-index," Scientometrics, Springer;Akadémiai Kiadó, vol. 69(1), pages 131-152, October.
    8. Yuncheng Jiang & Mingxuan Yang, 2018. "Semantic Search Exploiting Formal Concept Analysis, Rough Sets, and Wikipedia," International Journal on Semantic Web and Information Systems (IJSWIS), IGI Global, vol. 14(3), pages 99-119, July.
    9. Zehra Taşkın & Umut Al, 2018. "A content-based citation analysis study based on text categorization," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(1), pages 335-357, January.
    10. Waltman, Ludo & van Eck, Nees Jan & van Leeuwen, Thed N. & Visser, Martijn S., 2013. "Some modifications to the SNIP journal impact indicator," Journal of Informetrics, Elsevier, vol. 7(2), pages 272-285.
    11. Charles Oppenheim & Susan P. Renn, 1978. "Highly cited old papers and the reasons why they continue to be cited," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 29(5), pages 225-231, September.

    Citations



    Cited by:

    1. Xin An & Xin Sun & Shuo Xu, 2022. "Important citations identification with semi-supervised classification model," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6533-6555, November.
    2. Chaker Jebari & Enrique Herrera-Viedma & Manuel Jesus Cobo, 2023. "Context-aware citation recommendation of scientific papers: comparative study, gaps and trends," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4243-4268, August.
    3. Yuan Zhou & Fang Dong & Yufei Liu & Liang Ran, 2021. "A deep learning framework to early identify emerging technologies in large-scale outlier patents: an empirical study of CNC machine tool," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 969-994, February.
    4. Yuan Zhou & Fang Dong & Yufei Liu & Zhaofu Li & JunFei Du & Li Zhang, 2020. "Forecasting emerging technologies using data augmentation and deep learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(1), pages 1-29, April.
    5. Zhai, Dongsheng & Zhai, Liang & Li, Mengyang & He, Xijun & Xu, Shuo & Wang, Feifei, 2022. "Patent representation learning with a novel design of patent ontology: Case study on PEM patents," Technological Forecasting and Social Change, Elsevier, vol. 183(C).
    6. Mahira Ahmad & Amina Muazzam & Ambreen Anjum & Anna Visvizi & Raheel Nawaz, 2020. "Linking Work-Family Conflict (WFC) and Talent Management: Insights from a Developing Country," Sustainability, MDPI, vol. 12(7), pages 1-17, April.
    7. Ayesha Ali & Ateeq Ur Rehman & Ahmad Almogren & Elsayed Tag Eldin & Muhammad Kaleem, 2022. "Application of Deep Learning Gated Recurrent Unit in Hybrid Shunt Active Power Filter for Power Quality Enhancement," Energies, MDPI, vol. 15(20), pages 1-21, October.
    8. Iqra Safder & Saeed-Ul Hassan, 2019. "Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 257-277, April.
    9. Natinai Jinsakul & Cheng-Fa Tsai & Chia-En Tsai & Pensee Wu, 2019. "Enhancement of Deep Learning in Image Classification Performance Using Xception with the Swish Activation Function for Colorectal Polyp Preliminary Screening," Mathematics, MDPI, vol. 7(12), pages 1-21, December.
    10. Xue Wang & Xuemei Yang & Jian Du & Xuwen Wang & Jiao Li & Xiaoli Tang, 2021. "A deep learning approach for identifying biomedical breakthrough discoveries using context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5531-5549, July.
    11. Setio Basuki & Masatoshi Tsuchiya, 2022. "SDCF: semi-automatically structured dataset of citation functions," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4569-4608, August.
    12. Dorte Drongstrup & Shafaq Malik & Naif Radi Aljohani & Salem Alelyani & Iqra Safder & Saeed-Ul Hassan, 2020. "Can social media usage of scientific literature predict journal indices of AJG, SNIP and JCR? An altmetric study of economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1541-1558, November.
    13. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    14. Chrysoula Zerva & Minh-Quoc Nghiem & Nhung T. H. Nguyen & Sophia Ananiadou, 2020. "Cited text span identification for scientific summarisation using pre-trained encoders," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 3109-3137, December.
    15. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
    2. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.
    3. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    4. Mingyang Wang & Jiaqi Zhang & Shijia Jiao & Xiangrong Zhang & Na Zhu & Guangsheng Chen, 2020. "Important citation identification by exploiting the syntactic and contextual information of citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2109-2129, December.
    5. Dongqing Lyu & Xuanmin Ruan & Juan Xie & Ying Cheng, 2021. "The classification of citing motivations: a meta-synthesis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(4), pages 3243-3264, April.
    6. Zhang, Chengzhi & Liu, Lifan & Wang, Yuzhuo, 2021. "Characterizing references from different disciplines: A perspective of citation content analysis," Journal of Informetrics, Elsevier, vol. 15(2).
    7. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    8. Deming Lin & Tianhui Gong & Wenbin Liu & Martin Meyer, 2020. "An entropy-based measure for the evolution of h index research," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2283-2298, December.
    9. Ash Mohammad Abbas, 2011. "Weighted indices for evaluating the quality of research with multiple authorship," Scientometrics, Springer;Akadémiai Kiadó, vol. 88(1), pages 107-131, July.
    10. Lathabai, Hiran H., 2020. "ψ-index: A new overall productivity index for actors of science and technology," Journal of Informetrics, Elsevier, vol. 14(4).
    11. Serge Galam, 2011. "Tailor based allocations for multiple authorship: a fractional gh-index," Scientometrics, Springer;Akadémiai Kiadó, vol. 89(1), pages 365-379, October.
    12. Walters, William H., 2014. "Do Article Influence scores overestimate the citation impact of social science journals in subfields that are related to higher-impact natural science disciplines?," Journal of Informetrics, Elsevier, vol. 8(2), pages 421-430.
    13. Tingcan Ma & Gui-Fang Wang & Ke Dong & Mukun Cao, 2012. "The Journal’s Integrated Impact Index: a new indicator for journal evaluation," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(2), pages 649-658, February.
    14. Zhang, Fang & Wu, Shengli, 2020. "Predicting future influence of papers, researchers, and venues in a dynamic academic network," Journal of Informetrics, Elsevier, vol. 14(2).
    15. Heng Huang & Donghua Zhu & Xuefeng Wang, 2022. "Evaluating scientific impact of publications: combining citation polarity and purpose," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5257-5281, September.
    16. Thomas A. Hamrick & Ronald D. Fricker & Gerald G. Brown, 2010. "Assessing What Distinguishes Highly Cited from Less-Cited Papers Published in Interfaces," Interfaces, INFORMS, vol. 40(6), pages 454-464, December.
    17. Liu, John S. & Lu, Louis Y.Y. & Ho, Mei Hsiu-Ching, 2012. "Total influence and mainstream measures for scientific researchers," Journal of Informetrics, Elsevier, vol. 6(4), pages 496-504.
    18. Kai Nishikawa, 2023. "How and why are citations between disciplines made? A citation context analysis focusing on natural sciences and social sciences and humanities," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2975-2997, May.
    19. Wu, Jiang, 2013. "Investigating the universal distributions of normalized indicators and developing field-independent index," Journal of Informetrics, Elsevier, vol. 7(1), pages 63-71.
    20. Fang Zhang & Shengli Wu, 2021. "Measuring academic entities’ impact by content-based citation analysis in a heterogeneous academic network," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 7197-7222, August.
