
Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods

Author

Listed:
  • Yan, Erjia
  • Zhu, Yongjun

Abstract

The objective of this study is to evaluate the performance of five entity extraction methods for the task of identifying entities from scientific publications, including two vocabulary-based methods (a keyword-based and a Wikipedia-based) and three model-based methods (conditional random fields (CRF), CRF with keyword-based dictionary, and CRF with Wikipedia-based dictionary). These methods are applied to an annotated test set of publications in computer science. Precision, recall, accuracy, area under the ROC curve, and area under the precision-recall curve are employed as the evaluative indicators. Results show that the model-based methods outperform the vocabulary-based ones, among which CRF with keyword-based dictionary has the best performance. Between the two vocabulary-based methods, the keyword-based one has a higher recall and the Wikipedia-based one has a higher precision. The findings of this study help inform the understanding of informetric research at a more granular level.
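As a rough illustration of how a vocabulary-based extractor can be scored with precision, recall, and accuracy, the sketch below tags tokens by greedy longest-match lookup in a term list and compares the tags with gold labels. It is not the authors' implementation: the function names, the toy sentence, the gold labels, and the two-entry vocabulary are all assumptions made up for this example, and the abstract does not say how matching or scoring was actually done.

```python
from typing import Dict, List, Set


def tag_with_vocabulary(tokens: List[str], vocabulary: Set[str], max_len: int = 3) -> List[int]:
    """Label each token 1 if it lies inside a vocabulary phrase, else 0.

    Greedy longest-match lookup; a hypothetical stand-in for a
    vocabulary-based method, not the procedure used in the paper.
    """
    labels = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in vocabulary:
                for j in range(i, i + n):
                    labels[j] = 1
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


def evaluate(gold: List[int], predicted: List[int]) -> Dict[str, float]:
    """Token-level precision, recall, and accuracy for binary labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Toy data (invented for illustration only).
tokens = "We train conditional random fields on support vector machine output".split()
gold = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0]  # hypothetical annotation
vocabulary = {"conditional random fields", "output"}  # hypothetical term list

predicted = tag_with_vocabulary(tokens, vocabulary)
scores = evaluate(gold, predicted)
print(predicted)
print(scores)
```

In this toy case the vocabulary misses "support vector machine" (lowering recall) and wrongly matches "output" (lowering precision), which is the kind of trade-off the abstract reports between the keyword-based and Wikipedia-based vocabularies.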

Suggested Citation

  • Yan, Erjia & Zhu, Yongjun, 2015. "Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods," Journal of Informetrics, Elsevier, vol. 9(3), pages 455-465.
  • Handle: RePEc:eee:infome:v:9:y:2015:i:3:p:455-465
    DOI: 10.1016/j.joi.2015.04.003

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157715000474
    Download Restriction: Full text for ScienceDirect subscribers only




    Citations


    Cited by:

    1. Ma, Jing & Abrams, Natalie F. & Porter, Alan L. & Zhu, Donghua & Farrell, Dorothy, 2019. "Identifying translational indicators and technology opportunities for nanomedical research using tech mining: The case of gold nanostructures," Technological Forecasting and Social Change, Elsevier, vol. 146(C), pages 767-775.
    2. Erjia Yan & Chaojiang Wu & Min Song, 2018. "The funding factor: a cross-disciplinary examination of the association between research funding and citation impact," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 369-384, April.
    3. Yongjun Zhu & Min Song & Erjia Yan, 2016. "Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-14, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yang, Siluo & Han, Ruizhen & Wolfram, Dietmar & Zhao, Yuehua, 2016. "Visualizing the intellectual structure of information science (2006–2015): Introducing author keyword coupling analysis," Journal of Informetrics, Elsevier, vol. 10(1), pages 132-150.
    2. Chaoqun Ni & Cassidy R. Sugimoto & Blaise Cronin, 2013. "Visualizing and comparing four facets of scholarly communication: producers, artifacts, concepts, and gatekeepers," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(3), pages 1161-1173, March.
    3. Yuen-Hsien Tseng & Ming-Yueh Tsay, 2013. "Journal clustering of library and information science for subfield delineation using the bibliometric analysis toolkit: CATAR," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(2), pages 503-528, May.
    4. Erjia Yan, 2014. "Topic-based Pagerank: toward a topic-level scientific evaluation," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(2), pages 407-437, August.
    5. María Pinto & Rosaura Fernández-Pascual & David Caballero-Mariscal & Dora Sales, 2020. "Information literacy trends in higher education (2006–2019): visualizing the emerging field of mobile information literacy," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1479-1510, August.
    6. Guan-Can Yang & Gang Li & Chun-Ya Li & Yun-Hua Zhao & Jing Zhang & Tong Liu & Dar-Zen Chen & Mu-Hsuan Huang, 2015. "Using the comprehensive patent citation network (CPC) to evaluate patent value," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1319-1346, December.
    7. An, Lu & Yu, Chuanming & Li, Gang, 2014. "Visual topical analysis of Chinese and American Library and Information Science research institutions," Journal of Informetrics, Elsevier, vol. 8(1), pages 217-233.
    8. Yi Bu & Binglu Wang & Win-bin Huang & Shangkun Che & Yong Huang, 2018. "Using the appearance of citations in full text on author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 275-289, July.
    9. Yu-Wei Chang & Mu-Hsuan Huang & Chiao-Wen Lin, 2015. "Evolution of research subjects in library and information science based on keyword, bibliographical coupling, and co-citation analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 2071-2087, December.
    10. Carlos Olmeda-Gómez & Maria-Antonia Ovalle-Perandones & Antonio Perianes-Rodríguez, 2017. "Co-word analysis and thematic landscapes in Spanish information science literature, 1985–2014," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(1), pages 195-217, October.
    11. Jimi Adams & Ryan Light, 2014. "Mapping Interdisciplinary Fields: Efficiencies, Gaps and Redundancies in HIV/AIDS Research," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-13, December.
    12. Jun-Ping Qiu & Ke Dong & Hou-Qiang Yu, 2014. "Comparative study on structure and correlation among author co-occurrence networks in bibliometrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1345-1360, November.
    13. Shesen Guo & Ganzhou Zhang, 2017. "Analyzing concept complexity, knowledge ageing and diffusion pattern of Mooc," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(1), pages 413-430, July.
    14. Sabrina Petersohn & Thomas Heinze, 2018. "Professionalization of bibliometric research assessment. Insights from the history of the Leiden Centre for Science and Technology Studies (CWTS)," Science and Public Policy, Oxford University Press, vol. 45(4), pages 565-578.
    15. Hao Wang & Sanhong Deng & Xinning Su, 2016. "A study on construction and analysis of discipline knowledge structure of Chinese LIS based on CSSCI," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1725-1759, December.
    16. Dorsa Alipour & Hussein Dia, 2023. "A Systematic Review of the Role of Land Use, Transport, and Energy-Environment Integration in Shaping Sustainable Cities," Sustainability, MDPI, vol. 15(8), pages 1-29, April.
    17. Lin, Yiling & Evans, James A. & Wu, Lingfei, 2022. "New directions in science emerge from disconnection and discord," Journal of Informetrics, Elsevier, vol. 16(1).
    18. Chakresh Kumar Singh & Demival Vasques Filho & Shivakumar Jolad & Dion R. J. O’Neale, 2020. "Evolution of interdependent co-authorship and citation networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 385-404, October.
    19. A. Abrizah & A. Noorhidawati & A. N. Zainab, 2015. "LIS journals categorization in the Journal Citation Report: a stated preference study," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(2), pages 1083-1099, February.
    20. van der Have, Robert P. & Rubalcaba, Luis, 2016. "Social innovation research: An emerging area of innovation studies?," Research Policy, Elsevier, vol. 45(9), pages 1923-1935.
