Printed from https://ideas.repec.org/a/spr/scient/v114y2018i3d10.1007_s11192-017-2611-8.html

A novel multiple layers name disambiguation framework for digital libraries using dynamic clustering

Authors

  • Jia Zhu (South China Normal University)
  • Xingcheng Wu (South China Normal University)
  • Xueqin Lin (South China Normal University)
  • Changqin Huang (South China Normal University)
  • Gabriel Pui Cheong Fung (The Chinese University of Hong Kong)
  • Yong Tang (South China Normal University)

Abstract

In many types of databases, such as scientific bibliography databases, the name attribute is the most commonly used identifier for recognizing entities. However, names are frequently ambiguous and not always unique, which causes problems in various fields. Name disambiguation is a data management task that aims to correctly distinguish different entities that share the same name. It is particularly important for large databases such as digital libraries, where the information available to identify an author's name is limited. In digital libraries, ambiguous author names arise because multiple authors share the same name or because a single author appears under different name variants. Most previous work on this problem used hierarchical clustering approaches based on information within citation records, e.g., co-authors and publication titles. In the present study, we propose a multiple layers name disambiguation framework that is not only applicable to digital libraries but can also be easily extended to other applications. Our framework adopts a dynamic clustering mechanism to minimize clustering errors. We evaluated our approach on real-world corpora, and the favorable experimental results indicate that the proposed framework is feasible.
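The abstract notes that most earlier work clusters citation records hierarchically, using fields such as co-authors and publication titles. A minimal sketch of that general baseline follows; it is not the authors' multiple layers dynamic clustering framework, whose details are not given in this record. The `CitationRecord` structure, the Jaccard-based similarity, the 0.7/0.3 field weights, the average-linkage rule, and the 0.2 stopping threshold are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CitationRecord:
    """Hypothetical citation record for one occurrence of an ambiguous name."""
    record_id: str
    coauthors: frozenset[str]
    title_words: frozenset[str]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def record_similarity(r1: CitationRecord, r2: CitationRecord,
                      w_coauthor: float = 0.7, w_title: float = 0.3) -> float:
    """Weighted blend of co-author and title-word overlap (weights are assumptions)."""
    return (w_coauthor * jaccard(r1.coauthors, r2.coauthors)
            + w_title * jaccard(r1.title_words, r2.title_words))


def cluster_similarity(c1: list[CitationRecord], c2: list[CitationRecord]) -> float:
    """Average linkage: mean pairwise record similarity between two clusters."""
    total = sum(record_similarity(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerative_disambiguation(records: list[CitationRecord],
                                 threshold: float = 0.2) -> list[list[CitationRecord]]:
    """Greedily merge the most similar pair of clusters until no pair exceeds
    the threshold. Each remaining cluster is read as one distinct author."""
    clusters = [[r] for r in records]  # start with one cluster per record
    while len(clusters) > 1:
        best_pair, best_sim = None, threshold
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cluster_similarity(clusters[i], clusters[j])
                if sim > best_sim:
                    best_pair, best_sim = (i, j), sim
        if best_pair is None:  # no pair is similar enough; stop merging
            break
        i, j = best_pair
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    # Hypothetical records that all carry the same ambiguous author name.
    records = [
        CitationRecord("r1", frozenset({"A. Lee", "B. Chen"}),
                       frozenset({"name", "disambiguation"})),
        CitationRecord("r2", frozenset({"A. Lee"}),
                       frozenset({"disambiguation", "clustering"})),
        CitationRecord("r3", frozenset({"C. Diaz"}),
                       frozenset({"protein", "folding"})),
    ]
    result = agglomerative_disambiguation(records)
    print([[r.record_id for r in c] for c in result])  # [['r1', 'r2'], ['r3']]
```

Here `r1` and `r2` share a co-author and a title word (similarity about 0.45), so they merge, while `r3` has no overlap and stays separate. In this baseline every merge is greedy and never revisited, so an early wrong merge carries through to the final clusters; that is one plausible kind of clustering error a dynamic mechanism could target, though this record does not describe how the authors' framework addresses it.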

Suggested Citation

  • Jia Zhu & Xingcheng Wu & Xueqin Lin & Changqin Huang & Gabriel Pui Cheong Fung & Yong Tang, 2018. "A novel multiple layers name disambiguation framework for digital libraries using dynamic clustering," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 781-794, March.
  • Handle: RePEc:spr:scient:v:114:y:2018:i:3:d:10.1007_s11192-017-2611-8
    DOI: 10.1007/s11192-017-2611-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-017-2611-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Yu Liu & Weijia Li & Zhen Huang & Qiang Fang, 2015. "A fast method based on multiple clustering for name disambiguation in bibliographic citations," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 66(3), pages 634-644, March.
    2. Dongwook Shin & Taehwan Kim & Joongmin Choi & Jungsun Kim, 2014. "Author name disambiguation using a graph model with node splitting and merging based on bibliographic information," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(1), pages 15-50, July.
    3. Jiang Wu & Xiu-Hao Ding, 2013. "Author name disambiguation in scientific collaboration and mobility cases," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(3), pages 683-697, September.
    4. Jia Zhu & Yi Yang & Qing Xie & Liwei Wang & Saeed-Ul Hassan, 2014. "Robust hybrid name disambiguation framework for large databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(3), pages 2255-2274, March.
    5. Diego R. Amancio & Osvaldo N. Oliveira jr & Luciano F. Costa, 2015. "Topological-collaborative approach for disambiguating authors’ names in collaborative networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(1), pages 465-485, January.
    6. Gabor J. Szekely & Maria L. Rizzo, 2005. "Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method," Journal of Classification, Springer;The Classification Society, vol. 22(2), pages 151-183, September.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Ali Tosyali & Behnam Tavakkol, 2024. "A node-based index for clustering validation of graph data," Annals of Operations Research, Springer, vol. 341(1), pages 197-221, October.
    2. Humaira Waqas & Abdul Qadir, 2022. "Completing features for author name disambiguation (AND): an empirical analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 1039-1063, February.
    3. Jinseok Kim, 2019. "A fast and integrative algorithm for clustering performance evaluation in author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 661-681, August.
    4. Humaira Waqas & Muhammad Abdul Qadir, 2021. "Multilayer heuristics based clustering framework (MHCF) for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7637-7678, September.
    5. Jinseok Kim & Jenna Kim & Jason Owen‐Smith, 2021. "Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(8), pages 979-994, August.
    6. Jinseok Kim & Jenna Kim, 2020. "Effect of forename string on author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(7), pages 839-855, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Andrea Ancona & Roy Cerqueti & Gianluca Vagnani, 2023. "A novel methodology to disambiguate organization names: an application to EU Framework Programmes data," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4447-4474, August.
    2. Jan Schulz, 2016. "Using Monte Carlo simulations to assess the impact of author name disambiguation quality on different bibliometric analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1283-1298, June.
    3. Jinseok Kim & Jenna Kim, 2020. "Effect of forename string on author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(7), pages 839-855, July.
    4. Jinseok Kim, 2019. "A fast and integrative algorithm for clustering performance evaluation in author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 661-681, August.
    5. KM. Pooja & Samrat Mondal & Joydeep Chandra, 2021. "Exploiting similarities across multiple dimensions for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7525-7560, September.
    6. Jinseok Kim & Jenna Kim & Jason Owen‐Smith, 2021. "Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(8), pages 979-994, August.
    7. Humaira Waqas & Muhammad Abdul Qadir, 2021. "Multilayer heuristics based clustering framework (MHCF) for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7637-7678, September.
    8. Anne-Wil Harzing, 2015. "Health warning: might contain multiple personalities—the problem of homonyms in Thomson Reuters Essential Science Indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 2259-2270, December.
    9. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    10. Jelena Smiljanić & Arnab Chatterjee & Tomi Kauppinen & Marija Mitrović Dankulov, 2016. "A Theoretical Model for the Associative Nature of Conference Participation," PLOS ONE, Public Library of Science, vol. 11(2), pages 1-12, February.
    11. Jinseok Kim & Jana Diesner, 2015. "The effect of data pre-processing on understanding the evolution of collaboration networks," Journal of Informetrics, Elsevier, vol. 9(1), pages 226-236.
    12. Zdeňka Náglová & Tereza Horáková, 2017. "Position of the Bakery Enterprises in the Czech Republic According to Detailed Specification of the Businesses," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 65(5), pages 1719-1727.
    13. Jinseok Kim & Jinmo Kim & Jason Owen-Smith, 2019. "Generating automatically labeled data for author name disambiguation: an iterative clustering method," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 253-280, January.
    14. Omar Hernando Avila-Poveda, 2014. "Technical report: the trend of author compound names and its implications for authorship identity identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 833-846, October.
    15. Jean-François Quessy, 2021. "A Szekely–Rizzo inequality for testing general copula homogeneity hypotheses," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    16. Fionn Murtagh & Pierre Legendre, 2014. "Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion?," Journal of Classification, Springer;The Classification Society, vol. 31(3), pages 274-295, October.
    17. Zdeněk Hlávka & Marie Hušková & Simos G. Meintanis, 2020. "Change-point methods for multivariate time-series: paired vectorial observations," Statistical Papers, Springer, vol. 61(4), pages 1351-1383, August.
    18. Vincent Brault & Sarah Ouadah & Laure Sansonnet & Céline Lévy-Leduc, 2018. "Nonparametric multiple change-point estimation for analyzing large Hi-C data matrices," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 143-165.
    19. Zhiqi Wang & Yue Chen & Wolfgang Glänzel, 2020. "Preprints as accelerator of scholarly communication: An empirical analysis in Mathematics," Journal of Informetrics, Elsevier, vol. 14(4).
    20. Athanasios Constantopoulos & John Yfantopoulos & Panos Xenos & Athanassios Vozikis, 2019. "Cluster shifts based on healthcare factors: The case of Greece in an OECD background 2009-2014," Advances in Management and Applied Economics, SCIENPRESS Ltd, vol. 9(6), pages 1-4.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:scient:v:114:y:2018:i:3:d:10.1007_s11192-017-2611-8. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.