Robust hybrid name disambiguation framework for large databases

Author

Listed:
  • Jia Zhu

    (South China Normal University)

  • Yi Yang

    (Carnegie Mellon University)

  • Qing Xie

    (King Abdullah University of Science and Technology)

  • Liwei Wang

    (Wuhan University)

  • Saeed-Ul Hassan

    (COMSATS Institute of Information Technology)

Abstract

In many databases, such as scientific bibliography databases, the name attribute is the identifier most commonly used to identify entities. However, names are often ambiguous and not always unique, which causes problems in many fields. Name disambiguation is a non-trivial data management task that aims to distinguish different entities that share the same name. It is particularly hard for large databases such as digital libraries, where only limited information is available to identify an author's name. In digital libraries, ambiguous author names arise when multiple authors share the same name or when one person appears under different name variations. Most previous work on this problem employs hierarchical clustering based on information inside the citation records, e.g. co-authors and publication titles. In this paper, we propose a robust hybrid name disambiguation framework that applies to digital libraries and can also be easily extended to other applications based on different data sources. We propose a web page genre identification component that identifies the genre of a web page, e.g. whether the page is a personal homepage. In addition, we propose a re-clustering model based on multidimensional scaling that can further improve name disambiguation performance. We evaluated our approach on known corpora, and the favorable experimental results indicate that the proposed framework is feasible.
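
As a rough illustration of the baseline the abstract attributes to previous work (hierarchical clustering over co-authors and publication titles), the following minimal Python sketch groups citation records that share one ambiguous author name. It is not the authors' framework: it omits the web page genre identification and the multidimensional-scaling re-clustering, and the record layout, the Jaccard similarity, the single-linkage rule, the `threshold` value and all function names are assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def record_features(record):
    """Merge co-author names and lowercased title words into one feature set."""
    return set(record["coauthors"]) | set(record["title"].lower().split())


def cluster_records(records, threshold=0.2):
    """Single-linkage agglomerative clustering (illustrative assumption).

    Repeatedly merges the two most similar clusters and stops once no
    pair of clusters reaches `threshold`. Returns lists of record indices.
    """
    features = [record_features(r) for r in records]
    clusters = [[i] for i in range(len(records))]

    def linkage(c1, c2):
        # Single linkage: similarity of the closest pair of members.
        return max(jaccard(features[i], features[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        (x, y), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # x < y always holds for pairs from combinations, so deleting y is safe.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical records that all carry the same ambiguous author name.
records = [
    {"title": "Clustering author names in digital libraries",
     "coauthors": ["A. Chen", "B. Kumar"]},
    {"title": "Author names and clustering for digital libraries",
     "coauthors": ["A. Chen"]},
    {"title": "Protein folding kinetics",
     "coauthors": ["C. Lopez"]},
    {"title": "Kinetics of protein folding pathways",
     "coauthors": ["C. Lopez", "D. Meyer"]},
]

print(cluster_records(records))  # → [[0, 1], [2, 3]]
```

Each resulting cluster is a guess at one real-world person. A lower `threshold` merges more aggressively and risks conflating distinct authors, while a higher one tends to split a single author across several clusters.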

Suggested Citation

  • Jia Zhu & Yi Yang & Qing Xie & Liwei Wang & Saeed-Ul Hassan, 2014. "Robust hybrid name disambiguation framework for large databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(3), pages 2255-2274, March.
  • Handle: RePEc:spr:scient:v:98:y:2014:i:3:d:10.1007_s11192-013-1151-0
    DOI: 10.1007/s11192-013-1151-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-013-1151-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Jiang Wu & Xiu-Hao Ding, 2013. "Author name disambiguation in scientific collaboration and mobility cases," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(3), pages 683-697, September.

    Citations

    Cited by:

    1. Jia Zhu & Xingcheng Wu & Xueqin Lin & Changqin Huang & Gabriel Pui Cheong Fung & Yong Tang, 2018. "A novel multiple layers name disambiguation framework for digital libraries using dynamic clustering," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 781-794, March.
    2. Alexander Karlsson & Björn Hammarfelt & H. Joe Steinhauer & Göran Falkman & Nasrine Olson & Gustaf Nelhans & Jan Nolin, 2015. "Modeling uncertainty in bibliometrics and information retrieval: an information fusion approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2255-2274, March.
    3. Anne-Wil Harzing, 2015. "Health warning: might contain multiple personalities—the problem of homonyms in Thomson Reuters Essential Science Indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 2259-2270, December.
    4. Jan Schulz, 2016. "Using Monte Carlo simulations to assess the impact of author name disambiguation quality on different bibliometric analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1283-1298, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jia Zhu & Xingcheng Wu & Xueqin Lin & Changqin Huang & Gabriel Pui Cheong Fung & Yong Tang, 2018. "A novel multiple layers name disambiguation framework for digital libraries using dynamic clustering," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 781-794, March.
    2. Jelena Smiljanić & Arnab Chatterjee & Tomi Kauppinen & Marija Mitrović Dankulov, 2016. "A Theoretical Model for the Associative Nature of Conference Participation," PLOS ONE, Public Library of Science, vol. 11(2), pages 1-12, February.
    3. Andrea Ancona & Roy Cerqueti & Gianluca Vagnani, 2023. "A novel methodology to disambiguate organization names: an application to EU Framework Programmes data," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4447-4474, August.
    4. Omar Hernando Avila-Poveda, 2014. "Technical report: the trend of author compound names and its implications for authorship identity identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 833-846, October.
    5. Wang, Zhiqi & Chen, Yue & Glänzel, Wolfgang, 2020. "Preprints as accelerator of scholarly communication: An empirical analysis in Mathematics," Journal of Informetrics, Elsevier, vol. 14(4).
    6. Jiang Wu & Miao Jin & Xiu-Hao Ding, 2015. "Diversity of individual research disciplines in scientific funding," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(2), pages 669-686, May.
    7. Jinseok Kim & Jenna Kim & Jason Owen‐Smith, 2021. "Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(8), pages 979-994, August.
    8. Jan Schulz, 2016. "Using Monte Carlo simulations to assess the impact of author name disambiguation quality on different bibliometric analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1283-1298, June.
    9. Hao Wu & Bo Li & Yijian Pei & Jun He, 2014. "Unsupervised author disambiguation using Dempster–Shafer theory," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(3), pages 1955-1972, December.
    10. Vittorio Fuccella & Domenico De Stefano & Maria Prosperina Vitale & Susanna Zaccarin, 2016. "Improving co-authorship network structures by combining multiple data sources: evidence from Italian academic statisticians," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(1), pages 167-184, April.
    11. Jinseok Kim & Jenna Kim, 2020. "Effect of forename string on author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(7), pages 839-855, July.
    12. Li Zhang & Wei Lu & Jinqing Yang, 2023. "LAGOS‐AND: A large gold standard dataset for scholarly author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(2), pages 168-185, February.
    13. Dongwook Shin & Taehwan Kim & Joongmin Choi & Jungsun Kim, 2014. "Author name disambiguation using a graph model with node splitting and merging based on bibliographic information," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(1), pages 15-50, July.
