Effect of forename string on author name disambiguation

Author

Listed:
  • Jinseok Kim
  • Jenna Kim

Abstract

In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role, the effect of forenames on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real‐world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forenames (homonyms). The results show that increasing the ratio of full forenames substantially improves both heuristic and machine‐learning‐based disambiguation. Performance gains from algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratio of full forenames increases, however, these gains become marginal compared to those from string matching. Using a small portion of forename strings does not much reduce the performance of either heuristic or algorithmic disambiguation compared to using full‐length strings. These findings provide practical suggestions, such as restoring initialized forenames to a full‐string format via record linkage to improve disambiguation performance.

Suggested Citation

  • Jinseok Kim & Jenna Kim, 2020. "Effect of forename string on author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(7), pages 839-855, July.
  • Handle: RePEc:bla:jinfst:v:71:y:2020:i:7:p:839-855
    DOI: 10.1002/asi.24298

    Download full text from publisher

    File URL: https://doi.org/10.1002/asi.24298
    Download Restriction: no

    Citations

    Cited by:

    1. Jinseok Kim & Jason Owen-Smith, 2021. "ORCID-linked labeled data for evaluating author name disambiguation at scale," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2057-2083, March.
    2. Mahsa Kaveh & Mahdieh Mirzabeigi & Hajar Sotudeh & Amirsaeid Moloodi, 2022. "The effects of the challenges in the transliteration of Persian names into English on the recall of retrieved results in the web of science," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 1099-1128, February.
    3. Antonina Dattolo & Marco Corbatto, 2022. "Assisting researchers in bibliographic tasks: A new usable, real‐time tool for analyzing bibliographies," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(6), pages 757-776, June.
    4. Helena Mihaljević & Lucía Santamaría, 2021. "Disambiguation of author entities in ADS using supervised learning and graph theory methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 3893-3917, May.
    5. Jinseok Kim & Jenna Kim & Jason Owen‐Smith, 2021. "Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(8), pages 979-994, August.
    6. Li Zhang & Wei Lu & Jinqing Yang, 2023. "LAGOS‐AND: A large gold standard dataset for scholarly author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(2), pages 168-185, February.
    7. Humaira Waqas & Abdul Qadir, 2022. "Completing features for author name disambiguation (AND): an empirical analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 1039-1063, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jinseok Kim, 2019. "A fast and integrative algorithm for clustering performance evaluation in author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 661-681, August.
    2. Jinseok Kim & Jenna Kim & Jason Owen‐Smith, 2021. "Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(8), pages 979-994, August.
    3. Jinseok Kim & Jinmo Kim & Jason Owen-Smith, 2019. "Generating automatically labeled data for author name disambiguation: an iterative clustering method," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 253-280, January.
    4. Jinseok Kim & Jason Owen-Smith, 2021. "ORCID-linked labeled data for evaluating author name disambiguation at scale," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2057-2083, March.
    5. Jinseok Kim & Jenna Kim, 2018. "The impact of imbalanced training data on machine learning for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 511-526, October.
    6. Li Zhang & Wei Lu & Jinqing Yang, 2023. "LAGOS‐AND: A large gold standard dataset for scholarly author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(2), pages 168-185, February.
    7. KM. Pooja & Samrat Mondal & Joydeep Chandra, 2021. "Exploiting similarities across multiple dimensions for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7525-7560, September.
    8. Humaira Waqas & Muhammad Abdul Qadir, 2021. "Multilayer heuristics based clustering framework (MHCF) for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7637-7678, September.
    9. Ciriaco Andrea D’Angelo & Nees Jan Eck, 2020. "Collecting large-scale publication data at the level of individual researchers: a practical proposal for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 883-907, May.
    10. Rehs, Andreas, 2021. "A supervised machine learning approach to author disambiguation in the Web of Science," Journal of Informetrics, Elsevier, vol. 15(3).
    11. Jan Schulz, 2016. "Using Monte Carlo simulations to assess the impact of author name disambiguation quality on different bibliometric analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1283-1298, June.
    12. Humaira Waqas & Abdul Qadir, 2022. "Completing features for author name disambiguation (AND): an empirical analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 1039-1063, February.
    13. Jinseok Kim, 2018. "Evaluating author name disambiguation for digital libraries: a case of DBLP," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1867-1886, September.
    14. Andrea Ancona & Roy Cerqueti & Gianluca Vagnani, 2023. "A novel methodology to disambiguate organization names: an application to EU Framework Programmes data," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4447-4474, August.
    15. Helena Mihaljević & Lucía Santamaría, 2021. "Disambiguation of author entities in ADS using supervised learning and graph theory methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 3893-3917, May.
    16. Janaína Gomide & Hugo Kling & Daniel Figueiredo, 2017. "Name usage pattern in the synonym ambiguity problem in bibliographic data," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(2), pages 747-766, August.
    17. Mark-Christoph Müller & Florian Reitz & Nicolas Roy, 2017. "Data sets for author name disambiguation: an empirical analysis and a new resource," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1467-1500, June.
    18. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    19. Jia Zhu & Xingcheng Wu & Xueqin Lin & Changqin Huang & Gabriel Pui Cheong Fung & Yong Tang, 2018. "A novel multiple layers name disambiguation framework for digital libraries using dynamic clustering," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 781-794, March.
    20. Song, Min & Kim, Erin Hea-Jin & Kim, Ha Jin, 2015. "Exploring author name disambiguation on PubMed-scale," Journal of Informetrics, Elsevier, vol. 9(4), pages 924-941.
