IDEAS home Printed from https://ideas.repec.org/a/spr/scient/v91y2012i2d10.1007_s11192-011-0589-1.html

Author disambiguation using multi-aspect similarity indicators

Author

Listed:
  • Thomas Gurney (Rathenau Institute)
  • Edwin Horlings (Rathenau Institute)
  • Peter van den Besselaar (VU University Amsterdam)

Abstract

Key to accurate bibliometric analyses is the ability to correctly link individuals to their corpus of work, with an optimal balance between precision and recall. We have developed an algorithm that does this disambiguation task with a very high recall and precision. The method addresses the issues of discarded records due to null data fields and their resultant effect on recall, precision and F-measure results. We have implemented a dynamic approach to similarity calculations based on all available data fields. We have also included differences in author contribution and age difference between publications, both of which have meaningful effects on overall similarity measurements, resulting in significantly higher recall and precision of returned records. The results are presented from a test dataset of heterogeneous catalysis publications. Results demonstrate significantly high average F-measure scores and substantial improvements on previous and stand-alone techniques.
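The abstract does not give the similarity formula, so the following is only a minimal sketch of the general idea it describes: compare two publication records on whichever fields both have filled in, rather than discarding records with null fields, and evaluate the result with precision, recall and F-measure. The field names, the equal weights, the use of Jaccard overlap and the helper names (`dynamic_similarity`, `f_measure`) are all assumptions made for illustration and are not taken from the paper. The author-contribution and publication-age adjustments mentioned above are not modelled here.

```python
from typing import Dict, Optional, Set

# A record maps a field name to a set of tokens, or None when the field is empty.
Record = Dict[str, Optional[Set[str]]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dynamic_similarity(r1: Record, r2: Record,
                       weights: Dict[str, float]) -> Optional[float]:
    """Weighted mean similarity over the fields present in BOTH records.

    Fields that are missing or empty in either record are skipped and the
    remaining weights are renormalised, so a null field does not force the
    pair to be discarded. Returns None if no field is comparable.
    (Hypothetical scheme, not the authors' published formula.)
    """
    total = 0.0
    weight_sum = 0.0
    for field, weight in weights.items():
        v1, v2 = r1.get(field), r2.get(field)
        if not v1 or not v2:
            continue  # null or empty field: ignore it instead of dropping the pair
        total += weight * jaccard(v1, v2)
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else None


def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative records with made-up values; "address" is null in the first one.
weights = {"coauthors": 1.0, "keywords": 1.0, "address": 1.0}
rec_a = {"coauthors": {"horlings e", "besselaar p"},
         "keywords": {"catalysis", "zeolite"},
         "address": None}
rec_b = {"coauthors": {"horlings e"},
         "keywords": {"catalysis", "zeolite"},
         "address": {"amsterdam"}}

score = dynamic_similarity(rec_a, rec_b, weights)
```

In this toy pair the address field is skipped, so the score is the mean of the co-author overlap (0.5) and the keyword overlap (1.0). A real implementation would still need a decision threshold or clustering step to turn pairwise scores into author clusters.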

Suggested Citation

  • Thomas Gurney & Edwin Horlings & Peter van den Besselaar, 2012. "Author disambiguation using multi-aspect similarity indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(2), pages 435-449, May.
  • Handle: RePEc:spr:scient:v:91:y:2012:i:2:d:10.1007_s11192-011-0589-1
    DOI: 10.1007/s11192-011-0589-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-011-0589-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-011-0589-1?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. T. J. Phelan, 1999. "A compendium of issues for citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 45(1), pages 117-136, May.
    2. Manuel Trajtenberg & Gil Shiff & Ran Melamed, 2009. "The "Names Game": Harnessing Inventors' Patent Data for Economic Research," Annals of Economics and Statistics, GENES, issue 93-94, pages 67-77.
    3. Raffo, Julio & Lhuillery, Stéphane, 2009. "How to play the "Names Game": Patent retrieval comparing different heuristics," Research Policy, Elsevier, vol. 38(10), pages 1617-1627, December.
    4. Li Tang & John P. Walsh, 2010. "Bibliometric fingerprints: name disambiguation based on approximate structure equivalence of cognitive maps," Scientometrics, Springer;Akadémiai Kiadó, vol. 84(3), pages 763-784, September.
    5. Martin S. Meyer, 2001. "Patent citation analysis in a novel field of technology: An exploration of nano-science and nano-technology," Scientometrics, Springer;Akadémiai Kiadó, vol. 51(1), pages 163-183, April.
    6. Natsuo Onodera & Mariko Iwasawa & Nobuyuki Midorikawa & Fuyuki Yoshikane & Kou Amano & Yutaka Ootani & Tadashi Kodama & Yasuhiko Kiyama & Hiroyuki Tsunoda & Shizuka Yamazaki, 2011. "A method for eliminating articles by homonymous authors from the large number of articles retrieved by author search," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(4), pages 677-690, April.
    7. Henk F. Moed, 2000. "Bibliometric Indicators Reflect Publication and Management Strategies," Scientometrics, Springer;Akadémiai Kiadó, vol. 47(2), pages 323-346, February.
    8. Healey, Peter & Rothman, Harry & Hoch, Paul K., 1986. "An experiment in science mapping for research planning," Research Policy, Elsevier, vol. 15(5), pages 233-251, October.
    9. Leydesdorff, Loet & Cozzens, Susan & Van den Besselaar, Peter, 1994. "Tracking areas of strategic importance using scientometric journal mappings," Research Policy, Elsevier, vol. 23(2), pages 217-229, March.
    10. Bruno Cassiman & Patrick Glenisson & Bart Looy, 2007. "Measuring industry-science links through inventor-author relations: A profiling methodology," Scientometrics, Springer;Akadémiai Kiadó, vol. 70(2), pages 379-391, February.
    11. Gerard Pasterkamp & Joris I. Rotmans & Dominique V. P. Kleijn & Cornelius Borst, 2007. "Citation frequency: A biased measure of research impact significantly influenced by the geographical origin of research articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 70(1), pages 153-165, January.
    12. Natsuo Onodera & Mariko Iwasawa & Nobuyuki Midorikawa & Fuyuki Yoshikane & Kou Amano & Yutaka Ootani & Tadashi Kodama & Yasuhiko Kiyama & Hiroyuki Tsunoda & Shizuka Yamazaki, 2011. "A method for eliminating articles by homonymous authors from the large number of articles retrieved by author search," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 62(4), pages 677-690, April.
    13. Dag W. Aksnes, 2003. "A macro study of self-citation," Scientometrics, Springer;Akadémiai Kiadó, vol. 56(2), pages 235-246, February.

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    Cited by:

    1. Fernanda Morillo & Ignacio Santabárbara & Javier Aparicio, 2013. "The automatic normalisation challenge: detailed addresses identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(3), pages 953-966, June.
    2. Rehs, Andreas, 2021. "A supervised machine learning approach to author disambiguation in the Web of Science," Journal of Informetrics, Elsevier, vol. 15(3).
    3. Maxim Kotsemir & Sergey Shashnov, 2017. "Measuring, analysis and visualization of research capacity of university at the level of departments and staff members," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1659-1689, September.
    4. Wu, Jiang, 2013. "Investigating the universal distributions of normalized indicators and developing field-independent index," Journal of Informetrics, Elsevier, vol. 7(1), pages 63-71.
    5. Hao Wu & Bo Li & Yijian Pei & Jun He, 2014. "Unsupervised author disambiguation using Dempster–Shafer theory," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(3), pages 1955-1972, December.
    6. Koski, Timo & Sandström, Erik & Sandström, Ulf, 2016. "Towards field-adjusted production: Estimating research productivity from a zero-truncated distribution," Journal of Informetrics, Elsevier, vol. 10(4), pages 1143-1152.
    7. Omar Hernando Avila-Poveda, 2014. "Technical report: the trend of author compound names and its implications for authorship identity identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 833-846, October.
    8. Gurney, Thomas & Horlings, Edwin & van den Besselaar, Peter & Sumikura, Koichi & Schoen, Antoine & Laurens, Patricia & Pardo, Daniel, 2014. "Analysing knowledge capture mechanisms: Methods and a stylised bioventure case," Journal of Informetrics, Elsevier, vol. 8(1), pages 259-272.
    9. Jiang Wu & Xiu-Hao Ding, 2013. "Author name disambiguation in scientific collaboration and mobility cases," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(3), pages 683-697, September.
    10. Matthijs den Besten & Catalina Martínez & Nicolas Besson & Stéphane Maraut & Jean-Michel Dalle, 2014. "Human computing via online labor markets. The perils and promises of crowdsourcing in data-rich ecosystems," Working Papers 1402, Instituto de Políticas y Bienes Públicos (IPP), CSIC.
    11. Trapido, Denis, 2015. "How novelty in knowledge earns recognition: The role of consistent identities," Research Policy, Elsevier, vol. 44(8), pages 1488-1500.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bar-Ilan, Judit, 2008. "Informetrics at the beginning of the 21st century—A review," Journal of Informetrics, Elsevier, vol. 2(1), pages 1-52.
    2. Ventura, Samuel L. & Nugent, Rebecca & Fuchs, Erica R.H., 2015. "Seeing the non-stars: (Some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records," Research Policy, Elsevier, vol. 44(9), pages 1672-1701.
    3. Michele Pezzoni & Francesco Lissoni & Gianluca Tarasconi, 2014. "How to kill inventors: testing the Massacrator© algorithm for inventor disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 477-504, October.
    4. Jian Wang & Kaspars Berzins & Diana Hicks & Julia Melkers & Fang Xiao & Diogo Pinheiro, 2012. "A boosted-trees method for name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 93(2), pages 391-411, November.
    5. Alexander N. Larcombe & Sasha C. Voss, 2011. "Self-citation: comparison between Radiology, European Radiology and Radiology for 1997–1998," Scientometrics, Springer;Akadémiai Kiadó, vol. 87(2), pages 347-356, May.
    6. Li, Guan-Cheng & Lai, Ronald & D’Amour, Alexander & Doolin, David M. & Sun, Ye & Torvik, Vetle I. & Yu, Amy Z. & Fleming, Lee, 2014. "Disambiguation and co-authorship networks of the U.S. patent inventor database (1975–2010)," Research Policy, Elsevier, vol. 43(6), pages 941-955.
    7. Li Tang & John P. Walsh, 2010. "Bibliometric fingerprints: name disambiguation based on approximate structure equivalence of cognitive maps," Scientometrics, Springer;Akadémiai Kiadó, vol. 84(3), pages 763-784, September.
    8. Rehs, Andreas, 2021. "A supervised machine learning approach to author disambiguation in the Web of Science," Journal of Informetrics, Elsevier, vol. 15(3).
    9. Jiang Wu & Xiu-Hao Ding, 2013. "Author name disambiguation in scientific collaboration and mobility cases," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(3), pages 683-697, September.
    10. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    11. Wen-Yau Cathy Lin & Mu-Hsuan Huang, 2012. "The relationship between co-authorship, currency of references and author self-citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(2), pages 343-360, February.
    12. Roberta Piergiovanni & Enrico Santarelli, 2013. "The more you spend, the more you get? The effects of R&D and capital expenditures on the patenting activities of biotechnology firms," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(2), pages 497-521, February.
    13. Paulo Vinícius Marcondes Cordeiro & Dario Eduardo Amaral Dergint & Kazuo Hatakeyama, 2014. "Proposal Of Method For An Automatic Complementarities Search Between Companies' R&D," International Journal of Innovation and Technology Management (IJITM), World Scientific Publishing Co. Pte. Ltd., vol. 11(02), pages 1-21.
    14. Zaggl, Michael A., 2017. "Manipulation of explicit reputation in innovation and knowledge exchange communities: The example of referencing in science," Research Policy, Elsevier, vol. 46(5), pages 970-983.
    15. Benjamin Balsmeier & Mohamad Assaf & Tyler Chesebro & Gabe Fierro & Kevin Johnson & Scott Johnson & Guan‐Cheng Li & Sonja Lück & Doug O'Reagan & Bill Yeh & Guangzheng Zang & Lee Fleming, 2018. "Machine learning and natural language processing on the patent corpus: Data, tools, and new measures," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 27(3), pages 535-553, September.
    16. Martin Ganco & Rosemarie H. Ziedonis & Rajshree Agarwal, 2015. "More stars stay, but the brightest ones still leave: Job hopping in the shadow of patent enforcement," Strategic Management Journal, Wiley Blackwell, vol. 36(5), pages 659-685, May.
    17. Onodera, Natsuo, 2016. "Properties of an index of citation durability of an article," Journal of Informetrics, Elsevier, vol. 10(4), pages 981-1004.
    18. Carayol, Nicolas & Bergé, Laurent & Cassi, Lorenzo & Roux, Pascale, 2019. "Unintended triadic closure in social networks: The strategic formation of research collaborations between French inventors," Journal of Economic Behavior & Organization, Elsevier, vol. 163(C), pages 218-238.
    19. Loet Leydesdorff & Ping Zhou, 2007. "Nanotechnology as a field of science: Its delineation in terms of journals and patents," Scientometrics, Springer;Akadémiai Kiadó, vol. 70(3), pages 693-713, March.
    20. Stéphane Maraut & Catalina Martínez, 2014. "Identifying author–inventors from Spain: methods and a first insight into results," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 445-476, October.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.