Printed from https://ideas.repec.org/a/plo/pone00/0264270.html

Avoiding bias when inferring race using name-based approaches

Author

Listed:
  • Diego Kozlowski
  • Dakota S Murray
  • Alexis Bell
  • Will Hulsey
  • Vincent Larivière
  • Thema Monroe-White
  • Cassidy R Sugimoto

Abstract

Racial disparity in academia is a widely acknowledged problem. The quantitative understanding of racial-based systemic inequalities is an important step towards a more equitable research system. However, because of the lack of robust information on authors’ race, few large-scale analyses have been performed on this topic. Algorithmic approaches offer one solution, using known information about authors, such as their names, to infer their perceived race. As with any other algorithm, the process of racial inference can generate biases if it is not carefully considered. The goal of this article is to assess the extent to which algorithmic bias is introduced using different approaches for name-based racial inference. We use information from the U.S. Census and mortgage applications to infer the race of U.S. affiliated authors in the Web of Science. We estimate the effects of using given and family names, thresholds or continuous distributions, and imputation. Our results demonstrate that the validity of name-based inference varies by race/ethnicity and that threshold approaches underestimate Black authors and overestimate White authors. We conclude with recommendations to avoid potential biases. This article lays the foundation for more systematic and less-biased investigations into racial disparities in science.
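
The contrast the abstract draws between threshold approaches and continuous distributions can be illustrated with a minimal sketch. The name-to-race probabilities below are invented purely for illustration: they are not U.S. Census or mortgage data and not results from the article. The function names and the 0.9 threshold are likewise assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical P(race | name) values, invented for illustration only.
NAME_DIST = {
    "name_a": {"White": 0.95, "Black": 0.03, "Other": 0.02},
    "name_b": {"White": 0.60, "Black": 0.35, "Other": 0.05},
    "name_c": {"White": 0.55, "Black": 0.40, "Other": 0.05},
    "name_d": {"White": 0.10, "Black": 0.10, "Other": 0.80},
}


def threshold_counts(names, threshold=0.9):
    """Assign an author to the most likely group only if that group's
    probability reaches the threshold; otherwise drop the author."""
    counts = defaultdict(float)
    for name in names:
        group, prob = max(NAME_DIST[name].items(), key=lambda kv: kv[1])
        if prob >= threshold:
            counts[group] += 1.0
    return dict(counts)


def fractional_counts(names):
    """Spread each author across all groups in proportion to the
    name's probability distribution (no author is dropped)."""
    counts = defaultdict(float)
    for name in names:
        for group, prob in NAME_DIST[name].items():
            counts[group] += prob
    return dict(counts)


def shares(counts):
    """Normalize group counts so that they sum to 1."""
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


authors = ["name_a", "name_b", "name_c", "name_d"]
thr = shares(threshold_counts(authors))
frac = shares(fractional_counts(authors))

print("threshold:", {g: round(s, 3) for g, s in thr.items()})
print("fractional:", {g: round(s, 3) for g, s in frac.items()})
```

In this toy setup only one name clears the threshold, so the threshold estimate contains no Black authors at all, while the fractional estimate retains the probability mass carried by the less distinctive names. This mirrors the direction of the bias the abstract reports, but the magnitudes here mean nothing, since the inputs are made up.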

Suggested Citation

  • Diego Kozlowski & Dakota S Murray & Alexis Bell & Will Hulsey & Vincent Larivière & Thema Monroe-White & Cassidy R Sugimoto, 2022. "Avoiding bias when inferring race using name-based approaches," PLOS ONE, Public Library of Science, vol. 17(3), pages 1-16, March.
  • Handle: RePEc:plo:pone00:0264270
    DOI: 10.1371/journal.pone.0264270

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0264270
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0264270&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0264270?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Vincent Larivière & Chaoqun Ni & Yves Gingras & Blaise Cronin & Cassidy R. Sugimoto, 2013. "Bibliometrics: Global gender disparities in science," Nature, Nature, vol. 504(7479), pages 211-213, December.
    2. Baum, Matthew A. & Dietrich, Bryce J. & Goldstein, Rebecca & Sen, Maya, 2019. "Estimating the Effect of Asking About Citizenship on the US Census: Results from a Randomized Controlled Trial," Working Paper Series rwp19-015, Harvard University, John F. Kennedy School of Government.
    3. Roland G. Fryer & Steven D. Levitt, 2004. "The Causes and Consequences of Distinctively Black Names," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 119(3), pages 767-805.
    4. Richard B. Freeman & Wei Huang, 2014. "Collaborating With People Like Me: Ethnic co-authorship within the US," NBER Working Papers 19905, National Bureau of Economic Research, Inc.
    5. Jinseok Kim & Jenna Kim & Jason Owen‐Smith, 2021. "Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(8), pages 979-994, August.
    6. Gerald Marschke & Allison Nunez & Bruce A. Weinberg & Huifeng Yu, 2018. "Last Place? The Intersection of Ethnicity, Gender, and Race in Biomedical Authorship," AEA Papers and Proceedings, American Economic Association, vol. 108, pages 222-227, May.
    7. Lisa Cook, 2014. "Violence and economic activity: evidence from African American patents, 1870–1940," Journal of Economic Growth, Springer, vol. 19(2), pages 221-257, June.
    8. John Brandt & Kathleen Buckingham & Cody Buntain & Will Anderson & Sabin Ray & John-Rob Pool & Natasha Ferrari, 2020. "Identifying social media user demographics and topic diversity with computational social science: a case study of a major international policy forum," Journal of Computational Social Science, Springer, vol. 3(1), pages 167-188, April.
    9. Allison L. Hopkins & James W. Jawitz & Christopher McCarty & Alex Goldman & Nandita B. Basu, 2013. "Disparities in publication patterns by gender, race and ethnicity based on a survey of a random sample of authors," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(2), pages 515-534, August.

    Citations



    Cited by:

    1. Nakajima, Kazuki & Liu, Ruodan & Shudo, Kazuyuki & Masuda, Naoki, 2023. "Quantifying gender imbalance in East Asian academia: Research career and citation practice," Journal of Informetrics, Elsevier, vol. 17(4).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhang, Ning & He, Guangye & Shi, Dongbo & Zhao, Zhenyue & Li, Jiang, 2022. "Does a gender-neutral name associate with the research impact of a scientist?," Journal of Informetrics, Elsevier, vol. 16(1).
    2. Mike Thelwall & Tamara Nevill, 2019. "No evidence of citation bias as a determinant of STEM gender disparities in US biochemistry, genetics and molecular biology research," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1793-1801, December.
    3. Vasarhelyi, Orsolya & Brooke, Siân, 2022. "Computing Gender," SocArXiv admcs, Center for Open Science.
    4. Luke Holman & Claire Morandin, 2019. "Researchers collaborate with same-gendered colleagues more often than expected across the life sciences," PLOS ONE, Public Library of Science, vol. 14(4), pages 1-19, April.
    5. Aleksandra Cislak & Magdalena Formanowicz & Tamar Saguy, 2018. "Bias against research on gender bias," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 189-200, April.
    6. González-Álvarez, Julio & Cervera-Crespo, Teresa, 2017. "Research production in high-impact journals of contemporary neuroscience: A gender analysis," Journal of Informetrics, Elsevier, vol. 11(1), pages 232-243.
    7. Johannes Buggle & Thierry Mayer & Seyhun Orcan Sakalli & Mathias Thoenig, 2023. "The Refugee’s Dilemma: Evidence from Jewish Migration out of Nazi Germany," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 138(2), pages 1273-1345.
    8. Chowdhury, Shyamal & Ooi, Evarn & Slonim, Robert, 2017. "Racial discrimination and white first name adoption: a field experiment in the Australian labour market," Working Papers 2017-15, University of Sydney, School of Economics.
    9. Mujcic, Redzo & Frijters, Paul, 2013. "Still Not Allowed on the Bus: It Matters If You're Black or White!," IZA Discussion Papers 7300, Institute of Labor Economics (IZA).
    10. Keith Head & Thierry Mayer, 2008. "Detection Of Local Interactions From The Spatial Pattern Of Names In France," Journal of Regional Science, Wiley Blackwell, vol. 48(1), pages 67-95, February.
    11. Lin Zhang & Yuanyuan Shang & Ying Huang & Gunnar Sivertsen, 2022. "Gender differences among active reviewers: an investigation based on publons," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(1), pages 145-179, January.
    12. Jan Hanousek & Štěpán Jurajda, 2018. "Názvy společností a jejich vliv na výkonnost firem [Corporate Names and Performance]," Politická ekonomie, Prague University of Economics and Business, vol. 2018(6), pages 671-688.
    13. Collado, M. Dolores & Ortuño Ortin, Ignacio & Romeu, Andrés, 2008. "Vertical Transmission of Consumption Behavior and the Distribution of Surnames," UMUFAE Economics Working Papers 2651, DIGITUM. Universidad de Murcia.
    14. Button, Patrick & Walker, Brigham, 2020. "Employment discrimination against Indigenous Peoples in the United States: Evidence from a field experiment," Labour Economics, Elsevier, vol. 65(C).
    15. Hoekman, Jarno & Rake, Bastian, 2024. "Geography of authorship: How geography shapes authorship attribution in big team science," Research Policy, Elsevier, vol. 53(2).
    16. Kai On Wong & Osmar R Zaïane & Faith G Davis & Yutaka Yasui, 2020. "A machine learning approach to predict ethnicity using personal name and census location in Canada," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-16, November.
    17. Lisa Cook, 2014. "Violence and economic activity: evidence from African American patents, 1870–1940," Journal of Economic Growth, Springer, vol. 19(2), pages 221-257, June.
    18. Mohammadi, Ali & Broström, Anders & Franzoni, Chiara, 2015. "Work Force Composition and Innovation: How Diversity in Employees’ Ethnical and Disciplinary Backgrounds Facilitates Knowledge Re-combination," Working Paper Series in Economics and Institutions of Innovation 413, Royal Institute of Technology, CESIS - Centre of Excellence for Science and Innovation Studies.
    19. Wu, Jiang & Ou, Guiyan & Liu, Xiaohui & Dong, Ke, 2022. "How does academic education background affect top researchers’ performance? Evidence from the field of artificial intelligence," Journal of Informetrics, Elsevier, vol. 16(2).
    20. Olivetti, Claudia & Paserman, M. Daniele & Salisbury, Laura, 2018. "Three-generation mobility in the United States, 1850–1940: The role of maternal and paternal grandparents," Explorations in Economic History, Elsevier, vol. 70(C), pages 73-90.
