
Implications of Data Anonymization on the Statistical Evidence of Disparity

Authors

  • Heng Xu (Kogod School of Business, American University, Washington, District of Columbia 20016)
  • Nan Zhang (Kogod School of Business, American University, Washington, District of Columbia 20016)

Abstract

Research and practical development of data-anonymization techniques have proliferated in recent years. Yet limited attention has been paid to the potentially disparate impact of privacy protection on underprivileged subpopulations. This study is one of the first attempts to examine the extent to which data anonymization could mask the gross statistical disparities between subpopulations in the data. We first describe two common mechanisms of data anonymization and two prevalent types of statistical evidence for disparity. Then, we develop a conceptual foundation and mathematical formalism demonstrating that the two data-anonymization mechanisms have distinctive impacts on the identifiability of disparity, which also varies with how disparity is operationalized statistically. After validating our findings with empirical evidence, we discuss the business and policy implications, highlighting the need for firms and policy makers to balance the protection of privacy against the recognition and rectification of disparate impact.

Suggested Citation

  • Heng Xu & Nan Zhang, 2022. "Implications of Data Anonymization on the Statistical Evidence of Disparity," Management Science, INFORMS, vol. 68(4), pages 2600-2618, April.
  • Handle: RePEc:inm:ormnsc:v:68:y:2022:i:4:p:2600-2618
    DOI: 10.1287/mnsc.2021.4028

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2021.4028
    Download Restriction: no


    References listed on IDEAS

    1. Krishnamurty Muralidhar & Dinesh Batra & Peeter J. Kirs, 1995. "Accessibility, Security, and Accuracy in Statistical Databases: The Case for the Multiplicative Fixed Data Perturbation Approach," Management Science, INFORMS, vol. 41(9), pages 1549-1564, September.
    2. Xiao-Bai Li & Sumit Sarkar, 2013. "Class-Restricted Clustering and Microperturbation for Data Privacy," Management Science, INFORMS, vol. 59(4), pages 796-812, April.
    3. John M. Abowd & Ian M. Schmutte, 2019. "An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices," American Economic Review, American Economic Association, vol. 109(1), pages 171-202, January.
    4. Luc Rocher & Julien M. Hendrickx & Yves-Alexandre de Montjoye, 2019. "Estimating the success of re-identifications in incomplete datasets using generative models," Nature Communications, Nature, vol. 10(1), pages 1-9, December.
    5. Krishnamurty Muralidhar & Rahul Parsa & Rathindra Sarathy, 1999. "A General Additive Data Perturbation Method for Database Security," Management Science, INFORMS, vol. 45(10), pages 1399-1415, October.
    6. Alexis R. Santos-Lozada & Danilo T. Perez-Rivera & Aarti C. Bhat, 2020. "How differential privacy will affect our understanding of population growth in the United States," SocArXiv pmux7, Center for Open Science.
    7. Leslie K. John & George Loewenstein & Alessandro Acquisti & Joachim Vosgerau, 2018. "When and why randomized response techniques (fail to) elicit the truth," Organizational Behavior and Human Decision Processes, Elsevier, vol. 148(C), pages 101-123.
    8. Jon Kleinberg & Sendhil Mullainathan, 2019. "Simplicity Creates Inequity: Implications for Fairness, Stereotypes, and Interpretability," NBER Working Papers 25854, National Bureau of Economic Research, Inc.
    9. Phyllis A. Siegel & Donald C. Hambrick, 2005. "Pay Disparities Within Top Management Groups: Evidence of Harmful Effects on Performance of High-Technology Firms," Organization Science, INFORMS, vol. 16(3), pages 259-274, June.
    10. Matthias Templ & Alexander Kowarik & Bernhard Meindl, 2015. "Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 67(i04).
    11. Alexis R. Santos-Lozada & Jeffrey T. Howard & Ashton M. Verdery, 2020. "How differential privacy will affect our understanding of health disparities in the United States," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 117(24), pages 13405-13412, June.
    12. Alessandro Acquisti & Christina Fong, 2020. "An Experiment in Hiring Discrimination via Online Social Networks," Management Science, INFORMS, vol. 66(3), pages 1005-1024, March.

    Citations



    Cited by:

    1. Nan Zhang & Heng Xu, 2024. "Fairness of Ratemaking for Catastrophe Insurance: Lessons from Machine Learning," Information Systems Research, INFORMS, vol. 35(2), pages 469-488, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    2. Ron S. Jarmin & John M. Abowd & Robert Ashmead & Ryan Cumings-Menon & Nathan Goldschlag & Michael B. Hawes & Sallie Ann Keller & Daniel Kifer & Philip Leclerc & Jerome P. Reiter & Rolando A. Rodrígue, 2023. "An in-depth examination of requirements for disclosure risk assessment," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 120(43), pages 2220558120-, October.
    3. Sigurd Dyrting & Abraham Flaxman & Ethan Sharygin, 2022. "Reconstruction of age distributions from differentially private census data," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 41(6), pages 2311-2329, December.
    4. Rathindra Sarathy & Krishnamurty Muralidhar & Rahul Parsa, 2002. "Perturbing Nonnormal Confidential Attributes: The Copula Approach," Management Science, INFORMS, vol. 48(12), pages 1613-1627, December.
    5. Dominik Rehse & Felix Tremöhlen, 2020. "Fostering participation in digital public health interventions: The case of digital contact tracing," ZEW Discussion Papers 20-076, ZEW - Leibniz Centre for European Economic Research.
    6. Yi Qian & Hui Xie, 2013. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," NBER Working Papers 19586, National Bureau of Economic Research, Inc.
    7. Amanda M.Y. Chu & Chun Yin Ip & Benson S.Y. Lam & Mike K.P. So, 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    8. Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.
    9. Rathindra Sarathy & Krishnamurty Muralidhar, 2002. "The Security of Confidential Numerical Data in Databases," Information Systems Research, INFORMS, vol. 13(4), pages 389-403, December.
    10. Manuel A. Nunez & Robert S. Garfinkel & Ram D. Gopal, 2007. "Stochastic Protection of Confidential Information in Databases: A Hybrid of Data Perturbation and Query Restriction," Operations Research, INFORMS, vol. 55(5), pages 890-908, October.
    11. J. Tom Mueller & Alexis R. Santos-Lozada, 2022. "The 2020 US Census Differential Privacy Method Introduces Disproportionate Discrepancies for Rural and Non-White Populations," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 41(4), pages 1417-1430, August.
    12. Seokho Lee & Marc G. Genton & Reinaldo B. Arellano-Valle, 2010. "Perturbation of Numerical Confidential Data via Skew-t Distributions," Management Science, INFORMS, vol. 56(2), pages 318-333, February.
    13. Steven Ruggles & David Riper, 2022. "The Role of Chance in the Census Bureau Database Reconstruction Experiment," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 41(3), pages 781-788, June.
    14. Francesco Capozza & Ingar Haaland & Christopher Roth & Johannes Wohlfart, 2021. "Studying Information Acquisition in the Field: A Practical Guide and Review," CEBI working paper series 21-15, University of Copenhagen. Department of Economics. The Center for Economic Behavior and Inequality (CEBI).
    15. John R. J. Thompson & Longlong Feng & R. Mark Reesor & Chuck Grace, 2021. "Know Your Clients’ Behaviours: A Cluster Analysis of Financial Transactions," JRFM, MDPI, vol. 14(2), pages 1-29, January.
    16. John M. Abowd & Ian M. Schmutte & William Sexton & Lars Vilhuber, 2019. "Suboptimal Provision of Privacy and Statistical Accuracy When They are Public Goods," Papers 1906.09353, arXiv.org.
    17. Sylvain Chassang & Christian Zehnder, 2019. "Secure Survey Design in Organizations: Theory and Experiments," Working Papers 2019-22, Princeton University. Economics Department.
    18. P. Daniel Wright & Matthew J. Liberatore & Robert L. Nydick, 2006. "A Survey of Operations Research Models and Applications in Homeland Security," Interfaces, INFORMS, vol. 36(6), pages 514-529, December.
    19. Liwen Wang & Jane Zheng Zhao & Kevin Zheng Zhou, 2018. "How do incentives motivate absorptive capacity development? The mediating role of employee learning and relational contingencies," Journal of Business Research, Elsevier, vol. 85(C), pages 226-237.
    20. Lilith Burgstaller & Lars P. Feld & Katharina Pfeil, 2022. "Working in the shadow: Survey techniques for measuring and explaining undeclared work," Journal of Economic Behavior & Organization, Elsevier, vol. 200(C), pages 661-671.
