Printed from https://ideas.repec.org/p/nbr/nberwo/19433.html

Privacy and Data-Based Research

Author

Listed:
  • Ori Heffetz
  • Katrina Ligett

Abstract

What can we, as users of microdata, formally guarantee to the individuals (or firms) in our dataset, regarding their privacy? We retell a few stories, well-known in data-privacy circles, of failed anonymization attempts in publicly released datasets. We then provide a mostly informal introduction to several ideas from the literature on differential privacy, an active literature in computer science that studies formal approaches to preserving the privacy of individuals in statistical databases. We apply some of its insights to situations routinely faced by applied economists, emphasizing big-data contexts.

Suggested Citation

  • Ori Heffetz & Katrina Ligett, 2013. "Privacy and Data-Based Research," NBER Working Papers 19433, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:19433
    Note: AG LS PE

    Download full text from publisher

    File URL: http://www.nber.org/papers/w19433.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Peter L. Rousseau, 2024. "Report of the Secretary," AEA Papers and Proceedings, American Economic Association, vol. 114, pages 701-705, May.
    2. Satkartar K. Kinney & Jerome P. Reiter & Arnold P. Reznek & Javier Miranda & Ron S. Jarmin & John M. Abowd, 2011. "Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database," International Statistical Review, International Statistical Institute, vol. 79(3), pages 362-384, December.
    3. Wasserman, Larry & Zhou, Shuheng, 2010. "A Statistical Framework for Differential Privacy," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 375-389.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Garret Christensen & Edward Miguel, 2018. "Transparency, Reproducibility, and the Credibility of Economics Research," Journal of Economic Literature, American Economic Association, vol. 56(3), pages 920-980, September.
    2. John M. Abowd & Ian M. Schmutte & William Sexton & Lars Vilhuber, 2019. "Suboptimal Provision of Privacy and Statistical Accuracy When They are Public Goods," Papers 1906.09353, arXiv.org.
    3. Katherine B. Coffman & Lucas C. Coffman & Keith M. Marzilli Ericson, 2017. "The Size of the LGBT Population and the Magnitude of Antigay Sentiment Are Substantially Underestimated," Management Science, INFORMS, vol. 63(10), pages 3168-3186, October.
    4. Khai Xiang Chiong & Matthew Shum, 2019. "Random Projection Estimation of Discrete-Choice Models with Large Choice Sets," Management Science, INFORMS, vol. 65(1), pages 256-271, January.
    5. Yosuke Uno & Akira Sonoda & Masaki Bessho, 2021. "The Economics of Privacy: A Primer Especially for Policymakers," Bank of Japan Working Paper Series 21-E-11, Bank of Japan.
    6. Evan S. Totty & Thor Watson, 2024. "Privacy Protection and Accuracy: What Do We Know? Do We Know Things?? Let's Find Out!," NBER Chapters, in: Data Privacy Protection and the Conduct of Applied Research: Methods, Approaches and their Consequences, National Bureau of Economic Research, Inc.
    7. Alessandro Acquisti & Curtis Taylor & Liad Wagman, 2016. "The Economics of Privacy," Journal of Economic Literature, American Economic Association, vol. 54(2), pages 442-492, June.
    8. Inbal Dekel & Rachel Cummings & Ori Heffetz & Katrina Ligett, 2024. "Privacy Elasticity: A (Hopefully) Useful New Concept," NBER Chapters, in: Data Privacy Protection and the Conduct of Applied Research: Methods, Approaches and their Consequences, National Bureau of Economic Research, Inc.
    9. Bharadwaj, Prashant & Pai, Mallesh M. & Suziedelyte, Agne, 2017. "Mental health stigma," Economics Letters, Elsevier, vol. 159(C), pages 57-60.
    10. Ran Eilat & Kfir Eliaz & Xiaosheng Mu, 2021. "Bayesian Privacy," Working Papers 2021-65, Princeton University, Economics Department.
    11. John M. Abowd & Ian M. Schmutte, 2017. "Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods," Working Papers 17-37, Center for Economic Studies, U.S. Census Bureau.
    12. Martin Browning & Thomas F. Crossley & Joachim Winter, 2014. "The Measurement of Household Consumption Expenditures," Annual Review of Economics, Annual Reviews, vol. 6(1), pages 475-501, August.
    13. Rachel Cummings & Federico Echenique & Adam Wierman, 2016. "The Empirical Implications of Privacy-Aware Choice," Operations Research, INFORMS, vol. 64(1), pages 67-78, February.
    14. Kobbi Nissim & Rann Smorodinsky & Moshe Tennenholtz, 2018. "Segmentation, Incentives, and Privacy," Mathematics of Operations Research, INFORMS, vol. 43(4), pages 1252-1268, November.
    15. McLean, Richard P. & Postlewaite, Andrew, 2017. "A dynamic non-direct implementation mechanism for interdependent value problems," Games and Economic Behavior, Elsevier, vol. 101(C), pages 34-48.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ron S. Jarmin & John M. Abowd & Robert Ashmead & Ryan Cumings-Menon & Nathan Goldschlag & Michael B. Hawes & Sallie Ann Keller & Daniel Kifer & Philip Leclerc & Jerome P. Reiter & Rolando A. Rodrígue, 2023. "An in-depth examination of requirements for disclosure risk assessment," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 120(43), pages 2220558120-, October.
    2. John M. Abowd & Ian M. Schmutte & William Sexton & Lars Vilhuber, 2019. "Suboptimal Provision of Privacy and Statistical Accuracy When They are Public Goods," Papers 1906.09353, arXiv.org.
    3. David R. Munro, 2021. "Consumer Behavior and Firm Volatility," Journal of Money, Credit and Banking, Blackwell Publishing, vol. 53(4), pages 845-873, June.
    4. Raj Chetty & John N. Friedman, 2019. "A Practical Method to Reduce Privacy Loss When Disclosing Statistics Based on Small Samples," AEA Papers and Proceedings, American Economic Association, vol. 109, pages 414-420, May.
    5. John M. Abowd & Robert Ashmead & Ryan Cumings-Menon & Simson Garfinkel & Micah Heineck & Christine Heiss & Robert Johns & Daniel Kifer & Philip Leclerc & Ashwin Machanavajjhala & Brett Moran & William, 2022. "The 2020 Census Disclosure Avoidance System TopDown Algorithm," Papers 2204.08986, arXiv.org.
    6. Martin Klein & Ricardo Moura & Bimal Sinha, 2021. "Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 273-287, May.
    7. Daniel H. Weinberg & John M. Abowd & Robert F. Belli & Noel Cressie & David C. Folch & Scott H. Holan & Margaret C. Levenstein & Kristen M. Olson & Jerome P. Reiter & Matthew D. Shapiro & Jolene Smyth, 2017. "Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Improve the U.S. Statistical System?," Working Papers 17-59r, Center for Economic Studies, U.S. Census Bureau.
    8. Joshua Snoke & Gillian M. Raab & Beata Nowok & Chris Dibben & Aleksandra Slavkovic, 2018. "General and specific utility measures for synthetic data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(3), pages 663-688, June.
    9. Gary Benedetto & Jordan C. Stanley & Evan Totty, 2018. "The Creation and Use of the SIPP Synthetic Beta v7.0," CES Technical Notes Series 18-03, Center for Economic Studies, U.S. Census Bureau.
    10. Daniell Toth, 2014. "Data Smearing: An Approach to Disclosure Limitation for Tabular Data," Journal of Official Statistics, Sciendo, vol. 30(4), pages 839-857, December.
    11. Soumya Mukherjee & Aratrika Mustafi & Aleksandra Slavković & Lars Vilhuber, 2023. "Assessing Utility of Differential Privacy for RCTs," Papers 2309.14581, arXiv.org.
    12. Hang J. Kim & Jörg Drechsler & Katherine J. Thompson, 2021. "Synthetic microdata for establishment surveys under informative sampling," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 255-281, January.
    13. Nowok, Beata & Raab, Gillian M. & Dibben, Chris, 2016. "synthpop: Bespoke Creation of Synthetic Data in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i11).
    14. Katherine B. Coffman & Lucas C. Coffman & Keith M. Marzilli Ericson, 2017. "The Size of the LGBT Population and the Magnitude of Antigay Sentiment Are Substantially Underestimated," Management Science, INFORMS, vol. 63(10), pages 3168-3186, October.
    15. Chongliang Luo & Md. Nazmul Islam & Natalie E. Sheils & John Buresh & Jenna Reps & Martijn J. Schuemie & Patrick B. Ryan & Mackenzie Edmondson & Rui Duan & Jiayi Tong & Arielle Marks-Anglin & Jiang Bi, 2022. "DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    16. Illenin Kondo & Logan T. Lewis & Andrea Stella, 2021. "Establishment Size Distributions in the Synthetic LBD," CES Technical Notes Series 21-06, Center for Economic Studies, U.S. Census Bureau.
    17. Javier Miranda & Lars Vilhuber, 2016. "Using Partially Synthetic Microdata to Protect Sensitive Cells in Business Statistics," Working Papers 16-10, Center for Economic Studies, U.S. Census Bureau.
    18. Chu, Amanda M.Y. & Ip, Chun Yin & Lam, Benson S.Y. & So, Mike K.P., 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    19. Lalanne, Clément & Gadat, Sébastien, 2024. "Privately Learning Smooth Distributions on the Hypercube by Projections," TSE Working Papers 24-1505, Toulouse School of Economics (TSE).
    20. Tatiana Komarova & Denis Nekipelov & Evgeny Yakovlev, 2018. "Identification, data combination, and the risk of disclosure," Quantitative Economics, Econometric Society, vol. 9(1), pages 395-440, March.

    More about this item

    JEL classification:

    • C49 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Other
    • C89 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other
    • D89 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Other
    • Z00 - Other Special Topics - - General - - - General
