Printed from https://ideas.repec.org/a/pal/palcom/v10y2023i1d10.1057_s41599-023-01694-y.html

Releasing survey microdata with exact cluster locations and additional privacy safeguards

Author

Listed:
  • Till Koebe

    (Saarland University)

  • Alejandra Arias-Salazar

    (University of Costa Rica)

  • Timo Schmid

    (Otto Friedrich University Bamberg)

Abstract

Household survey programs around the world publish fine-grained georeferenced microdata to support research on the interdependence of human livelihoods and their surrounding environment. To safeguard respondents’ privacy, micro-level survey data are usually (pseudo-)anonymized through deletion or perturbation procedures, such as obfuscating the true location of data collection. This, however, poses a challenge to emerging approaches that augment survey data with auxiliary information at the local level. Here, we propose an alternative microdata dissemination strategy that leverages the utility of the original microdata while adding privacy safeguards through synthetic data produced by generative models. We back our proposal with experiments using data from the 2011 Costa Rican census and satellite-derived auxiliary information. Our strategy reduces the respondents’ re-identification risk for any number of disclosed attributes by 60–80%, even under re-identification attempts.

Suggested Citation

  • Till Koebe & Alejandra Arias-Salazar & Timo Schmid, 2023. "Releasing survey microdata with exact cluster locations and additional privacy safeguards," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-13, December.
  • Handle: RePEc:pal:palcom:v:10:y:2023:i:1:d:10.1057_s41599-023-01694-y
    DOI: 10.1057/s41599-023-01694-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1057/s41599-023-01694-y
    File Function: Abstract
    Download Restriction: Access to full text is restricted to subscribers.
