Printed from https://ideas.repec.org/p/iab/iabdpa/200720.html

Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality

Author

Listed:
  • Reiter, Jerome P.
  • Drechsler, Jörg

Abstract

"To protect the confidentiality of survey respondents' identities and sensitive attributes, statistical agencies can release data in which confidential values are replaced with multiple imputations. These are called synthetic data. We propose a two-stage approach to generating synthetic data that enables agencies to release different numbers of imputations for different variables. Generation in two stages can reduce computational burdens, decrease disclosure risk, and increase inferential accuracy relative to generation in one stage. We present methods for obtaining inferences from such data. We describe the application of two-stage synthesis to creating a public use file for a German business database." (Author's abstract, IAB-Doku)

Suggested Citation

  • Reiter, Jerome P. & Drechsler, Jörg, 2007. "Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality," IAB-Discussion Paper 200720, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
  • Handle: RePEc:iab:iabdpa:200720

    Download full text from publisher

    File URL: https://doku.iab.de/discussionpapers/2007/dp2007.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Donald B. Rubin, 2003. "Nested multiple imputation of NMES via partially incompatible MCMC," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 57(1), pages 3-18, February.
    2. Drechsler, Jörg & Dundler, Agnes & Bender, Stefan & Rässler, Susanne & Zwick, Thomas, 2007. "A new approach for disclosure control in the IAB Establishment Panel : multiple imputation for a better data access," IAB-Discussion Paper 200711, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
    3. John M. Abowd & Julia I. Lane, 2004. "New Approaches to Confidentiality Protection Synthetic Data, Remote Access and Research Data Centers," Longitudinal Employer-Household Dynamics Technical Papers 2004-03, Center for Economic Studies, U.S. Census Bureau.
    4. C. J. Skinner & M. J. Elliot, 2002. "A measure of disclosure risk for microdata," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(4), pages 855-867, October.
    5. Karr, A.F. & Kohnen, C.N. & Oganian, A. & Reiter, J.P. & Sanil, A.P., 2006. "A Framework for Evaluating the Utility of Data Altered to Protect Confidentiality," The American Statistician, American Statistical Association, vol. 60, pages 224-232, August.

    Citations


    Cited by:

    1. Jörg Höhne, 2008. "Anonymisierungsverfahren für Paneldaten," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 2(3), pages 259-275, October.
    2. Humera Razzak & Christian Heumann, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    3. Razzak Humera & Heumann Christian, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Statistics Poland, vol. 20(4), pages 33-58, December.
    4. Jan Pablo Burgard & Jan-Philipp Kolb & Hariolf Merkle & Ralf Münnich, 2017. "Synthetic data for open and reproducible methodological research in social sciences and official statistics," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 11(3), pages 233-244, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jahangir Alam M. & Dostie Benoit & Drechsler Jörg & Vilhuber Lars, 2020. "Applying data synthesis for longitudinal business data across three countries," Statistics in Transition New Series, Polish Statistical Association, vol. 21(4), pages 212-236, August.
    2. Loong Bronwyn & Rubin Donald B., 2017. "Multiply-Imputed Synthetic Data: Advice to the Imputer," Journal of Official Statistics, Sciendo, vol. 33(4), pages 1005-1019, December.
    3. James Jackson & Robin Mitra & Brian Francis & Iain Dove, 2022. "Using saturated count models for user‐friendly synthesis of large confidential administrative databases," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1613-1643, October.
    4. Jörg Drechsler, 2015. "Multiple Imputation of Multilevel Missing Data—Rigor Versus Simplicity," Journal of Educational and Behavioral Statistics, vol. 40(1), pages 69-95, February.
    5. Drechsler, Jörg & Reiter, Jerome P., 2011. "An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3232-3243, December.
    6. Simon Grund & Oliver Lüdtke & Alexander Robitzsch, 2018. "Multiple Imputation of Missing Data at Level 2: A Comparison of Fully Conditional and Joint Modeling in Multilevel Designs," Journal of Educational and Behavioral Statistics, vol. 43(3), pages 316-353, June.
    7. Claire McKay Bowen & Fang Liu & Bingyue Su, 2021. "Differentially private data release via statistical election to partition sequentially," METRON, Springer;Sapienza Università di Roma, vol. 79(1), pages 1-31, April.
    8. Hammon, Angelina & Zinn, Sabine, 2020. "Multiple imputation of binary multilevel missing not at random data," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 69(3), pages 547-564.
    9. Templ Matthias, 2015. "Quality Indicators for Statistical Disclosure Methods: A Case Study on the Structure of Earnings Survey," Journal of Official Statistics, Sciendo, vol. 31(4), pages 737-761, December.
    10. Li‐Chun Zhang & Gustav Haraldsen, 2022. "Secure big data collection and processing: Framework, means and opportunities," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1541-1559, October.
    11. Natalie Shlomo & Chris Skinner, 2022. "Measuring risk of re‐identification in microdata: State‐of‐the art and new directions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1644-1662, October.
    12. Amanda M. Y. Chu & Benson S. Y. Lam & Agnes Tiwari & Mike K. P. So, 2019. "An Empirical Study of Applying Statistical Disclosure Control Methods to Public Health Research," IJERPH, MDPI, vol. 16(22), pages 1-17, November.
    13. Maurice Brandt & Dirk Oberschachtsiek & Ramona Pohl, 2008. "Neue Datenangebote in den Forschungsdatenzentren – Betriebs- und Unternehmensdaten im Längsschnitt –," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 2(3), pages 193-207, October.
    14. Frauke Kreuter, 2013. "Facing the Nonresponse Challenge," The ANNALS of the American Academy of Political and Social Science, vol. 645(1), pages 23-35, January.
    15. Geoffrey M. Jacquez & Aleksander Essex & Andrew Curtis & Betsy Kohler & Recinda Sherman & Khaled El Emam & Chen Shi & Andy Kaufmann & Linda Beale & Thomas Cusick & Daniel Goldberg & Pierre Goovaerts, 2017. "Geospatial cryptography: enabling researchers to access private, spatially referenced, human subjects data for cancer control and prevention," Journal of Geographical Systems, Springer, vol. 19(3), pages 197-220, July.
    16. Gerd Ronning, 2014. "Vertraulichkeit und Verfügbarkeit von Mikrodaten," IAW Discussion Papers 101, Institut für Angewandte Wirtschaftsforschung (IAW).
    17. Tatiana Komarova & Denis Nekipelov & Evgeny Yakovlev, 2018. "Identification, data combination, and the risk of disclosure," Quantitative Economics, Econometric Society, vol. 9(1), pages 395-440, March.
    18. Burns, Christopher & Prager, Daniel & Ghosh, Sujit & Goodwin, Barry, 2015. "Imputing for Missing Data in the ARMS Household Section: A Multivariate Imputation Approach," 2015 AAEA & WAEA Joint Annual Meeting, July 26-28, San Francisco, California 205291, Agricultural and Applied Economics Association.
    19. repec:iab:iabfme:200702(de is not listed on IDEAS
    20. Ton Waal & Jacco Daalmans, 2024. "Calibrated imputation for multivariate categorical data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 108(3), pages 545-576, September.
    21. Khaled Khatab & Maruf A Raheem & Benn Sartorius & Mubarak Ismail, 2019. "Prevalence and risk factors for child labour and violence against children in Egypt using Bayesian geospatial modelling with multiple imputation," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-20, May.
