Disclosure-Protected Inference with Linked Microdata Using a Remote Analysis Server

Author

  • Chipperfield James O. (Senior Research Fellow, National Institute for Applied Statistics Research Australia, University of Wollongong, and Assistant Director, Methodology Division, Australian Bureau of Statistics, Canberra, ACT, 2617, Australia)

Abstract

Large amounts of microdata are collected by data custodians in the form of censuses and administrative records. Often, data custodians will collect different information on the same individual. Many important questions can be answered by linking microdata collected by different data custodians. For this reason, there is very strong demand from analysts, within government, business, and universities, for linked microdata. However, many data custodians are legally obliged to ensure the risk of disclosing information about a person or organisation is acceptably low. Different authors have considered the problem of how to facilitate reliable statistical inference from analysis of linked microdata while ensuring that the risk of disclosure is acceptably low. This article considers the problem from the perspective of an Integrating Authority that, by definition, is trusted to link the microdata and to facilitate analysts’ access to the linked microdata via a remote server, which allows analysts to fit models and view the statistical output without being able to observe the underlying linked microdata. One disclosure risk that must be managed by an Integrating Authority is that one data custodian may use the microdata it supplied to the Integrating Authority and statistical output released from the remote server to disclose information about a person or organisation that was supplied by the other data custodian. This article considers analysis of only binary variables. The utility and disclosure risk of the proposed method are investigated both in a simulation and using a real example. This article shows that some popular protections against disclosure (dropping records, rounding regression coefficients or imposing restrictions on model selection) can be ineffective in the above setting.
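The disclosure risk described in the abstract can be illustrated with a generic differencing attack. The sketch below is an assumption-laden toy, not the method or the attack analysed in the article: it assumes a hypothetical server function, `server_count`, that releases exact, unprotected counts of a binary variable `y` (supplied by the other custodian) for any subset of records the attacking custodian can define from its own data. The names `server_count` and `differencing_attack` are invented for illustration.

```python
import random


def server_count(selected, y):
    """Hypothetical remote-server query (an assumption, not a real API).

    Returns the number of selected records with y == 1. The analyst sees
    only this count, never the individual y values.
    """
    return sum(y_i for keep, y_i in zip(selected, y) if keep)


def differencing_attack(target, y):
    """Infer the target's y value from two exact counts.

    The attacking custodian is assumed to know enough about its own
    records to request one count over all records and one over all
    records except the target.
    """
    n = len(y)
    everyone = [True] * n
    all_but_target = [i != target for i in range(n)]
    # The two counts differ only by the target's contribution.
    return server_count(everyone, y) - server_count(all_but_target, y)


# Simulated binary variable held by the other custodian.
random.seed(0)
y = [random.randint(0, 1) for _ in range(200)]

# With exact counts, every record's value can be recovered this way.
recovered = [differencing_attack(i, y) for i in range(len(y))]
```

In this toy setting the difference of the two counts equals the target's value exactly, which is why a remote server cannot simply release unprotected output for arbitrary subsets chosen by a party that supplied part of the linked data.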

Suggested Citation

  • Chipperfield James O., 2014. "Disclosure-Protected Inference with Linked Microdata Using a Remote Analysis Server," Journal of Official Statistics, Sciendo, vol. 30(1), pages 123-146, March.
  • Handle: RePEc:vrs:offsta:v:30:y:2014:i:1:p:123-146:n:7
    DOI: 10.2478/jos-2014-0007