Printed from https://ideas.repec.org/a/bla/istatr/v81y2013i3p426-455.html

A Summary of Attack Methods and Confidentiality Protection Measures for Fully Automated Remote Analysis Systems

Authors

  • Christine M. O'Keefe
  • James O. Chipperfield

Abstract

This paper presents a summary of the current state of research on reducing the risk of disclosure related to what may be called “non‐traditional” outputs for statistical agencies. Whereas traditional outputs include frequency tables, magnitude tables and public use microdata files, non‐traditional outputs include outputs associated with user‐defined exploratory data analysis and statistical modelling offered through a remote analysis system. In remote analysis, a system accepts a query from an analyst, runs it on data held in a secure environment, and then returns the results to the analyst. There is considerable current interest in fully automated remote analysis systems, because these have the potential to enable agencies to respond to growing researcher demand for more and more detailed data. In practice, a range of protective measures is most effective in remote analysis, and the choice of this range depends heavily on the context, including the regulatory environment, the dataset itself, and the purpose of the access.

This paper provides a summary of known attack methods on remote analysis system outputs, focussing on exploratory data analysis and linear regression. The paper also summarizes the associated suggested protective measures designed to prevent disclosures and thwart attacks in fully automated remote analysis systems. Some commentary on the attacks and measures is provided.

Suggested Citation

  • Christine M. O'Keefe & James O. Chipperfield, 2013. "A Summary of Attack Methods and Confidentiality Protection Measures for Fully Automated Remote Analysis Systems," International Statistical Review, International Statistical Institute, vol. 81(3), pages 426-455, December.
  • Handle: RePEc:bla:istatr:v:81:y:2013:i:3:p:426-455
    DOI: 10.1111/insr.12021

    Download full text from publisher

    File URL: https://doi.org/10.1111/insr.12021
    Download Restriction: no



    Citations

    Cited by:

    1. Chipperfield James O., 2014. "Disclosure-Protected Inference with Linked Microdata Using a Remote Analysis Server," Journal of Official Statistics, Sciendo, vol. 30(1), pages 123-146, March.
    2. Chipperfield James & Newman John & Thompson Gwenda & Ma Yue & Lin Yan-Xia, 2019. "Prospects for Protecting Business Microdata when Releasing Population Totals via a Remote Server," Journal of Official Statistics, Sciendo, vol. 35(2), pages 319-336, June.
    3. Felix Ritchie & Jim Smith, 2019. "Confidentiality and linked data," Papers 1907.06465, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. James Jackson & Robin Mitra & Brian Francis & Iain Dove, 2022. "Using saturated count models for user‐friendly synthesis of large confidential administrative databases," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1613-1643, October.
    2. Shlomo, Natalie & Skinner, Chris, 2022. "Measuring risk of re-identification in microdata: state-of-the art and new directions," LSE Research Online Documents on Economics 117168, London School of Economics and Political Science, LSE Library.
    3. Li‐Chun Zhang & Gustav Haraldsen, 2022. "Secure big data collection and processing: Framework, means and opportunities," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1541-1559, October.
    4. Favaro, Stefano & Panero, Francesca & Rigon, Tommaso, 2021. "Bayesian nonparametric disclosure risk assessment," LSE Research Online Documents on Economics 117305, London School of Economics and Political Science, LSE Library.
    5. Chipperfield James O., 2014. "Disclosure-Protected Inference with Linked Microdata Using a Remote Analysis Server," Journal of Official Statistics, Sciendo, vol. 30(1), pages 123-146, March.
    6. Drechsler, Jörg & Reiter, Jerome P., 2011. "An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3232-3243, December.
    7. Eurosystem Household Finance and Consumption Network, 2013. "The Eurosystem Household Finance and Consumption Survey - Methodological report," Statistics Paper Series 1, European Central Bank.
    8. Natalie Shlomo & Chris Skinner, 2022. "Measuring risk of re‐identification in microdata: State‐of‐the art and new directions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1644-1662, October.
    9. Sergio I. Prada & Claudia González-Martínez & Joshua Borton & Johannes Fernandes-Huessy & Craig Holden & Elizabeth Hair & Tim Mulcahy, 2011. "Avoiding Disclosure of Individually Identifiable Health Information," SAGE Open, vol. 1(3), pages 21582440114, October.
    10. Skinner, Chris J., 2007. "The probability of identification: applying ideas from forensic statistics to disclosure risk assessment," LSE Research Online Documents on Economics 39105, London School of Economics and Political Science, LSE Library.
    11. Krenzke Tom & Gentleman Jane F. & Li Jianzhu & Moriarity Chris, 2013. "Addressing Disclosure Concerns and Analysis Demands in a Real-Time Online Analytic System," Journal of Official Statistics, Sciendo, vol. 29(1), pages 99-124, March.
    12. Iwona Bąk & Katarzyna Cheba, 2022. "Green Transformation: Applying Statistical Data Analysis to a Systematic Literature Review," Energies, MDPI, vol. 16(1), pages 1-22, December.
    13. C. J. Skinner, 2007. "The probability of identification: applying ideas from forensic statistics to disclosure risk assessment," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(1), pages 195-212, January.
    14. Prada, Sergio I & Gonzalez, Claudia & Borton, Joshua & Fernandes-Huessy, Johannes & Holden, Craig & Hair, Elizabeth & Mulcahy, Tim, 2011. "Avoiding disclosure of individually identifiable health information: a literature review," MPRA Paper 35463, University Library of Munich, Germany.
    15. Cinzia Carota & Maurizio Filippone & Silvia Polettini, 2022. "Assessing Bayesian Semi‐Parametric Log‐Linear Models: An Application to Disclosure Risk Estimation," International Statistical Review, International Statistical Institute, vol. 90(1), pages 165-183, April.
    16. Tapan K. Nayak & Samson A. Adeshiyan, 2016. "On Invariant Post-randomization for Statistical Disclosure Control," International Statistical Review, International Statistical Institute, vol. 84(1), pages 26-42, April.
    17. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    18. Shlomo, Natalie & Skinner, Chris J., 2010. "Assessing the protection provided by misclassification-based disclosure limitation methods for survey microdata," LSE Research Online Documents on Economics 39119, London School of Economics and Political Science, LSE Library.
    19. Skinner, Chris J. & Shlomo, Natalie, 2008. "Assessing identification risk in survey microdata using log-linear models," LSE Research Online Documents on Economics 39112, London School of Economics and Political Science, LSE Library.
    20. Sumit Dutta Chowdhury & George T. Duncan & Ramayya Krishnan & Stephen F. Roehrig & Sumitra Mukherjee, 1999. "Disclosure Detection in Multivariate Categorical Databases: Auditing Confidentiality Protection Through Two New Matrix Operators," Management Science, INFORMS, vol. 45(12), pages 1710-1723, December.
