IDEAS home Printed from https://ideas.repec.org/p/ehl/lserod/39119.html

Assessing the protection provided by misclassification-based disclosure limitation methods for survey microdata

Author

Listed:
  • Shlomo, Natalie
  • Skinner, Chris J.

Abstract

Government statistical agencies often apply statistical disclosure limitation techniques to survey microdata to protect the confidentiality of respondents. There is a need for valid and practical ways to assess the protection provided. This paper develops some simple methods for disclosure limitation techniques which perturb the values of categorical identifying variables. The methods are applied in numerical experiments based upon census data from the United Kingdom which are subject to two perturbation techniques: data swapping (random and targeted) and the post randomization method. Some simplifying approximations to the measure of risk are found to work well in capturing the impacts of these techniques. These approximations provide simple extensions of existing risk assessment methods based upon Poisson log-linear models. A numerical experiment is also undertaken to assess the impact of multivariate misclassification with an increasing number of identifying variables. It is found that the misclassification dominates the usual monotone increasing relationship between this number and risk so that the risk eventually declines, implying less sensitivity of risk to choice of identifying variables. The methods developed in this paper may also be used to obtain more realistic assessments of risk which take account of the kinds of measurement and other nonsampling errors commonly arising in surveys.
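
To make the perturbation setting concrete, the following is a minimal Python sketch of the post randomization method as it is commonly described: each recorded category is replaced by a category drawn from a row-stochastic transition (misclassification) matrix. It is an illustration only, not the authors' implementation or their risk measure. The function name `pram`, the example categories, and the transition probabilities are all hypothetical.

```python
import random


def pram(values, categories, transition, seed=None):
    """Perturb categorical values with a row-stochastic transition matrix.

    transition[k][l] is the (assumed) probability that a record whose true
    category is categories[k] is released with category categories[l].
    """
    # Every row must be a probability distribution over the categories.
    for row in transition:
        if len(row) != len(categories) or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must have one entry per category and sum to 1")

    rng = random.Random(seed)
    index = {category: k for k, category in enumerate(categories)}

    # Draw the released category for each record independently.
    return [
        rng.choices(categories, weights=transition[index[value]], k=1)[0]
        for value in values
    ]


# Hypothetical identifying variable and misclassification probabilities.
categories = ["single", "married", "widowed"]
transition = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]

original = ["single", "married", "married", "widowed"] * 250
released = pram(original, categories, transition, seed=1)

# Share of records whose released category differs from the true one.
changed = sum(o != r for o, r in zip(original, released)) / len(original)
```

With these assumed probabilities, roughly one record in ten is expected to change category. The diagonal entries of the matrix are the probabilities that a value is released unchanged, which is the quantity an intruder's chance of a correct match depends on.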

Suggested Citation

  • Shlomo, Natalie & Skinner, Chris J., 2010. "Assessing the protection provided by misclassification-based disclosure limitation methods for survey microdata," LSE Research Online Documents on Economics 39119, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:39119

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/39119/
    File Function: Open access version.
    Download Restriction: no

    References listed on IDEAS

    1. Paass, Gerhard, 1988. "Disclosure Risk and Disclosure Avoidance for Microdata," Journal of Business & Economic Statistics, American Statistical Association, vol. 6(4), pages 487-500, October.
    2. Duncan, George & Lambert, Diane, 1989. "The Risk of Disclosure for Microdata," Journal of Business & Economic Statistics, American Statistical Association, vol. 7(2), pages 207-217, April.

    Citations

    Cited by:

    1. Tapan K. Nayak & Samson A. Adeshiyan, 2016. "On Invariant Post-randomization for Statistical Disclosure Control," International Statistical Review, International Statistical Institute, vol. 84(1), pages 26-42, April.
    2. Goldstein Harvey & Shlomo Natalie, 2020. "A Probabilistic Procedure for Anonymisation, for Assessing the Risk of Re-identification and for the Analysis of Perturbed Data Sets," Journal of Official Statistics, Sciendo, vol. 36(1), pages 89-115, March.
    3. Bernard Baffour & James Raymer, 2019. "Estimating multiregional survivorship probabilities for sparse data: An application to immigrant populations in Australia, 1981–2011," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 40(18), pages 463-502.
    4. Shlomo, Natalie & Skinner, Chris, 2022. "Measuring risk of re-identification in microdata: state-of-the art and new directions," LSE Research Online Documents on Economics 117168, London School of Economics and Political Science, LSE Library.
    5. Krenzke Tom & Gentleman Jane F. & Li Jianzhu & Moriarity Chris, 2013. "Addressing Disclosure Concerns and Analysis Demands in a Real-Time Online Analytic System," Journal of Official Statistics, Sciendo, vol. 29(1), pages 99-124, March.
    6. Natalie Shlomo & Chris Skinner, 2022. "Measuring risk of re‐identification in microdata: State‐of‐the art and new directions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1644-1662, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Natalie Shlomo & Chris Skinner, 2022. "Measuring risk of re‐identification in microdata: State‐of‐the art and new directions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1644-1662, October.
    2. Braathen, Christian & Thorsen, Inge & Ubøe, Jan, 2022. "Adjusting for Cell Suppression in Commuting Trip Data," Discussion Papers 2022/13, Norwegian School of Economics, Department of Business and Management Science.
    3. Kokolakis, G. & Fouskakis, D., 2009. "Importance partitioning in micro-aggregation," Computational Statistics & Data Analysis, Elsevier, vol. 53(7), pages 2439-2445, May.
    4. Skinner, Chris J., 2007. "The probability of identification: applying ideas from forensic statistics to disclosure risk assessment," LSE Research Online Documents on Economics 39105, London School of Economics and Political Science, LSE Library.
    5. C. J. Skinner, 2007. "The probability of identification: applying ideas from forensic statistics to disclosure risk assessment," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(1), pages 195-212, January.
    6. Shlomo, Natalie & Skinner, Chris, 2022. "Measuring risk of re-identification in microdata: state-of-the art and new directions," LSE Research Online Documents on Economics 117168, London School of Economics and Political Science, LSE Library.
    7. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    8. Skinner, Chris J. & Shlomo, Natalie, 2008. "Assessing identification risk in survey microdata using log-linear models," LSE Research Online Documents on Economics 39112, London School of Economics and Political Science, LSE Library.
    9. Sumit Dutta Chowdhury & George T. Duncan & Ramayya Krishnan & Stephen F. Roehrig & Sumitra Mukherjee, 1999. "Disclosure Detection in Multivariate Categorical Databases: Auditing Confidentiality Protection Through Two New Matrix Operators," Management Science, INFORMS, vol. 45(12), pages 1710-1723, December.
    10. Gilboa-Freedman, Gail & Smorodinsky, Rann, 2020. "On the properties that characterize privacy," Mathematical Social Sciences, Elsevier, vol. 103(C), pages 59-68.
    11. James Jackson & Robin Mitra & Brian Francis & Iain Dove, 2022. "Using saturated count models for user‐friendly synthesis of large confidential administrative databases," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1613-1643, October.
    12. Shaobo Li & Matthew J. Schneider & Yan Yu & Sachin Gupta, 2023. "Reidentification Risk in Panel Data: Protecting for k -Anonymity," Information Systems Research, INFORMS, vol. 34(3), pages 1066-1088, September.
    13. George Kokolakis & Dimitris Fouskakis, 2008. "On the Discrepancy Measures for the Optimal Equal Probability Partitioning in Bayesian Multivariate Micro-Aggregation," Journal of Classification, Springer;The Classification Society, vol. 25(2), pages 209-224, November.
    14. Christine M. O'Keefe & James O. Chipperfield, 2013. "A Summary of Attack Methods and Confidentiality Protection Measures for Fully Automated Remote Analysis Systems," International Statistical Review, International Statistical Institute, vol. 81(3), pages 426-455, December.
    15. Xiao-Bai Li & Sumit Sarkar, 2013. "Class-Restricted Clustering and Microperturbation for Data Privacy," Management Science, INFORMS, vol. 59(4), pages 796-812, April.
    16. Xiao-Bai Li & Sumit Sarkar, 2006. "Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data," Information Systems Research, INFORMS, vol. 17(3), pages 254-270, September.
    17. Nigel Melville & Michael McQuaid, 2012. "Research Note ---Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation," Information Systems Research, INFORMS, vol. 23(2), pages 559-574, June.
    18. Duncan Smith, 2020. "Re‐identification in the Absence of Common Variables for Matching," International Statistical Review, International Statistical Institute, vol. 88(2), pages 354-379, August.
    19. S F Roehrig & R Padman & R Krishnan & G T Duncan, 2011. "Exact and heuristic methods for cell suppression in multi-dimensional linked tables," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(2), pages 291-304, February.
    20. Loong Bronwyn & Rubin Donald B., 2017. "Multiply-Imputed Synthetic Data: Advice to the Imputer," Journal of Official Statistics, Sciendo, vol. 33(4), pages 1005-1019, December.

    More about this item

    Keywords

    disclosure risk; identification risk; log linear model; measurement error; post randomization method; data swapping;

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - Econometric and Statistical Methods and Methodology: General
