
Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro

Author

Listed:
  • Templ, Matthias
  • Kowarik, Alexander
  • Meindl, Bernhard

Abstract

The demand for data from surveys, censuses or registers containing sensitive information on people or enterprises has increased significantly in recent years. However, before data can be provided to the public or to researchers, confidentiality has to be respected for any data set possibly containing sensitive information about individual units. Confidentiality can be achieved by applying statistical disclosure control (SDC) methods to the data in order to decrease their disclosure risk. The R package sdcMicro serves as an easy-to-handle, object-oriented S4 class implementation of SDC methods to evaluate and anonymize confidential micro-data sets. It includes all popular disclosure risk and perturbation methods. The package automatically recalculates frequency counts, individual and global risk measures, information loss and data utility statistics after each anonymization step. All methods are highly optimized in terms of computational cost so that they can work with large data sets. Reporting facilities that summarize the anonymization process can also be easily used by practitioners. We describe the package and demonstrate its functionality with a complex household survey test data set that has been distributed by the International Household Survey Network.

Suggested Citation

  • Templ, Matthias & Kowarik, Alexander & Meindl, Bernhard, 2015. "Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 67(i04).
  • Handle: RePEc:jss:jstsof:v:067:i04
    DOI: http://hdl.handle.net/10.18637/jss.v067.i04

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v067i04/v67i04.pdf
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v067i04/sdcMicro_4.6.0.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v067i04/v67i04.R
    Download Restriction: no


    References listed on IDEAS

    1. Krishnamurty Muralidhar & Rathindra Sarathy, 2006. "Data Shuffling--A New Masking Approach for Numerical Data," Management Science, INFORMS, vol. 52(5), pages 658-670, May.
    2. Krishnamurty Muralidhar & Rahul Parsa & Rathindra Sarathy, 1999. "A General Additive Data Perturbation Method for Database Security," Management Science, INFORMS, vol. 45(10), pages 1399-1415, October.
    3. Alfons, Andreas & Templ, Matthias, 2013. "Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 54(i15).
    4. Meyer, David & Hornik, Kurt, 2009. "Generalized and Customizable Sets in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 31(i02).
    5. Andreas Alfons & Stefan Kraft & Matthias Templ & Peter Filzmoser, 2011. "Simulation of close-to-reality population data for household surveys with application to EU-SILC," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 20(3), pages 383-407, August.

    Citations

    Cited by:

    1. Heng Xu & Nan Zhang, 2022. "Implications of Data Anonymization on the Statistical Evidence of Disparity," Management Science, INFORMS, vol. 68(4), pages 2600-2618, April.
    2. Wieringa, Jaap & Kannan, P.K. & Ma, Xiao & Reutterer, Thomas & Risselada, Hans & Skiera, Bernd, 2021. "Data analytics in a privacy-concerned world," Journal of Business Research, Elsevier, vol. 122(C), pages 915-925.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Amanda M. Y. Chu & Benson S. Y. Lam & Agnes Tiwari & Mike K. P. So, 2019. "An Empirical Study of Applying Statistical Disclosure Control Methods to Public Health Research," IJERPH, MDPI, vol. 16(22), pages 1-17, November.
    2. Templ, Matthias & Meindl, Bernhard & Kowarik, Alexander & Dupriez, Olivier, 2017. "Simulation of Synthetic Complex Data: The R Package simPop," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 79(i10).
    3. Trottini, Mario & Muralidhar, Krish & Sarathy, Rathindra, 2011. "Maintaining tail dependence in data shuffling using t copula," Statistics & Probability Letters, Elsevier, vol. 81(3), pages 420-428, March.
    4. Templ Matthias, 2015. "Quality Indicators for Statistical Disclosure Methods: A Case Study on the Structure of Earnings Survey," Journal of Official Statistics, Sciendo, vol. 31(4), pages 737-761, December.
    5. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    6. M. Templ & K. Hron & P. Filzmoser, 2017. "Exploratory tools for outlier detection in compositional data with structural zeros," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(4), pages 734-752, March.
    7. Chu, Amanda M.Y. & Ip, Chun Yin & Lam, Benson S.Y. & So, Mike K.P., 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    8. Seokho Lee & Marc G. Genton & Reinaldo B. Arellano-Valle, 2010. "Perturbation of Numerical Confidential Data via Skew-t Distributions," Management Science, INFORMS, vol. 56(2), pages 318-333, February.
    9. Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.
    10. Castro, Jordi, 2012. "Recent advances in optimization techniques for statistical tabular data protection," European Journal of Operational Research, Elsevier, vol. 216(2), pages 257-269.
    11. Patrick Krennmair & Timo Schmid, 2022. "Flexible domain prediction using mixed effects random forests," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1865-1894, November.
    12. P. Daniel Wright & Matthew J. Liberatore & Robert L. Nydick, 2006. "A Survey of Operations Research Models and Applications in Homeland Security," Interfaces, INFORMS, vol. 36(6), pages 514-529, December.
    13. Riza, Lala Septem & Bergmeir, Christoph & Herrera, Francisco & Benítez, José M., 2015. "frbs: Fuzzy Rule-Based Systems for Classification and Regression in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 65(i06).
    14. Juris Breidaks, 2015. "Variance Estimation Using Package vardpoor in R," Romanian Statistical Review, Romanian Statistical Review, vol. 63(2), pages 24-38, June.
    15. Anna Murawska & Bartosz Mickiewicz & Małgorzata Zajdel & Małgorzata Michalcewicz-Kaniowska, 2020. "Multidimensional Analysis of the Relationship between Sustainable Living Conditions and Long and Good Health in the European Union Countries," European Research Studies Journal, European Research Studies Journal, vol. 0(3), pages 716-735.
    16. repec:jss:jstsof:37:i02 is not listed on IDEAS
    17. Ermanno Catullo & Antonio Palestrini & Ruggero Grilli & Mauro Gallegati, 2018. "Early warning indicators and macro-prudential policies: a credit network agent based model," Journal of Economic Interaction and Coordination, Springer;Society for Economic Science with Heterogeneous Interacting Agents, vol. 13(1), pages 81-115, April.
    18. Rathindra Sarathy & Krishnamurty Muralidhar & Rahul Parsa, 2002. "Perturbing Nonnormal Confidential Attributes: The Copula Approach," Management Science, INFORMS, vol. 48(12), pages 1613-1627, December.
    19. Rathindra Sarathy & Krishnamurty Muralidhar, 2002. "The Security of Confidential Numerical Data in Databases," Information Systems Research, INFORMS, vol. 13(4), pages 389-403, December.
    20. Matthew J. Schneider & Dawn Iacobucci, 2020. "Protecting survey data on a consumer level," Journal of Marketing Analytics, Palgrave Macmillan, vol. 8(1), pages 3-17, March.
    21. Marchetti Stefano & Tzavidis Nikos, 2021. "Robust Estimation of the Theil Index and the Gini Coeffient for Small Areas," Journal of Official Statistics, Sciendo, vol. 37(4), pages 955-979, December.
