Printed from https://ideas.repec.org/a/gam/jijerp/v16y2019i22p4519-d287403.html

An Empirical Study of Applying Statistical Disclosure Control Methods to Public Health Research

Author

Listed:
  • Amanda M. Y. Chu

    (Department of Social Sciences, The Education University of Hong Kong, Tai Po, Hong Kong, China)

  • Benson S. Y. Lam

    (Department of Mathematics and Statistics, The Hang Seng University of Hong Kong, Shatin, Hong Kong, China)

  • Agnes Tiwari

    (School of Nursing, The University of Hong Kong, Pokfulam Road, Hong Kong, China;
    School of Nursing, Hong Kong Sanatorium & Hospital, Hong Kong, China)

  • Mike K. P. So

    (Department of Information Systems, Business Statistics and Operations Management, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China)

Abstract

Patient data and information collected from public health and health care surveys are of great research value, but the data usually contain sensitive personal information. Doctors, nurses, and researchers in the public health and health care sector may not analyze the available datasets or survey data themselves and may instead outsource the task to third parties. Even when all direct identifiers, such as names and ID card numbers, are removed, an individual can sometimes still be re-identified from the demographic or other specific information in the datasets. Such data privacy issues can become an obstacle to health-related research. Statistical disclosure control (SDC) addresses this problem by masking the original data and designing the released data from it, so that the released data still meet researchers' needs for data analysis while the original data remain well protected from disclosure. In this research, we discuss the statistical properties of two SDC methods: the General Additive Data Perturbation (GADP) method and the Gaussian Copula General Additive Data Perturbation (CGADP) method. An empirical study demonstrates how these two SDC methods can be applied in public health research.

Suggested Citation

  • Amanda M. Y. Chu & Benson S. Y. Lam & Agnes Tiwari & Mike K. P. So, 2019. "An Empirical Study of Applying Statistical Disclosure Control Methods to Public Health Research," IJERPH, MDPI, vol. 16(22), pages 1-17, November.
  • Handle: RePEc:gam:jijerp:v:16:y:2019:i:22:p:4519-:d:287403

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/16/22/4519/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/16/22/4519/
    Download Restriction: no

    References listed on IDEAS

    1. Rathindra Sarathy & Krishnamurty Muralidhar & Rahul Parsa, 2002. "Perturbing Nonnormal Confidential Attributes: The Copula Approach," Management Science, INFORMS, vol. 48(12), pages 1613-1627, December.
    2. M. Templ & P. Filzmoser, 2014. "Simulation and quality of a synthetic close-to-reality employer--employee population," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(5), pages 1053-1072, May.
    3. Krishnamurty Muralidhar & Rathindra Sarathy, 2006. "Data Shuffling--A New Masking Approach for Numerical Data," Management Science, INFORMS, vol. 52(5), pages 658-670, May.
    4. Krishnamurty Muralidhar & Rahul Parsa & Rathindra Sarathy, 1999. "A General Additive Data Perturbation Method for Database Security," Management Science, INFORMS, vol. 45(10), pages 1399-1415, October.
    5. John M. Abowd & Julia I. Lane, 2004. "New Approaches to Confidentiality Protection Synthetic Data, Remote Access and Research Data Centers," Longitudinal Employer-Household Dynamics Technical Papers 2004-03, Center for Economic Studies, U.S. Census Bureau.
    6. Andreas Alfons & Stefan Kraft & Matthias Templ & Peter Filzmoser, 2011. "Simulation of close-to-reality population data for household surveys with application to EU-SILC," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 20(3), pages 383-407, August.

    Citations


    Cited by:

    1. Chu, Amanda M.Y. & Ip, Chun Yin & Lam, Benson S.Y. & So, Mike K.P., 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Templ, Matthias & Kowarik, Alexander & Meindl, Bernhard, 2015. "Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 67(i04).
    2. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    3. Chu, Amanda M.Y. & Ip, Chun Yin & Lam, Benson S.Y. & So, Mike K.P., 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    4. Seokho Lee & Marc G. Genton & Reinaldo B. Arellano-Valle, 2010. "Perturbation of Numerical Confidential Data via Skew-t Distributions," Management Science, INFORMS, vol. 56(2), pages 318-333, February.
    5. Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.
    6. Trottini, Mario & Muralidhar, Krish & Sarathy, Rathindra, 2011. "Maintaining tail dependence in data shuffling using t copula," Statistics & Probability Letters, Elsevier, vol. 81(3), pages 420-428, March.
    7. Yi Qian & Hui Xie, 2013. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," NBER Working Papers 19586, National Bureau of Economic Research, Inc.
    8. Nowok, Beata & Raab, Gillian M. & Dibben, Chris, 2016. "synthpop: Bespoke Creation of Synthetic Data in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i11).
    9. Castro, Jordi, 2012. "Recent advances in optimization techniques for statistical tabular data protection," European Journal of Operational Research, Elsevier, vol. 216(2), pages 257-269.
    10. P. Daniel Wright & Matthew J. Liberatore & Robert L. Nydick, 2006. "A Survey of Operations Research Models and Applications in Homeland Security," Interfaces, INFORMS, vol. 36(6), pages 514-529, December.
    11. Templ, Matthias & Meindl, Bernhard & Kowarik, Alexander & Dupriez, Olivier, 2017. "Simulation of Synthetic Complex Data: The R Package simPop," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 79(i10).
    12. Roszka Wojciech, 2019. "Spatial Microsimulation Of Personal Income In Poland At The Level Of Subregions," Statistics in Transition New Series, Polish Statistical Association, vol. 20(3), pages 133-153, September.
    13. Templ Matthias, 2015. "Quality Indicators for Statistical Disclosure Methods: A Case Study on the Structure of Earnings Survey," Journal of Official Statistics, Sciendo, vol. 31(4), pages 737-761, December.
    14. Woodcock, Simon D. & Benedetto, Gary, 2009. "Distribution-preserving statistical disclosure limitation," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4228-4242, October.
    15. Frauke Kreuter, 2013. "Facing the Nonresponse Challenge," The ANNALS of the American Academy of Political and Social Science, , vol. 645(1), pages 23-35, January.
    16. Geoffrey M. Jacquez & Aleksander Essex & Andrew Curtis & Betsy Kohler & Recinda Sherman & Khaled El Emam & Chen Shi & Andy Kaufmann & Linda Beale & Thomas Cusick & Daniel Goldberg & Pierre Goovaerts, 2017. "Geospatial cryptography: enabling researchers to access private, spatially referenced, human subjects data for cancer control and prevention," Journal of Geographical Systems, Springer, vol. 19(3), pages 197-220, July.
    17. Rathindra Sarathy & Krishnamurty Muralidhar & Rahul Parsa, 2002. "Perturbing Nonnormal Confidential Attributes: The Copula Approach," Management Science, INFORMS, vol. 48(12), pages 1613-1627, December.
    18. Rathindra Sarathy & Krishnamurty Muralidhar, 2002. "The Security of Confidential Numerical Data in Databases," Information Systems Research, INFORMS, vol. 13(4), pages 389-403, December.
    19. Matthew J. Schneider & Dawn Iacobucci, 2020. "Protecting survey data on a consumer level," Journal of Marketing Analytics, Palgrave Macmillan, vol. 8(1), pages 3-17, March.
    20. Syam Menon & Sumit Sarkar & Shibnath Mukherjee, 2005. "Maximizing Accuracy of Shared Databases when Concealing Sensitive Patterns," Information Systems Research, INFORMS, vol. 16(3), pages 256-270, September.

