IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i16p2487-d1454731.html

Child Health Dataset Publishing and Mining Based on Differential Privacy Preservation

Authors listed:
  • Wenyu Li

    (School of Mathematics and Statistics, Beihua University, Jilin 132013, China)

  • Siqi Wang

    (School of Mathematics and Statistics, Beihua University, Jilin 132013, China)

  • Hongwei Wang

    (Department of Mathematics and Physics, Texas A&M International University, Laredo, TX 78045, USA)

  • Yunlong Lu

    (School of Mathematics and Statistics, Beihua University, Jilin 132013, China)

Abstract

With the growing demand for data analysis and publishing, differential privacy protection is particularly important for providing more reliable, secure, and compliant datasets for research on children’s health. This paper studies differential privacy protection of an ultrasound examination health dataset of adolescents in southern Texas from three aspects: differential privacy protection of basic statistics by output perturbation, publication of differentially private marginal histograms and synthetic data, and a differentially private machine learning algorithm. First, the output perturbation results show that the Laplace and Gaussian mechanisms for numerical data, as well as the exponential mechanism for non-numerical data, achieve the goal of protecting privacy, with the exponential mechanism providing a higher level of privacy protection. Second, with an appropriate privacy budget, a differentially private marginal histogram over four attributes can be obtained that approximates the marginal histogram of the original data. To publish synthetic data, we construct a synthetic query to obtain the corresponding differentially private histogram for two attributes. A synthetic dataset can then be constructed that follows the data distribution of the original dataset, and the quality of the published synthetic data can be evaluated by the mean squared error and the error rate. Finally, we consider a differentially private logistic regression model that predicts whether children have fatty liver, a binary classification task. The experimental results show that the model combined with quadratic perturbation achieves better accuracy and privacy protection. 
This paper provides differential privacy protection models for different requirements, offering data managers and research organizations important options for data release and analysis while enriching research on the release and mining of child health data.
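The output-perturbation step summarized in the abstract rests on three standard mechanisms. The Python sketch below shows their textbook forms; it is not the authors' code, and the function names, parameters, and the use of the classical Gaussian calibration (valid only for 0 < ε < 1) are assumptions chosen for illustration.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """epsilon-DP release of a numeric query with the given L1 sensitivity."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def gaussian_mechanism(true_value, l2_sensitivity, epsilon, delta, rng=random):
    """(epsilon, delta)-DP release using the classical calibration,
    which is only valid for 0 < epsilon < 1."""
    if not 0 < epsilon < 1:
        raise ValueError("classical Gaussian calibration requires 0 < epsilon < 1")
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / epsilon
    return true_value + rng.gauss(0.0, sigma)


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng=random):
    """epsilon-DP selection of a non-numeric output: candidate r is chosen with
    probability proportional to exp(epsilon * utility(r) / (2 * sensitivity))."""
    scores = [epsilon * utility(r) / (2.0 * sensitivity) for r in candidates]
    top = max(scores)
    # Subtracting the maximum score avoids overflow without changing the probabilities.
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

For example, a count query such as the number of children with fatty liver has sensitivity 1, so `laplace_mechanism(count, 1, 0.5)` would release it under ε = 0.5. For a categorical attribute, passing each category's count as the utility (sensitivity 1) to `exponential_mechanism` releases an approximate mode instead of a noisy number.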
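The histogram-and-synthesis pipeline can be sketched as follows. This is an illustration under stated assumptions, not the paper's construction: it assumes add/remove-one-record neighbouring datasets (so every cell count has sensitivity 1 and adding Laplace(1/ε) noise to each cell makes the whole histogram ε-DP), clips negative noisy counts to zero, and samples synthetic records in proportion to the noisy counts. Because clipping and sampling are post-processing, the synthetic records keep the same ε guarantee. The helper names are hypothetical, and only the mean squared error is computed.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(records, domains, epsilon, rng):
    """Noisy counts for every cell of the joint domain of the chosen attributes.
    records: tuples of attribute values; domains: one tuple of values per attribute."""
    true_counts = Counter(records)
    noisy = {}
    for cell in product(*domains):  # includes empty cells, so zero counts are not revealed
        noisy[cell] = true_counts[cell] + laplace_noise(1.0 / epsilon, rng)
    return true_counts, noisy


def synthesize(noisy, n, rng):
    """Sample n synthetic records from the clipped noisy histogram (n assumed public)."""
    cells = list(noisy)
    weights = [max(noisy[c], 0.0) for c in cells]  # post-processing: drop negative counts
    return rng.choices(cells, weights=weights, k=n)


def mse(true_counts, noisy):
    """Mean squared error between the original and the noisy cell counts."""
    return sum((true_counts[c] - v) ** 2 for c, v in noisy.items()) / len(noisy)
```

The same `dp_histogram` call works for a four-attribute marginal by passing four domains; under replace-one-record neighbouring the sensitivity doubles to 2, so the noise scale would become 2/ε.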
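For the classification step, one well-known way to make logistic regression differentially private is the objective-perturbation method of Chaudhuri, Monteleoni and Sarwate (2011), which adds a random linear term, and when needed an extra quadratic regulariser, to the training objective. The abstract does not say whether its "quadratic perturbation" is this exact scheme, so treat the sketch below as an illustration under assumptions: labels are coded as −1/+1, each feature vector is clipped to norm at most 1, and plain gradient descent stands in for the exact minimiser that the privacy proof assumes. The function names and defaults are hypothetical.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _clip_to_unit_ball(x):
    # The privacy analysis assumes ||x|| <= 1; clipping each record on its own keeps that true.
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm <= 1.0 else [v / norm for v in x]


def objective_perturbation_logreg(X, y, epsilon, lam, rng=random, steps=2000):
    """Logistic regression trained with objective perturbation.
    X: list of feature vectors, y: labels in {-1, +1}, lam: L2 regularisation strength."""
    X = [_clip_to_unit_ball(x) for x in X]
    n, d = len(X), len(X[0])
    c = 0.25  # upper bound on the second derivative of the logistic loss
    eps_p = epsilon - math.log(1 + 2 * c / (n * lam) + (c / (n * lam)) ** 2)
    extra_reg = 0.0
    if eps_p <= 0:
        # Budget too small for the given lam: add extra L2 regularisation and halve epsilon.
        extra_reg = c / (n * (math.exp(epsilon / 4) - 1)) - lam
        eps_p = epsilon / 2
    # Noise b with density proportional to exp(-eps_p * ||b|| / 2):
    # its norm is Gamma(d, scale=2/eps_p) and its direction is uniform on the sphere.
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    dir_norm = math.sqrt(sum(v * v for v in direction))
    radius = rng.gammavariate(d, 2.0 / eps_p)
    b = [radius * v / dir_norm for v in direction]

    reg = lam + extra_reg
    step = 1.0 / (c + reg)  # 1 / smoothness constant of the perturbed objective
    w = [0.0] * d
    for _ in range(steps):
        # Gradient of (1/n) sum log(1 + exp(-y w.x)) + (reg/2)||w||^2 + (b.w)/n
        grad = [reg * w[j] + b[j] / n for j in range(d)]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            coef = -yi * _sigmoid(-margin) / n
            for j in range(d):
                grad[j] += coef * xi[j]
        w = [w[j] - step * grad[j] for j in range(d)]
    return w
```

Predictions are the sign of `w · x`; a 0/1 fatty-liver label would first be mapped to −1/+1. Smaller ε or smaller `lam` increases the injected noise relative to the data term, which is the accuracy/privacy trade-off the abstract refers to.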

Suggested Citation

  • Wenyu Li & Siqi Wang & Hongwei Wang & Yunlong Lu, 2024. "Child Health Dataset Publishing and Mining Based on Differential Privacy Preservation," Mathematics, MDPI, vol. 12(16), pages 1-11, August.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:16:p:2487-:d:1454731

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/16/2487/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/16/2487/
    Download Restriction: no