
Is Anonymization Through Discretization Reliable? Modeling Latent Probability Distributions for Ordinal Data as a Solution to the Small Sample Size Problem

Authors

Listed:
  • Stefan Michael Stroka

    (Department of Statistics, Ludwig-Maximilians-University Munich, 80539 Munich, Germany)

  • Christian Heumann

    (Department of Statistics, Ludwig-Maximilians-University Munich, 80539 Munich, Germany)

Abstract

The growing interest in data privacy and anonymization presents challenges, as traditional methods such as ordinal discretization often result in information loss by coarsening metric data. Current research suggests that modeling the latent distributions of ordinal classes can reduce the effectiveness of anonymization and increase traceability. In fact, combining probability distributions with a small training sample can effectively infer true metric values from discrete information, depending on the model and data complexity. Our method uses metric values and ordinal classes to model latent normal distributions for each discrete class. This approach, applied with both linear and Bayesian linear regression, aims to enhance supervised learning models. Evaluated with synthetic datasets and real-world datasets from UCI and Kaggle, our method shows improved mean point estimation and narrower prediction intervals compared to the baseline. With 5–10% training data randomly split from each dataset population, it achieves an average 10% reduction in MSE and a ~5–10% increase in R² on out-of-sample test data overall.
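
The general idea in the abstract, fitting a normal distribution to the metric values seen in each ordinal class from a small labeled sample and then using it to estimate metric values for records where only the class is known, can be illustrated with a minimal sketch. This is not the authors' model (it involves no linear or Bayesian regression); it only fits a per-class mean and standard deviation and compares the class mean against a fixed bin-midpoint baseline. The synthetic data, bin edges, midpoints, 10% training share, and all function names are assumptions made for illustration.

```python
import numpy as np

def fit_class_normals(values, classes):
    """Estimate (mean, std) of the metric values observed in each ordinal class."""
    params = {}
    for c in np.unique(classes):
        v = values[classes == c]
        std = v.std(ddof=1) if v.size > 1 else 0.0
        params[int(c)] = (float(v.mean()), float(std))
    return params

def reconstruct(classes, params, fallback):
    """Point estimate per record: fitted class mean, or the fallback value
    for a class that never appeared in the training sample."""
    return np.array([params.get(int(c), (fallback[int(c)], 0.0))[0] for c in classes])

# Hypothetical metric attribute and discretization (assumptions, not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(50.0, 15.0, size=2000)
edges = np.array([40.0, 50.0, 60.0])            # three edges -> four ordinal classes 0..3
classes = np.digitize(x, edges)
midpoints = np.array([35.0, 45.0, 55.0, 65.0])  # naive baseline value per class

# Small training sample (10%) for which both metric value and class are known.
train_idx = rng.choice(x.size, size=x.size // 10, replace=False)
is_train = np.zeros(x.size, dtype=bool)
is_train[train_idx] = True

params = fit_class_normals(x[is_train], classes[is_train])

# Records outside the training sample expose only their ordinal class.
estimate = reconstruct(classes[~is_train], params, midpoints)
baseline = midpoints[classes[~is_train]]

mse_latent = float(np.mean((x[~is_train] - estimate) ** 2))
mse_baseline = float(np.mean((x[~is_train] - baseline) ** 2))
print(f"MSE with fitted class means: {mse_latent:.2f}")
print(f"MSE with bin midpoints:      {mse_baseline:.2f}")
```

In this toy setting the gain comes mostly from the open-ended outer classes, where a fixed midpoint is a poor guess for the true class mean. The fitted standard deviations are not used for the point estimate here, but they would be the starting point for class-specific prediction intervals.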

Suggested Citation

  • Stefan Michael Stroka & Christian Heumann, 2024. "Is Anonymization Through Discretization Reliable? Modeling Latent Probability Distributions for Ordinal Data as a Solution to the Small Sample Size Problem," Stats, MDPI, vol. 7(4), pages 1-20, October.
  • Handle: RePEc:gam:jstats:v:7:y:2024:i:4:p:70-1208:d:1500483

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/7/4/70/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/7/4/70/
    Download Restriction: no

    References listed on IDEAS

    1. Naijun Sha & Benard Owusu Dechi, 2019. "A Bayes Inference for Ordinal Response with Latent Variable Approach," Stats, MDPI, vol. 2(2), pages 1-11, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qi Zhang & Yihui Zhang & Yemao Xia, 2024. "Bayesian Feature Extraction for Two-Part Latent Variable Model with Polytomous Manifestations," Mathematics, MDPI, vol. 12(5), pages 1-23, March.
    2. Ejike R. Ugba & Daniel Mörlein & Jan Gertheiss, 2021. "Smoothing in Ordinal Regression: An Application to Sensory Data," Stats, MDPI, vol. 4(3), pages 1-18, July.
