Importance partitioning in micro-aggregation

Authors

  • Kokolakis, G.
  • Fouskakis, D.

Abstract

Micro-aggregation is one of the techniques data holders use to protect the confidentiality of continuous data. Rather than releasing raw data (individual records), micro-aggregation releases the averages of small groups of records, which reduces the risk of identity disclosure. At the same time, the method entails a loss of information and often distorts the data, so the choice of groups is crucial for minimizing information loss and data distortion. No exact polynomial-time algorithms for optimal micro-aggregation are known to date, so heuristic methods are necessary. A heuristic algorithm based on the notion of importance partitioning is proposed, and it is shown to achieve improved performance compared with other micro-aggregation heuristics.
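The abstract outlines micro-aggregation only in general terms and does not detail the importance-partitioning heuristic. As a hedged illustration of the general idea (not the authors' algorithm), here is a minimal sketch of the simplest variant, fixed-size univariate micro-aggregation: sort the records, form consecutive groups of at least k records, and release each group's mean in place of the raw values. The function name `microaggregate_1d`, the group size `k`, and the use of SSE/SST (within-group sum of squares over total sum of squares) as the information-loss measure are illustrative assumptions.

```python
from typing import List, Tuple


def microaggregate_1d(values: List[float], k: int) -> Tuple[List[float], float]:
    """Hypothetical fixed-size univariate micro-aggregation (illustration only).

    Sorts the records, splits them into consecutive groups of k (the last group
    absorbs the remainder, so every group has between k and 2k-1 records), and
    replaces each value by its group mean. Returns the masked values in the
    original record order and the information loss SSE/SST.
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(values)")

    # Record indices ordered by value, so each group holds similar values.
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = n // k

    masked = [0.0] * n
    sse = 0.0  # within-group sum of squared deviations
    for g in range(n_groups):
        start = g * k
        end = n if g == n_groups - 1 else start + k  # last group takes the remainder
        idx = order[start:end]
        mean = sum(values[i] for i in idx) / len(idx)
        for i in idx:
            masked[i] = mean
            sse += (values[i] - mean) ** 2

    # Total sum of squares around the overall mean.
    overall = sum(values) / n
    sst = sum((v - overall) ** 2 for v in values)
    loss = sse / sst if sst > 0 else 0.0
    return masked, loss


if __name__ == "__main__":
    masked, loss = microaggregate_1d([10, 1, 12, 2, 11, 3], k=3)
    print(masked, round(loss, 3))
```

With `k = 3`, the values `[10, 1, 12, 2, 11, 3]` are released as `[11.0, 2.0, 11.0, 2.0, 11.0, 2.0]` with a loss of about 0.032. An outlier forced into a group (for example, appending 30 to the data above) inflates SSE sharply, which is why the choice of groups matters and why heuristics that partition more carefully than fixed-size grouping are of interest.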

Suggested Citation

  • Kokolakis, G. & Fouskakis, D., 2009. "Importance partitioning in micro-aggregation," Computational Statistics & Data Analysis, Elsevier, vol. 53(7), pages 2439-2445, May.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:7:p:2439-2445
    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00461-1
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Ghosh, Joyee & Reiter, Jerome P. & Karr, Alan F., 2007. "Secure computation with horizontally partitioned data using adaptive regression splines," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 5813-5820, August.
    2. Paass, Gerhard, 1988. "Disclosure Risk and Disclosure Avoidance for Microdata," Journal of Business & Economic Statistics, American Statistical Association, vol. 6(4), pages 487-500, October.
    3. Panda, S. K. & Nagabhushanam, A., 1995. "Fuzzy data distortion," Computational Statistics & Data Analysis, Elsevier, vol. 19(5), pages 553-562, May.
    4. Duncan, George & Lambert, Diane, 1989. "The Risk of Disclosure for Microdata," Journal of Business & Economic Statistics, American Statistical Association, vol. 7(2), pages 207-217, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Natalie Shlomo & Chris Skinner, 2022. "Measuring risk of re‐identification in microdata: State‐of‐the art and new directions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1644-1662, October.
    2. Skinner, Chris J., 2007. "The probability of identification: applying ideas from forensic statistics to disclosure risk assessment," LSE Research Online Documents on Economics 39105, London School of Economics and Political Science, LSE Library.
    3. Shlomo, Natalie & Skinner, Chris, 2022. "Measuring risk of re-identification in microdata: state-of-the art and new directions," LSE Research Online Documents on Economics 117168, London School of Economics and Political Science, LSE Library.
    4. Shlomo, Natalie & Skinner, Chris J., 2010. "Assessing the protection provided by misclassification-based disclosure limitation methods for survey microdata," LSE Research Online Documents on Economics 39119, London School of Economics and Political Science, LSE Library.
    5. Braathen, Christian & Thorsen, Inge & Ubøe, Jan, 2022. "Adjusting for Cell Suppression in Commuting Trip Data," Discussion Papers 2022/13, Norwegian School of Economics, Department of Business and Management Science.
    6. C. J. Skinner, 2007. "The probability of identification: applying ideas from forensic statistics to disclosure risk assessment," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(1), pages 195-212, January.
    7. Mehri, Ali & Agahi, Hamzeh & Mehri-Dehnavi, Hossein, 2019. "A novel word ranking method based on distorted entropy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 521(C), pages 484-492.
    8. James Jackson & Robin Mitra & Brian Francis & Iain Dove, 2022. "Using saturated count models for user‐friendly synthesis of large confidential administrative databases," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1613-1643, October.
    9. Christine M. O'Keefe & James O. Chipperfield, 2013. "A Summary of Attack Methods and Confidentiality Protection Measures for Fully Automated Remote Analysis Systems," International Statistical Review, International Statistical Institute, vol. 81(3), pages 426-455, December.
    10. Xiao-Bai Li & Sumit Sarkar, 2013. "Class-Restricted Clustering and Microperturbation for Data Privacy," Management Science, INFORMS, vol. 59(4), pages 796-812, April.
    11. Nigel Melville & Michael McQuaid, 2012. "Research Note: Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation," Information Systems Research, INFORMS, vol. 23(2), pages 559-574, June.
    12. Duncan Smith, 2020. "Re‐identification in the Absence of Common Variables for Matching," International Statistical Review, International Statistical Institute, vol. 88(2), pages 354-379, August.
    13. S F Roehrig & R Padman & R Krishnan & G T Duncan, 2011. "Exact and heuristic methods for cell suppression in multi-dimensional linked tables," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(2), pages 291-304, February.
    14. Loong Bronwyn & Rubin Donald B., 2017. "Multiply-Imputed Synthetic Data: Advice to the Imputer," Journal of Official Statistics, Sciendo, vol. 33(4), pages 1005-1019, December.
    15. Tapan K. Nayak & Samson A. Adeshiyan, 2016. "On Invariant Post-randomization for Statistical Disclosure Control," International Statistical Review, International Statistical Institute, vol. 84(1), pages 26-42, April.
    16. Walter Müller & Uwe Blien & Heike Wirth, 1995. "Identification Risks of Microdata," Sociological Methods & Research, vol. 24(2), pages 131-157, November.
    17. Xiao-Bai Li & Sumit Sarkar, 2009. "Against Classification Attacks: A Decision Tree Pruning Approach to Privacy Protection in Data Mining," Operations Research, INFORMS, vol. 57(6), pages 1496-1509, December.
    18. Christine N. Kohnen & Jerome P. Reiter, 2009. "Multiple imputation for combining confidential data owned by two agencies," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 172(2), pages 511-528, April.
    19. Gottschalk, Sandra, 2002. "Anonymisierung von Unternehmensdaten: Ein Überblick und beispielhafte Darstellung anhand des Mannheimer Innovationspanels" [Anonymization of company data: an overview and illustrative presentation based on the Mannheim Innovation Panel], ZEW Discussion Papers 02-23, ZEW - Leibniz Centre for European Economic Research.
    20. Daniela Ichim, 2009. "Disclosure Control of Business Microdata: A Density‐Based Approach," International Statistical Review, International Statistical Institute, vol. 77(2), pages 196-211, August.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.