IDEAS home Printed from https://ideas.repec.org/a/inm/oropre/v57y2009i6p1496-1509.html
   My bibliography  Save this article

Against Classification Attacks: A Decision Tree Pruning Approach to Privacy Protection in Data Mining

Author

Listed:
  • Xiao-Bai Li

    (Department of Operations and Information Systems, University of Massachusetts Lowell, Lowell, Massachusetts 01854)

  • Sumit Sarkar

    (School of Management, The University of Texas at Dallas, Richardson, Texas 75080)

Abstract

Data-mining techniques can be used not only to study collective behavior about customers, but also to discover private information about individuals. In this study, we demonstrate that decision trees, a popular classification technique for data mining, can be used to effectively reveal individuals' confidential data, even when the identities of the individuals are not present in the data. We propose a novel approach for organizations to protect confidential data from such a classification attack. The key components of this approach include a set of entropy-based measures to evaluate disclosure risks of individual records, an optimal pruning algorithm to identify high-risk records, and a pair of data-swapping procedures to reduce the disclosure risks. The proposed method provides the best trade-off between data utility and privacy protection against classification attacks. It can be applied to data with both numeric and categorical attributes. An experimental study on six real-world data sets shows that the proposed method is very effective in protecting privacy while enabling legitimate data mining and analysis.

Suggested Citation

  • Xiao-Bai Li & Sumit Sarkar, 2009. "Against Classification Attacks: A Decision Tree Pruning Approach to Privacy Protection in Data Mining," Operations Research, INFORMS, vol. 57(6), pages 1496-1509, December.
  • Handle: RePEc:inm:oropre:v:57:y:2009:i:6:p:1496-1509
    DOI: 10.1287/opre.1090.0702
    as

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.1090.0702
    Download Restriction: no

    File URL: https://libkey.io/10.1287/opre.1090.0702?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Ram Gopal & Robert Garfinkel & Paulo Goes, 2002. "Confidentiality via Camouflage: The CVC Approach to Disclosure Limitation When Answering Queries to Databases," Operations Research, INFORMS, vol. 50(3), pages 501-516, June.
    2. Robert Garfinkel & Ram Gopal & Paulo Goes, 2002. "Privacy Protection of Binary Confidential Data Against Deterministic, Stochastic, and Insider Threat," Management Science, INFORMS, vol. 48(6), pages 749-764, June.
    3. Manuel A. Nunez & Robert S. Garfinkel & Ram D. Gopal, 2007. "Stochastic Protection of Confidential Information in Databases: A Hybrid of Data Perturbation and Query Restriction," Operations Research, INFORMS, vol. 55(5), pages 890-908, October.
    4. Xiao-Bai Li & Sumit Sarkar, 2006. "Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data," Information Systems Research, INFORMS, vol. 17(3), pages 254-270, September.
    5. Duncan, George & Lambert, Diane, 1989. "The Risk of Disclosure for Microdata," Journal of Business & Economic Statistics, American Statistical Association, vol. 7(2), pages 207-217, April.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Mingzheng Wang & Zhengrui Jiang & Haifang Yang & Yu Zhang, 2018. "T -Closeness Slicing: A New Privacy-Preserving Approach for Transactional Data Publishing," INFORMS Journal on Computing, INFORMS, vol. 30(3), pages 438-453, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiao-Bai Li & Sumit Sarkar, 2013. "Class-Restricted Clustering and Microperturbation for Data Privacy," Management Science, INFORMS, vol. 59(4), pages 796-812, April.
    2. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    3. Syam Menon & Abhijeet Ghoshal & Sumit Sarkar, 2022. "Modifying Transactional Databases to Hide Sensitive Association Rules," Information Systems Research, INFORMS, vol. 33(1), pages 152-178, March.
    4. Syam Menon & Sumit Sarkar & Shibnath Mukherjee, 2005. "Maximizing Accuracy of Shared Databases when Concealing Sensitive Patterns," Information Systems Research, INFORMS, vol. 16(3), pages 256-270, September.
    5. Shaobo Li & Matthew J. Schneider & Yan Yu & Sachin Gupta, 2023. "Reidentification Risk in Panel Data: Protecting for k -Anonymity," Information Systems Research, INFORMS, vol. 34(3), pages 1066-1088, September.
    6. Xiao-Bai Li & Sumit Sarkar, 2006. "Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data," Information Systems Research, INFORMS, vol. 17(3), pages 254-270, September.
    7. Robert Garfinkel & Ram Gopal & Steven Thompson, 2007. "Releasing Individually Identifiable Microdata with Privacy Protection Against Stochastic Threat: An Application to Health Information," Information Systems Research, INFORMS, vol. 18(1), pages 23-41, March.
    8. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    9. Joseph B. Kadane & Ramayya Krishnan & Galit Shmueli, 2006. "A Data Disclosure Policy for Count Data Based on the COM-Poisson Distribution," Management Science, INFORMS, vol. 52(10), pages 1610-1617, October.
    10. Xiao-Bai Li & Sumit Sarkar, 2011. "Protecting Privacy Against Record Linkage Disclosure: A Bounded Swapping Approach for Numeric Data," Information Systems Research, INFORMS, vol. 22(4), pages 774-789, December.
    11. Shlomo, Natalie & Skinner, Chris J., 2010. "Assessing the protection provided by misclassification-based disclosure limitation methods for survey microdata," LSE Research Online Documents on Economics 39119, London School of Economics and Political Science, LSE Library.
    12. Skinner, Chris J. & Shlomo, Natalie, 2008. "Assessing identification risk in survey microdata using log-linear models," LSE Research Online Documents on Economics 39112, London School of Economics and Political Science, LSE Library.
    13. Natalie Shlomo & Chris Skinner, 2022. "Measuring risk of re‐identification in microdata: State‐of‐the art and new directions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1644-1662, October.
    14. Sumit Dutta Chowdhury & George T. Duncan & Ramayya Krishnan & Stephen F. Roehrig & Sumitra Mukherjee, 1999. "Disclosure Detection in Multivariate Categorical Databases: Auditing Confidentiality Protection Through Two New Matrix Operators," Management Science, INFORMS, vol. 45(12), pages 1710-1723, December.
    15. Braathen, Christian & Thorsen, Inge & Ubøe, Jan, 2022. "Adjusting for Cell Suppression in Commuting Trip Data," Discussion Papers 2022/13, Norwegian School of Economics, Department of Business and Management Science.
    16. Gilboa-Freedman, Gail & Smorodinsky, Rann, 2020. "On the properties that characterize privacy," Mathematical Social Sciences, Elsevier, vol. 103(C), pages 59-68.
    17. James Jackson & Robin Mitra & Brian Francis & Iain Dove, 2022. "Using saturated count models for user‐friendly synthesis of large confidential administrative databases," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1613-1643, October.
    18. Matthew J. Schneider & Shawn Mankad, 2021. "A Two-Stage Authorship Attribution Method Using Text and Structured Data for De-Anonymizing User-Generated Content," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 8(3), pages 66-83, September.
    19. Syam Menon & Sumit Sarkar, 2007. "Minimizing Information Loss and Preserving Privacy," Management Science, INFORMS, vol. 53(1), pages 101-116, January.
    20. George Kokolakis & Dimitris Fouskakis, 2008. "On the Discrepancy Measures for the Optimal Equal Probability Partitioning in Bayesian Multivariate Micro-Aggregation," Journal of Classification, Springer;The Classification Society, vol. 25(2), pages 209-224, November.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:oropre:v:57:y:2009:i:6:p:1496-1509. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.