
Maximizing Accuracy of Shared Databases when Concealing Sensitive Patterns

Authors

  • Syam Menon

    (School of Management, University of Texas at Dallas, Richardson, Texas 75083)

  • Sumit Sarkar

    (School of Management, University of Texas at Dallas, Richardson, Texas 75083)

  • Shibnath Mukherjee

    (School of Management, University of Texas at Dallas, Richardson, Texas 75083)

Abstract

The sharing of databases either within or across organizations raises the possibility of unintentionally revealing sensitive relationships contained in them. Recent advances in data-mining technology have increased the chances of such disclosure. Consequently, firms that share their databases might choose to hide these sensitive relationships prior to sharing. Ideally, the approach used to hide relationships should be impervious to as many data-mining techniques as possible, while minimizing the resulting distortion to the database. This paper focuses on frequent item sets, the identification of which forms a critical initial step in a variety of data-mining tasks. It presents an optimal approach for hiding sensitive item sets, while keeping the number of modified transactions to a minimum. The approach is particularly attractive as it easily handles databases with millions of transactions. Results from extensive tests conducted on publicly available real data and data generated using IBM’s synthetic data generator indicate that the approach presented is very effective, optimally solving problems involving millions of transactions in a few seconds.
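
As an illustration only, the following minimal Python sketch shows what hiding a single sensitive item set can mean in practice. It is not the authors' formulation, which the abstract does not detail. The names `support`, `hide_itemset`, and `min_support` are hypothetical, and the sketch assumes an item set counts as frequent when at least `min_support` transactions contain it. For one item set in isolation, each modified transaction can lower the support by at most one, so removing a single item from `support - min_support + 1` supporting transactions is the fewest modifications possible. The difficulty the paper addresses arises when several sensitive item sets overlap, which this sketch does not handle.

```python
from typing import Iterable, List, Set, Tuple


def support(transactions: Iterable[Set[str]], itemset: Set[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def hide_itemset(
    transactions: List[Set[str]], sensitive: Set[str], min_support: int
) -> Tuple[List[Set[str]], int]:
    """Illustrative sanitizer for ONE sensitive item set (an assumption of
    this sketch, not the paper's method).

    Removes one item of `sensitive` from just enough supporting transactions
    to push its support below `min_support`. Returns the sanitized copy and
    the number of transactions modified.
    """
    sanitized = [set(t) for t in transactions]  # leave the input untouched
    # Number of supporting transactions that must lose the item set.
    excess = support(sanitized, sensitive) - min_support + 1
    victim = min(sensitive)  # deterministic choice of the item to drop
    modified = 0
    for t in sanitized:
        if modified >= excess:
            break
        if sensitive <= t:
            t.discard(victim)
            modified += 1
    return sanitized, modified


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "d"}, {"b", "c"}]
    cleaned, changed = hide_itemset(db, {"a", "b"}, min_support=2)
    print(changed, support(cleaned, {"a", "b"}))
```

Which item to drop and which transactions to alter are arbitrary here; an approach that also limits distortion across several sensitive item sets would have to make those choices jointly.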

Suggested Citation

  • Syam Menon & Sumit Sarkar & Shibnath Mukherjee, 2005. "Maximizing Accuracy of Shared Databases when Concealing Sensitive Patterns," Information Systems Research, INFORMS, vol. 16(3), pages 256-270, September.
  • Handle: RePEc:inm:orisre:v:16:y:2005:i:3:p:256-270
    DOI: 10.1287/isre.1050.0056

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/isre.1050.0056
    Download Restriction: no


    References listed on IDEAS

    1. Sumit Dutta Chowdhury & George T. Duncan & Ramayya Krishnan & Stephen F. Roehrig & Sumitra Mukherjee, 1999. "Disclosure Detection in Multivariate Categorical Databases: Auditing Confidentiality Protection Through Two New Matrix Operators," Management Science, INFORMS, vol. 45(12), pages 1710-1723, December.
    2. Rathindra Sarathy & Krishnamurty Muralidhar, 2002. "The Security of Confidential Numerical Data in Databases," Information Systems Research, INFORMS, vol. 13(4), pages 389-403, December.
    3. Rathindra Sarathy & Krishnamurty Muralidhar & Rahul Parsa, 2002. "Perturbing Nonnormal Confidential Attributes: The Copula Approach," Management Science, INFORMS, vol. 48(12), pages 1613-1627, December.
    4. Ram Gopal & Robert Garfinkel & Paulo Goes, 2002. "Confidentiality via Camouflage: The CVC Approach to Disclosure Limitation When Answering Queries to Databases," Operations Research, INFORMS, vol. 50(3), pages 501-516, June.
    5. Robert Garfinkel & Ram Gopal & Paulo Goes, 2002. "Privacy Protection of Binary Confidential Data Against Deterministic, Stochastic, and Insider Threat," Management Science, INFORMS, vol. 48(6), pages 749-764, June.

    Citations

    Cited by:

    1. Peng Cheng & Chun-Wei Lin & Jeng-Shyang Pan, 2015. "Use HypE to Hide Association Rules by Adding Items," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-19, June.
    2. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    3. Syam Menon & Abhijeet Ghoshal & Sumit Sarkar, 2022. "Modifying Transactional Databases to Hide Sensitive Association Rules," Information Systems Research, INFORMS, vol. 33(1), pages 152-178, March.
    4. Nigel Melville & Michael McQuaid, 2012. "Research Note: Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation," Information Systems Research, INFORMS, vol. 23(2), pages 559-574, June.
    5. Robert Garfinkel & Ram Gopal & Steven Thompson, 2007. "Releasing Individually Identifiable Microdata with Privacy Protection Against Stochastic Threat: An Application to Health Information," Information Systems Research, INFORMS, vol. 18(1), pages 23-41, March.
    6. Damangir, Sina & Du, Rex Yuxing & Hu, Ye, 2018. "Uncovering Patterns of Product Co-consideration: A Case Study of Online Vehicle Price Quote Request Data," Journal of Interactive Marketing, Elsevier, vol. 42(C), pages 1-17.
    7. Abhijeet Ghoshal & Jing Hao & Syam Menon & Sumit Sarkar, 2020. "Hiding Sensitive Information when Sharing Distributed Transactional Data," Information Systems Research, INFORMS, vol. 31(2), pages 473-490, June.
    8. Syam Menon & Sumit Sarkar, 2007. "Minimizing Information Loss and Preserving Privacy," Management Science, INFORMS, vol. 53(1), pages 101-116, January.
    9. Mingzheng Wang & Zhengrui Jiang & Haifang Yang & Yu Zhang, 2018. "T-Closeness Slicing: A New Privacy-Preserving Approach for Transactional Data Publishing," INFORMS Journal on Computing, INFORMS, vol. 30(3), pages 438-453, August.
    10. Urbinati, Andrea & Bogers, Marcel & Chiesa, Vittorio & Frattini, Federico, 2019. "Creating and capturing value from Big Data: A multiple-case study analysis of provider companies," Technovation, Elsevier, vol. 84, pages 21-36.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joseph B. Kadane & Ramayya Krishnan & Galit Shmueli, 2006. "A Data Disclosure Policy for Count Data Based on the COM-Poisson Distribution," Management Science, INFORMS, vol. 52(10), pages 1610-1617, October.
    2. Syam Menon & Sumit Sarkar, 2007. "Minimizing Information Loss and Preserving Privacy," Management Science, INFORMS, vol. 53(1), pages 101-116, January.
    3. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    4. Xiao-Bai Li & Sumit Sarkar, 2013. "Class-Restricted Clustering and Microperturbation for Data Privacy," Management Science, INFORMS, vol. 59(4), pages 796-812, April.
    5. Xiao-Bai Li & Sumit Sarkar, 2006. "Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data," Information Systems Research, INFORMS, vol. 17(3), pages 254-270, September.
    6. Amalia R. Miller & Catherine Tucker, 2009. "Privacy Protection and Technology Diffusion: The Case of Electronic Medical Records," Management Science, INFORMS, vol. 55(7), pages 1077-1093, July.
    7. Robert Garfinkel & Ram Gopal & Steven Thompson, 2007. "Releasing Individually Identifiable Microdata with Privacy Protection Against Stochastic Threat: An Application to Health Information," Information Systems Research, INFORMS, vol. 18(1), pages 23-41, March.
    8. Xiao-Bai Li & Sumit Sarkar, 2009. "Against Classification Attacks: A Decision Tree Pruning Approach to Privacy Protection in Data Mining," Operations Research, INFORMS, vol. 57(6), pages 1496-1509, December.
    9. P. Daniel Wright & Matthew J. Liberatore & Robert L. Nydick, 2006. "A Survey of Operations Research Models and Applications in Homeland Security," Interfaces, INFORMS, vol. 36(6), pages 514-529, December.
    10. Woodcock, Simon D. & Benedetto, Gary, 2009. "Distribution-preserving statistical disclosure limitation," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4228-4242, October.
    11. Amanda M. Y. Chu & Benson S. Y. Lam & Agnes Tiwari & Mike K. P. So, 2019. "An Empirical Study of Applying Statistical Disclosure Control Methods to Public Health Research," IJERPH, MDPI, vol. 16(22), pages 1-17, November.
    12. Rathindra Sarathy & Krishnamurty Muralidhar, 2002. "The Security of Confidential Numerical Data in Databases," Information Systems Research, INFORMS, vol. 13(4), pages 389-403, December.
    13. Syam Menon & Abhijeet Ghoshal & Sumit Sarkar, 2022. "Modifying Transactional Databases to Hide Sensitive Association Rules," Information Systems Research, INFORMS, vol. 33(1), pages 152-178, March.
    14. Yi Qian & Hui Xie, 2013. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," NBER Working Papers 19586, National Bureau of Economic Research, Inc.
    15. Rajiv D. Banker & Robert J. Kauffman, 2004. "50th Anniversary Article: The Evolution of Research on Information Systems: A Fiftieth-Year Survey of the Literature in Management Science," Management Science, INFORMS, vol. 50(3), pages 281-298, March.
    16. Mario Trottini, 2008. "Data disclosure limitation as a decision problem," Metron - International Journal of Statistics, Dipartimento di Statistica, Probabilità e Statistiche Applicate - University of Rome, vol. 0(1), pages 109-134.
    17. Robert Garfinkel & Ram Gopal & Paulo Goes, 2002. "Privacy Protection of Binary Confidential Data Against Deterministic, Stochastic, and Insider Threat," Management Science, INFORMS, vol. 48(6), pages 749-764, June.
    18. Yang, Jingping & Cheng, Shihong & Zhang, Lihong, 2006. "Bivariate copula decomposition in terms of comonotonicity, countermonotonicity and independence," Insurance: Mathematics and Economics, Elsevier, vol. 39(2), pages 267-284, October.
    19. Xue Bai & Ram Gopal & Manuel Nunez & Dmitry Zhdanov, 2012. "On the Prevention of Fraud and Privacy Exposure in Process Information Flow," INFORMS Journal on Computing, INFORMS, vol. 24(3), pages 416-432, August.
    20. S F Roehrig & R Padman & R Krishnan & G T Duncan, 2011. "Exact and heuristic methods for cell suppression in multi-dimensional linked tables," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(2), pages 291-304, February.
