
Research Note ---Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation

Authors

  • Nigel Melville (Stephen M. Ross School of Business, University of Michigan, Ann Arbor, Michigan 48109)
  • Michael McQuaid (School of Information, University of Michigan, Ann Arbor, Michigan 48109)

Abstract

Business organizations are generating growing volumes of data about their employees, customers, and suppliers. Much of these data cannot be exploited for business value due to privacy and confidentiality concerns. National statistical agencies share sensitive data collected from individuals and businesses by modifying the data so individuals and firms cannot be identified but statistical utility is preserved. We build on this literature to develop a hybrid approach to data masking for business organizations. We demonstrate the validity of the hybrid approach, which we call multiple imputation with multimodal perturbation (MIMP), using Monte Carlo simulation and illustrate its application in a specific business context. Results of our analysis open new areas of research for information systems scholarship and new potential revenue sources for business organizations.

Suggested Citation

  • Nigel Melville & Michael McQuaid, 2012. "Research Note ---Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation," Information Systems Research, INFORMS, vol. 23(2), pages 559-574, June.
  • Handle: RePEc:inm:orisre:v:23:y:2012:i:2:p:559-574
    DOI: 10.1287/isre.1110.0361

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/isre.1110.0361
    Download Restriction: no


    References listed on IDEAS

    1. Philip K. Hopke & Chuanhai Liu & Donald B. Rubin, 2001. "Multiple Imputation for Multivariate Data with Missing and Below‐Threshold Measurements: Time‐Series Concentrations of Pollutants in the Arctic," Biometrics, The International Biometric Society, vol. 57(1), pages 22-33, March.
    2. Krishnamurty Muralidhar & Rathindra Sarathy, 2006. "Data Shuffling--A New Masking Approach for Numerical Data," Management Science, INFORMS, vol. 52(5), pages 658-670, May.
    3. Jeffrey M. Perloff & Mark Denbaly, 2007. "Data Needs for Consumer and Retail Firm Studies," American Journal of Agricultural Economics, Agricultural and Applied Economics Association, vol. 89(5), pages 1282-1287.
    4. Robert Garfinkel & Ram Gopal & Steven Thompson, 2007. "Releasing Individually Identifiable Microdata with Privacy Protection Against Stochastic Threat: An Application to Health Information," Information Systems Research, INFORMS, vol. 18(1), pages 23-41, March.
    5. Jerome P. Reiter, 2005. "Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 168(1), pages 185-205, January.
    6. Gerhard Paass, 1988. "Disclosure Risk and Disclosure Avoidance for Microdata," Journal of Business & Economic Statistics, American Statistical Association, vol. 6(4), pages 487-500, October.
    7. Syam Menon & Sumit Sarkar & Shibnath Mukherjee, 2005. "Maximizing Accuracy of Shared Databases when Concealing Sensitive Patterns," Information Systems Research, INFORMS, vol. 16(3), pages 256-270, September.
    8. Seokho Lee & Marc G. Genton & Reinaldo B. Arellano-Valle, 2010. "Perturbation of Numerical Confidential Data via Skew-t Distributions," Management Science, INFORMS, vol. 56(2), pages 318-333, February.
    9. Xiao-Bai Li & Sumit Sarkar, 2006. "Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data," Information Systems Research, INFORMS, vol. 17(3), pages 254-270, September.
    10. Zhiqiang Zheng & Balaji Padmanabhan, 2006. "Selectively Acquiring Customer Information: A New Data Acquisition Problem and an Active Learning-Based Solution," Management Science, INFORMS, vol. 52(5), pages 697-712, May.
    11. C.J. Skinner, 1992. "On identification disclosure and prediction disclosure for microdata," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 46(1), pages 21-32, March.

    Citations


    Cited by:

    1. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    2. Xiao-Bai Li & Sumit Sarkar, 2013. "Class-Restricted Clustering and Microperturbation for Data Privacy," Management Science, INFORMS, vol. 59(4), pages 796-812, April.
    3. Amanda M.Y. Chu & Chun Yin Ip & Benson S.Y. Lam & Mike K.P. So, 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    4. Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.
    5. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    6. Xiao-Bai Li & Sumit Sarkar, 2011. "Protecting Privacy Against Record Linkage Disclosure: A Bounded Swapping Approach for Numeric Data," Information Systems Research, INFORMS, vol. 22(4), pages 774-789, December.
    7. Mario Trottini & Krish Muralidhar & Rathindra Sarathy, 2011. "Maintaining tail dependence in data shuffling using t copula," Statistics & Probability Letters, Elsevier, vol. 81(3), pages 420-428, March.
    8. Natalie Shlomo & Chris Skinner, 2022. "Measuring risk of re‐identification in microdata: State‐of‐the art and new directions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1644-1662, October.
    9. Weiyin Hong & Frank K. Y. Chan & James Y. L. Thong, 2021. "Drivers and Inhibitors of Internet Privacy Concern: A Multidimensional Development Theory Perspective," Journal of Business Ethics, Springer, vol. 168(3), pages 539-564, January.
    10. Robert Garfinkel & Ram Gopal & Steven Thompson, 2007. "Releasing Individually Identifiable Microdata with Privacy Protection Against Stochastic Threat: An Application to Health Information," Information Systems Research, INFORMS, vol. 18(1), pages 23-41, March.
    11. Natalie Shlomo & Chris Skinner, 2022. "Measuring risk of re-identification in microdata: state-of-the art and new directions," LSE Research Online Documents on Economics 117168, London School of Economics and Political Science, LSE Library.
    12. Jordi Castro, 2012. "Recent advances in optimization techniques for statistical tabular data protection," European Journal of Operational Research, Elsevier, vol. 216(2), pages 257-269.
    13. Jörg Drechsler & Jerome P. Reiter, 2011. "An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3232-3243, December.
    14. Martin Klein & Bimal Sinha, 2013. "Statistical Analysis of Noise-Multiplied Data Using Multiple Imputation," Journal of Official Statistics, Sciendo, vol. 29(3), pages 425-465, June.
    15. Zhiyuan Wang & Zhiqiang (Eric) Zheng & Wei Jiang & Shaojie Tang, 2021. "Blockchain‐Enabled Data Sharing in Supply Chains: Model, Operationalization, and Tutorial," Production and Operations Management, Production and Operations Management Society, vol. 30(7), pages 1965-1985, July.
    16. Sungduk Kim & Zhen Chen & Neil J. Perkins & Enrique F. Schisterman & Germaine M. Buck Louis, 2019. "A Model-Based Approach to Detection Limits in Studying Environmental Exposures and Human Fecundity," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 11(3), pages 524-547, December.
    17. Natalie Shlomo & Chris J. Skinner, 2010. "Assessing the protection provided by misclassification-based disclosure limitation methods for survey microdata," LSE Research Online Documents on Economics 39119, London School of Economics and Political Science, LSE Library.
    18. Yonghua Ji & Subodha Kumar & Vijay Mookerjee, 2016. "When Being Hot Is Not Cool: Monitoring Hot Lists for Information Security," Information Systems Research, INFORMS, vol. 27(4), pages 897-918, December.
    19. Simon D. Woodcock & Gary Benedetto, 2009. "Distribution-preserving statistical disclosure limitation," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4228-4242, October.
    20. Amanda M. Y. Chu & Benson S. Y. Lam & Agnes Tiwari & Mike K. P. So, 2019. "An Empirical Study of Applying Statistical Disclosure Control Methods to Public Health Research," IJERPH, MDPI, vol. 16(22), pages 1-17, November.
