
Publishing Anonymized Set-Valued Data via Disassociation towards Analysis

Author

Listed:
  • Nancy Awad

    (Femto-ST Institute, UMR 6174 CNRS, University of Bourgogne-Franche-Comte, 25000 Besançon, France
    TICKET Laboratory, Antonine University, Hadat-Baabda 1003, Lebanon)

  • Jean-Francois Couchot

    (Femto-ST Institute, UMR 6174 CNRS, University of Bourgogne-Franche-Comte, 25000 Besançon, France)

  • Bechara Al Bouna

    (TICKET Laboratory, Antonine University, Hadat-Baabda 1003, Lebanon)

  • Laurent Philippe

    (Femto-ST Institute, UMR 6174 CNRS, University of Bourgogne-Franche-Comte, 25000 Besançon, France)

Abstract

Data publishing is a challenging task because of privacy preservation constraints. To ensure privacy, many anonymization techniques have been proposed. They differ in the mathematical properties they satisfy and in the functional objectives they target. Disassociation is one technique that aims at anonymizing set-valued datasets (e.g., discrete locations, search and shopping items) while guaranteeing the confidentiality property known as k^m-anonymity. Disassociation separates the items of an itemset into vertical chunks to create ambiguity in the original associations. In previous work, we defined a new ant-based clustering algorithm for the disassociation technique that preserves some items associated together, called utility rules, throughout the anonymization process, for accurate analysis. In this paper, we examine the disassociated dataset in terms of knowledge extraction. To make data analysis easy on top of the anonymized dataset, we define neighbor datasets, that is, datasets that result from a probabilistic re-association process. To assess the neighborhood notion, set-valued datasets are formalized as trees, and a tree edit distance (TED) is applied directly between these neighbors. Finally, we show experimentally that the neighbors are faithful with respect to knowledge extraction for future analysis.
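
As a rough illustration of two notions used in the abstract, the sketch below checks the standard k^m-anonymity condition (every combination of at most m items that occurs in the data occurs in at least k records) and builds one candidate neighbor by re-associating vertical chunks at random. It is a minimal sketch, not the authors' algorithm: the function names `is_km_anonymous` and `reassociate`, the toy data, and the uniform shuffling used for re-association are all assumptions made for illustration. The tree edit distance step is not covered.

```python
import random
from collections import Counter
from itertools import combinations


def is_km_anonymous(records, k, m):
    """Return True if every combination of at most m items that occurs
    in some record occurs in at least k records."""
    counts = Counter()
    for record in records:
        items = sorted(set(record))
        for size in range(1, m + 1):
            counts.update(combinations(items, size))
    return all(count >= k for count in counts.values())


def reassociate(chunks, rng):
    """Hypothetical re-association: shuffle the sub-records of each vertical
    chunk independently, then join the chunks row by row into full records.
    Uniform shuffling is an assumption, not the paper's exact process."""
    shuffled = []
    for chunk in chunks:
        column = list(chunk)
        rng.shuffle(column)
        shuffled.append(column)
    return [frozenset().union(*row) for row in zip(*shuffled)]


# Toy cluster of four records (illustrative data only).
original = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]

# The same cluster split into two vertical chunks; empty sets mark
# records that contribute no item to a chunk.
chunk1 = [{"a", "b"}, {"a", "b"}, {"a"}, {"a"}]
chunk2 = [{"c"}, {"c"}, set(), set()]

original_ok = is_km_anonymous(original, k=2, m=2)  # pair (b, c) occurs once
chunk1_ok = is_km_anonymous(chunk1, k=2, m=2)
chunk2_ok = is_km_anonymous(chunk2, k=2, m=2)

neighbor = reassociate([chunk1, chunk2], random.Random(0))
item_counts = Counter(item for record in neighbor for item in record)
```

In this toy example the original cluster is not 2^2-anonymous because the pair (b, c) appears in a single record, while each chunk taken separately satisfies the condition. Any re-associated neighbor keeps the same per-item counts as the original cluster; only the associations across chunks may differ.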

Suggested Citation

  • Nancy Awad & Jean-Francois Couchot & Bechara Al Bouna & Laurent Philippe, 2020. "Publishing Anonymized Set-Valued Data via Disassociation towards Analysis," Future Internet, MDPI, vol. 12(4), pages 1-21, April.
  • Handle: RePEc:gam:jftint:v:12:y:2020:i:4:p:71-:d:346887

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/12/4/71/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/12/4/71/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jesus Crespo Cuaresma & Bettina Grün & Paul Hofmarcher & Stefan Humer & Mathias Moser, 2015. "A Comprehensive Approach to Posterior Jointness Analysis in Bayesian Model Averaging Applications," Department of Economics Working Papers wuwp193, Vienna University of Economics and Business, Department of Economics.
    2. Yoichi Matsumoto, 2013. "Heterogeneous Combinations of Knowledge Elements: How the Knowledge Base Structure Impacts Knowledge-related Outcomes of a Firm," Discussion Paper Series DP2013-15, Research Institute for Economics & Business Administration, Kobe University.
    3. Wang, Zuyi & Takagi, Chifumi & Kim, Man-Keun & Chung, Anh, 2022. "Uncover Drivers Influencing Consumers' WTP Using Machine Learning: Case of Organic Coffee in Taiwan," 2022 Annual Meeting, July 31-August 2, Anaheim, California 322150, Agricultural and Applied Economics Association.
    4. Kurt Hornik & Christian Buchta & Achim Zeileis, 2009. "Open-source machine learning: R meets Weka," Computational Statistics, Springer, vol. 24(2), pages 225-232, May.
    5. Hofmarcher, Paul & Crespo Cuaresma, Jesus & Grün, Bettina & Humer, Stefan & Moser, Mathias, 2018. "Bivariate jointness measures in Bayesian Model Averaging: Solving the conundrum," Journal of Macroeconomics, Elsevier, vol. 57(C), pages 150-165.
    6. Małecka-Ziembińska Edyta & Siwiec Anna, 2020. "Searching for similarities in EU corporate income taxes for their harmonization," Economics and Business Review, Sciendo, vol. 6(4), pages 72-94, December.
    7. Scholz, Michael, 2016. "R Package clickstream: Analyzing Clickstream Data with Markov Chains," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i04).
    8. Khanh Giang Le & Quang Hoc Tran & Van Manh Do, 2023. "Urban Traffic Accident Features Investigation to Improve Urban Transportation Infrastructure Sustainability by Integrating GIS and Data Mining Techniques," Sustainability, MDPI, vol. 16(1), pages 1-19, December.
    9. Jasleen Kaur & Khushdeep Dharni, 2022. "Assessing efficacy of association rules for predicting global stock indices," DECISION: Official Journal of the Indian Institute of Management Calcutta, Springer;Indian Institute of Management Calcutta, vol. 49(3), pages 329-339, September.
    10. Deszczyński, Bartosz & Beręsewicz, Maciej, 2021. "The maturity of relationship management and firm performance – A step toward relationship management middle-range theory," Journal of Business Research, Elsevier, vol. 135(C), pages 358-372.
    11. Michael Hahsler & Radoslaw Karpienko, 2017. "Visualizing association rules in hierarchical groups," Journal of Business Economics, Springer, vol. 87(3), pages 317-335, April.
    12. Ji Yeon Lee & Richa Kumari & Jae Yun Jeong & Tae-Hyun Kim & Byeong-Hee Lee, 2020. "Knowledge Discovering on Graphene Green Technology by Text Mining in National R&D Projects in South Korea," Sustainability, MDPI, vol. 12(23), pages 1-16, November.
    13. Yoonju Lee & Heejin Kim & Hyesun Jeong & Yunhwan Noh, 2020. "Patterns of Multimorbidity in Adults: An Association Rules Analysis Using the Korea Health Panel," IJERPH, MDPI, vol. 17(8), pages 1-14, April.
    14. Sun, Chenhao & Wang, Xin & Zheng, Yihui, 2020. "An ensemble system to predict the spatiotemporal distribution of energy security weaknesses in transmission networks," Applied Energy, Elsevier, vol. 258(C).
    15. Suelane Garcia Fontes & Ronaldo Gonçalves Morato & Silvio Luiz Stanzani & Pedro Luiz Pizzigatti Corrêa, 2021. "Jaguar movement behavior: using trajectories and association rule mining algorithms to unveil behavioral states and social interactions," PLOS ONE, Public Library of Science, vol. 16(2), pages 1-18, February.
    16. Mulenga, Brian P. & Raper, Kellie Curry & Peel, Derrell S., 2020. "A Market Basket Analysis of Beef Calf Management Practice Adoption," Journal of Agricultural and Resource Economics, Western Agricultural Economics Association, vol. 46(2), August.
    17. Da-Yeong Lee & Dae-Seong Lee & Young-Seuk Park, 2022. "Taxonomic and Functional Diversity of Benthic Macroinvertebrate Assemblages in Reservoirs of South Korea," IJERPH, MDPI, vol. 20(1), pages 1-17, December.
