
Clustering Categorical Data: A Survey

Authors

  • Sami Naouali

    (Virtual Reality and Information Technologies, Military Academy of Fondouk Jedid, Nabeul, Tunisia)

  • Semeh Ben Salem

    (Polytechnic School of Tunisia, La Marsa, Tunis B.P. 743, Rue El Khawarizmi 2078, Tunisia)

  • Zied Chtourou

    (Digital Research Center of Sfax, B.P. 275, Sakiet Ezzit, Sfax 3021, Tunisia)

Abstract

Clustering is a complex unsupervised method that groups the most similar observations of a given dataset within the same cluster. To be efficient, a clustering process should achieve high accuracy at low complexity. Many clustering methods have been developed in various fields, depending on the type of application and the type of data considered. Categorical clustering segments datasets in which the data are categorical and has been widely used in many real-world applications; several methods have been developed for it, including hard, fuzzy and rough set-based methods. In this survey, more than 30 categorical clustering algorithms are investigated. These methods are classified into hierarchical and partitional clustering methods and compared in terms of their accuracy, precision and recall to identify the most prominent ones. Experimental results show that rough set-based clustering methods provided better efficiency than hard and fuzzy methods. Methods based on the initialization of the centroids also provided good results.
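
As an illustration of what a hard partitional method for categorical data can look like, the following is a minimal sketch of k-modes, a well-known algorithm of this kind. The abstract does not name it, so its choice here is an assumption, as are the function names, the toy records and the use of simple matching (counting mismatched attributes) as the dissimilarity. The initial modes are passed in explicitly, because the result depends on how the centroids are initialized.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent category of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(data, initial_modes, max_iter=100):
    """Hard k-modes: assign each record to its nearest mode, then update modes."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy dataset: (colour, size, shape)
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
labels, modes = k_modes(data, initial_modes=[data[0], data[3]])
print(labels)
print(modes)
```

Fuzzy and rough set-based variants replace the hard assignment step with membership degrees or lower and upper approximations; this sketch covers only the hard case.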

Suggested Citation

  • Sami Naouali & Semeh Ben Salem & Zied Chtourou, 2020. "Clustering Categorical Data: A Survey," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 19(01), pages 49-96, February.
  • Handle: RePEc:wsi:ijitdm:v:19:y:2020:i:01:n:s0219622019300064
    DOI: 10.1142/S0219622019300064

    Download full text from publisher

    File URL: https://www.worldscientific.com/doi/abs/10.1142/S0219622019300064
    Download Restriction: Access to full text is restricted to subscribers

    File URL: https://libkey.io/10.1142/S0219622019300064?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Cao, Fuyuan & Huang, Joshua Zhexue & Liang, Jiye, 2017. "A fuzzy SV-k-modes algorithm for clustering categorical data with set-valued attributes," Applied Mathematics and Computation, Elsevier, vol. 295(C), pages 1-15.
    2. Anna, Petrenko, 2016. "Маркування готової продукції як складова частина інформаційного забезпечення маркетингової діяльності підприємств овочепродуктового підкомплексу" [Labelling of finished products as a component of the information support for the marketing activities of enterprises in the vegetable products subcomplex], Agricultural and Resource Economics: International Scientific E-Journal, Agricultural and Resource Economics: International Scientific E-Journal, vol. 2(1), March.

    Citations

    Cited by:

    1. Shi, Lingyuan & Yang, Xin & Chang, Ximing & Wu, Jianjun & Sun, Huijun, 2023. "An improved density peaks clustering algorithm based on k nearest neighbors and turning point for evaluating the severity of railway accidents," Reliability Engineering and System Safety, Elsevier, vol. 233(C).
    2. Golzari Oskouei, Amin & Balafar, Mohammad Ali & Motamed, Cina, 2021. "FKMAWCW: Categorical fuzzy k-modes clustering with automated attribute-weight and cluster-weight learning," Chaos, Solitons & Fractals, Elsevier, vol. 153(P1).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Vivian Welch & Christine M. Mathew & Panteha Babelmorad & Yanfei Li & Elizabeth T. Ghogomu & Johan Borg & Monserrat Conde & Elizabeth Kristjansson & Anne Lyddiatt & Sue Marcus & Jason W. Nickerson & K, 2021. "Health, social care and technological interventions to improve functional ability of older adults living at home: An evidence and gap map," Campbell Systematic Reviews, John Wiley & Sons, vol. 17(3), September.
    2. Persson, Petra & Qiu, Xinyao & Rossin-Slater, Maya, 2021. "Family Spillover Effects of Marginal Diagnoses: The Case of ADHD," IZA Discussion Papers 14020, Institute of Labor Economics (IZA).
    3. Menkhoff, Lukas & Miethe, Jakob, 2019. "Tax evasion in new disguise? Examining tax havens' international bank deposits," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 176, pages 53-78.
    4. Ran Abramitzky & Roy Mill & Santiago Pérez, 2020. "Linking individuals across historical sources: A fully automated approach," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 53(2), pages 94-111, April.
    5. Werner Eichhorst & Ulf Rinne, 2017. "Digital Challenges for the Welfare State," CESifo Forum, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, vol. 18(04), pages 03-08, December.
    6. Sant'Anna, Ana Claudia & Bergtold, Jason & Shanoyan, Aleksan & Caldas, Marcellus & Granco, Gabriel, 2021. "Deal or No Deal? Analysis of Bioenergy Feedstock Contract Choice with Multiple Opt-out Options and Contract Attribute Substitutability," 2021 Conference, August 17-31, 2021, Virtual 315289, International Association of Agricultural Economists.
    7. Tommaso Colussi & Ingo E. Isphording & Nico Pestel, 2021. "Minority Salience and Political Extremism," American Economic Journal: Applied Economics, American Economic Association, vol. 13(3), pages 237-271, July.
    8. Erkmen Giray Aslim, 2019. "The Relationship Between Health Insurance and Early Retirement: Evidence from the Affordable Care Act," Eastern Economic Journal, Palgrave Macmillan;Eastern Economic Association, vol. 45(1), pages 112-140, January.
    9. Balint, T. & Lamperti, F. & Mandel, A. & Napoletano, M. & Roventini, A. & Sapio, A., 2017. "Complexity and the Economics of Climate Change: A Survey and a Look Forward," Ecological Economics, Elsevier, vol. 138(C), pages 252-265.
    10. Edna P. Conwi & Alexander G. Cortez & Normita Ramos, 2016. "Effects of the Dualized Training Program on the Occupational Interest of the Students Enrolled in Bachelor of Science in Hotel and Restaurant Management," Indian Journal of Commerce and Management Studies, Educational Research Multimedia & Publications,India, vol. 7(1), pages 31-36, January.
    11. Nihan Akyelken, 2017. "Mobility-Related Economic Exclusion: Accessibility and Commuting Patterns in Industrial Zones in Turkey," Social Inclusion, Cogitatio Press, vol. 5(4), pages 175-182.
    12. Youngna Choi, 2022. "Economic Stimulus and Financial Instability: Recent Case of the U.S. Household," JRFM, MDPI, vol. 15(6), pages 1-25, June.
    13. Camillia Kong & John Coggon & Michael Dunn & Penny Cooper, 2019. "Judging Values and Participation in Mental Capacity Law," Laws, MDPI, vol. 8(1), pages 1-22, February.
    14. Dreher, Axel & Fuchs, Andreas & Langlotz, Sarah, 2019. "The effects of foreign aid on refugee flows," European Economic Review, Elsevier, vol. 112(C), pages 127-147.
    15. Dindo, Pietro & Massari, Filippo, 2020. "The wisdom of the crowd in dynamic economies," Theoretical Economics, Econometric Society, vol. 15(4), November.
    16. Ferrarini, Benno & Maupin, Julie & Hinojales , Marthe, 2017. "Distributed Ledger Technologies for Developing Asia," ADB Economics Working Paper Series 533, Asian Development Bank.
    17. Andrzej Cieślik & Sarhad Hamza, 2022. "Inward FDI, IFRS Adoption and Institutional Quality: Insights from the MENA Countries," IJFS, MDPI, vol. 10(3), pages 1-19, June.
    18. Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Achim Truger & Andrew Wa, 2016. "The Elusive Recovery," SciencePo Working papers Main hal-03459084, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Achim Truger & Andrew Wa, 2016. "The Elusive Recovery," PSE-Ecole d'économie de Paris (Postprint) hal-03459084, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Thomas Theobald & Achim , 2016. "The Elusive Recovery," PSE Working Papers hal-03612850, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Achim Truger & Andrew Wa, 2016. "The Elusive Recovery," Post-Print hal-03459084, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Thomas Theobald & Achim , 2016. "The Elusive Recovery," Working Papers hal-03612850, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Thomas Theobald & Achim , 2016. "The Elusive Recovery," SciencePo Working papers Main hal-03612850, HAL.
      • Georg Feigl & Markus Marterbauer & Miriam Rehm & Matthias Schnetzer & Sepp Zuckerstätter & Lars Nørvang Andersen & Thea Nissen & Signe Dahl & Peter Hohlfeld & Benjamin Lojak & Thomas Theobald & Achim , 2016. "The Elusive Recovery," PSE-Ecole d'économie de Paris (Postprint) hal-03612850, HAL.
    19. Billari, Francesco C. & Giuntella, Osea & Stella, Luca, 2018. "Broadband internet, digital temptations, and sleep," Journal of Economic Behavior & Organization, Elsevier, vol. 153(C), pages 58-76.
    20. Anastasios Evgenidis & Apostolos Fasianos, 2019. "Monetary Policy and Wealth Inequalities in Great Britain: Assessing the role of unconventional policies for a decade of household data," Papers 1912.09702, arXiv.org.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:wsi:ijitdm:v:19:y:2020:i:01:n:s0219622019300064. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Tai Tone Lim (email available below). General contact details of provider: http://www.worldscinet.com/ijitdm/ijitdm.shtml .
