
kClusterHub: An AutoML-Driven Tool for Effortless Partition-Based Clustering over Varied Data Types

Author

Listed:
  • Konstantinos Gratsos

    (Department of Information and Electronic Engineering, School of Engineering, International Hellenic University, Sindos, 57400 Thessaloniki, Greece)

  • Stefanos Ougiaroglou

    (Department of Information and Electronic Engineering, School of Engineering, International Hellenic University, Sindos, 57400 Thessaloniki, Greece)

  • Dionisis Margaris

    (Department of Digital Systems, School of Economics and Technology, University of the Peloponnese, 23100 Sparta, Greece)

Abstract

Partition-based clustering is widely applied across diverse domains. Researchers and practitioners from various scientific disciplines use partition-based algorithms through specialized software or programming libraries. To bridge the knowledge gap associated with these tools, this paper introduces kClusterHub, an AutoML-driven web tool that simplifies the execution of partition-based clustering over numerical, categorical and mixed data types, and helps identify the optimal number of clusters using the elbow method. Through automatic feature analysis, kClusterHub selects the most appropriate algorithm among k-means, k-modes, and k-prototypes. Once users upload a dataset and select features, kClusterHub selects the algorithm, provides the elbow graph, recommends the optimal number of clusters, executes clustering, and presents the cluster assignments through tabular representations and exploratory plots. kClusterHub thus reduces the need for specialized software and programming skills, making clustering more accessible to non-experts. To further enhance its utility, kClusterHub integrates a REST API that supports the programmatic execution of cluster analysis. The paper concludes with an evaluation of kClusterHub’s usability via the System Usability Scale and CPU performance experiments. The results indicate that kClusterHub is a streamlined, efficient and user-friendly AutoML-inspired tool for cluster analysis.
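
The two automated steps described in the abstract, choosing an algorithm from the feature types and recommending a number of clusters from the elbow graph, can be illustrated with a minimal Python sketch. This is not kClusterHub's code: the function names (is_numeric, select_algorithm, elbow_k) are hypothetical, the type check is a simple assumption about what "automatic feature analysis" might involve, and the elbow is located with one common heuristic (the point farthest from the straight line joining the first and last points of the cost curve), which the abstract does not specify.

```python
from typing import Mapping, Sequence


def is_numeric(value: object) -> bool:
    """Return True for numbers and for strings that parse as numbers."""
    if isinstance(value, bool):  # treat booleans as categorical, not numeric
        return False
    if isinstance(value, (int, float)):
        return True
    try:
        float(str(value))
        return True
    except ValueError:
        return False


def select_algorithm(columns: Mapping[str, Sequence[object]]) -> str:
    """Pick k-means, k-modes, or k-prototypes from the selected features.

    Assumption: a feature counts as numerical only if every value is numeric.
    """
    if not columns:
        raise ValueError("at least one feature is required")
    numeric_flags = [
        all(is_numeric(v) for v in values) for values in columns.values()
    ]
    if all(numeric_flags):
        return "k-means"        # numerical features only
    if not any(numeric_flags):
        return "k-modes"        # categorical features only
    return "k-prototypes"       # mixed numerical and categorical features


def elbow_k(ks: Sequence[int], costs: Sequence[float]) -> int:
    """Return the k farthest from the chord joining the curve's end points.

    Assumption: this chord-distance rule stands in for the tool's own
    (unspecified) way of reading the elbow graph.
    """
    if len(ks) != len(costs) or len(ks) < 3:
        raise ValueError("need at least three (k, cost) pairs of equal length")
    x0, y0 = ks[0], costs[0]
    x1, y1 = ks[-1], costs[-1]

    def distance(x: float, y: float) -> float:
        # Unnormalised point-to-line distance; the constant denominator
        # is dropped because it does not change which point is farthest.
        return abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)

    best = max(range(len(ks)), key=lambda i: distance(ks[i], costs[i]))
    return ks[best]


if __name__ == "__main__":
    features = {"age": [23, 35, 41], "city": ["Sparta", "Sindos", "Sparta"]}
    print(select_algorithm(features))  # k-prototypes
    # Illustrative clustering costs for k = 1..6 (made-up numbers).
    print(elbow_k([1, 2, 3, 4, 5, 6], [100, 40, 15, 12, 10, 9]))  # 3
```

With mixed features the sketch picks k-prototypes, and on the made-up cost curve the steepest change in slope occurs at k = 3, which is the value the heuristic returns.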

Suggested Citation

  • Konstantinos Gratsos & Stefanos Ougiaroglou & Dionisis Margaris, 2023. "kClusterHub: An AutoML-Driven Tool for Effortless Partition-Based Clustering over Varied Data Types," Future Internet, MDPI, vol. 15(10), pages 1-22, October.
  • Handle: RePEc:gam:jftint:v:15:y:2023:i:10:p:341-:d:1262259

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/15/10/341/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/15/10/341/
    Download Restriction: no


