
Spectral Clustering of Mixed-Type Data

Author

Listed:
  • Felix Mbuga

    (Department of Mathematics and Statistics, San José State University, San Jose, CA 95116, USA
    These authors contributed equally to this work.)

  • Cristina Tortora

    (Department of Mathematics and Statistics, San José State University, San Jose, CA 95116, USA
    These authors contributed equally to this work.)

Abstract

Cluster analysis seeks to assign objects with similar characteristics to groups called clusters, so that objects within a group are similar to each other and dissimilar to objects in other groups. Spectral clustering has been shown to perform well in different scenarios on continuous data: it can detect convex and non-convex clusters, and it can detect overlapping clusters. However, the restriction to continuous data can be limiting in real applications, where data are often of mixed type, i.e., they contain both continuous and categorical features. This paper looks at extending spectral clustering to mixed-type data. The new method replaces the Euclidean-based distance used in conventional spectral clustering with different dissimilarity measures for continuous and categorical variables. A global dissimilarity measure is then computed using a weighted sum, and a Gaussian kernel is used to convert the dissimilarity matrix into a similarity matrix. The new method includes automatic tuning of the variable weight and the kernel parameter. The performance of spectral clustering in different scenarios is compared with that of two state-of-the-art mixed-type data clustering methods, k-prototypes and KAMILA, using several simulated and real data sets.
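
The similarity construction described in the abstract (a weighted sum of per-type dissimilarities passed through a Gaussian kernel) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the individual dissimilarity measures or the tuning procedure, so the sketch assumes Euclidean distance for the continuous variables and simple matching (the proportion of mismatched categories) for the categorical ones, and it takes the weight and kernel width as fixed inputs. The function names `mixed_dissimilarity` and `similarity_matrix` are hypothetical.

```python
import math


def mixed_dissimilarity(x_cont, x_cat, y_cont, y_cat, weight):
    """Weighted sum of a continuous and a categorical dissimilarity.

    Assumptions for illustration only (not taken from the paper):
    Euclidean distance on the continuous part, simple matching
    (share of mismatched categories) on the categorical part.
    """
    d_cont = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_cont, y_cont)))
    # Guard against objects with no categorical variables.
    d_cat = sum(a != b for a, b in zip(x_cat, y_cat)) / len(x_cat) if x_cat else 0.0
    return d_cont + weight * d_cat


def similarity_matrix(cont, cat, weight, sigma):
    """Convert pairwise mixed dissimilarities into Gaussian-kernel similarities.

    cont:   list of continuous feature vectors, one per object
    cat:    list of categorical feature vectors, one per object
    weight: weight given to the categorical dissimilarity (fixed here,
            whereas the paper tunes it automatically)
    sigma:  Gaussian kernel width (also fixed here)
    """
    n = len(cont)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = mixed_dissimilarity(cont[i], cat[i], cont[j], cat[j], weight)
            sim[i][j] = math.exp(-(d ** 2) / (2.0 * sigma ** 2))
    return sim
```

The resulting matrix would then play the role of the affinity matrix in a standard spectral clustering pipeline (graph Laplacian, leading eigenvectors, k-means on the embedded points). Those steps are omitted here.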

Suggested Citation

  • Felix Mbuga & Cristina Tortora, 2021. "Spectral Clustering of Mixed-Type Data," Stats, MDPI, vol. 5(1), pages 1-11, December.
  • Handle: RePEc:gam:jstats:v:5:y:2021:i:1:p:1-11:d:709232

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/5/1/1/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/5/1/1/
    Download Restriction: no

    References listed on IDEAS

    1. Hornik, Kurt & Grün, Bettina, 2014. "movMF: An R Package for Fitting Mixtures of von Mises-Fisher Distributions," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 58(i10).
    2. Alexander H. Foss & Marianthi Markatou & Bonnie Ray, 2019. "Distance Metrics and Clustering Methods for Mixed‐type Data," International Statistical Review, International Statistical Institute, vol. 87(1), pages 80-109, April.
    3. J. A. Hartigan & M. A. Wong, 1979. "A K‐Means Clustering Algorithm," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 28(1), pages 100-108, March.
    4. Damien McParland & Isobel Claire Gormley, 2016. "Model based clustering for mixed data: clustMD," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(2), pages 155-169, June.

    Citations

    Cited by:

    1. Jamotton, Charlotte & Hainaut, Donatien & Hames, Thomas, 2023. "Insurance analytics with clustering techniques," LIDAM Discussion Papers ISBA 2023002, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xu, Jing & Wang, Xiaoying & Gu, Yujiong & Ma, Suxia, 2023. "A data-based day-ahead scheduling optimization approach for regional integrated energy systems with varying operating conditions," Energy, Elsevier, vol. 283(C).
    2. Carlos Carrasco-Farré, 2022. "The fingerprints of misinformation: how deceptive content differs from reliable sources in terms of cognitive effort and appeal to emotions," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-18, December.
    3. Zhang, Weibin & Zha, Huazhu & Zhang, Shuai & Ma, Lei, 2023. "Road section traffic flow prediction method based on the traffic factor state network," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 618(C).
    4. Michal Bernardelli & Zbigniew Korzeb & Pawel Niedziolka, 2021. "The banking sector as the absorber of the COVID-19 crisis’ economic consequences: perception of WSE investors," Oeconomia Copernicana, Institute of Economic Research, vol. 12(2), pages 335-374, June.
    5. Jelle R Dalenberg & Luca Nanetti & Remco J Renken & René A de Wijk & Gert J ter Horst, 2014. "Dealing with Consumer Differences in Liking during Repeated Exposure to Food; Typical Dynamics in Rating Behavior," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-11, March.
    6. Custodio João, Igor & Lucas, André & Schaumburg, Julia & Schwaab, Bernd, 2023. "Dynamic clustering of multivariate panel data," Journal of Econometrics, Elsevier, vol. 237(2).
    7. Carlos Fernández-Hernández & Carmelo J. León & Jorge E. Araña & Flora Díaz-Pére, 2016. "Market segmentation, activities and environmental behaviour in rural tourism," Tourism Economics, , vol. 22(5), pages 1033-1054, October.
    8. Hafid Kadi & Mohammed Rebbah & Boudjelal Meftah & Olivier Lézoray, 2021. "A Data Representation Model for Personalized Medicine," International Journal of Healthcare Information Systems and Informatics (IJHISI), IGI Global, vol. 16(4), pages 1-25, October.
    9. Zhang, Tonglin & Lin, Ge, 2021. "Generalized k-means in GLMs with applications to the outbreak of COVID-19 in the United States," Computational Statistics & Data Analysis, Elsevier, vol. 159(C).
    10. Selosse, Margot & Jacques, Julien & Biernacki, Christophe, 2020. "Model-based co-clustering for mixed type data," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    11. Arthur Pewsey & Eduardo García-Portugués, 2021. "Recent advances in directional statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(1), pages 1-58, March.
    12. Andreas Lackner & Michael Müller & Magdalena Gamperl & Delyana Stoeva & Olivia Langmann & Henrieta Papuchova & Elisabeth Roitinger & Gerhard Dürnberger & Richard Imre & Karl Mechtler & Paulina A. Lato, 2023. "The Fgf/Erf/NCoR1/2 repressive axis controls trophoblast cell fate," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    13. Utkarsh J. Dang & Michael P.B. Gallaugher & Ryan P. Browne & Paul D. McNicholas, 2023. "Model-Based Clustering and Classification Using Mixtures of Multivariate Skewed Power Exponential Distributions," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 145-167, April.
    14. Beibei Yu & Zhonghui Wang & Haowei Mu & Li Sun & Fengning Hu, 2019. "Identification of Urban Functional Regions Based on Floating Car Track Data and POI Data," Sustainability, MDPI, vol. 11(23), pages 1-18, November.
    15. You, Kisung & Suh, Changhee, 2022. "Parameter estimation and model-based clustering with spherical normal distribution on the unit hypersphere," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).
    16. Liguo Fei & Jun Xia & Yuqiang Feng & Luning Liu, 2019. "A novel method to determine basic probability assignment in Dempster–Shafer theory and its application in multi-sensor information fusion," International Journal of Distributed Sensor Networks, , vol. 15(7), pages 15501477198, July.
    17. Giuseppe Pandolfo & Antonio D’ambrosio, 2023. "Clustering directional data through depth functions," Computational Statistics, Springer, vol. 38(3), pages 1487-1506, September.
    18. Mantas Svazas & Valentinas Navickas & Yuriy Bilan & Joanna Nakonieczny & Jana Spankova, 2021. "Biomass Clusterization from a Regional Perspective: The Case of Lithuania," Energies, MDPI, vol. 14(21), pages 1-15, October.
    19. Bernd Scherer & Diogo Judice & Stephan Kessler, 2010. "Price reversals in global equity markets," Journal of Asset Management, Palgrave Macmillan, vol. 11(5), pages 332-345, December.
    20. Christophe Biernacki & Alexandre Lourme, 2019. "Unifying data units and models in (co-)clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 7-31, March.
