Printed from https://ideas.repec.org/a/spr/jclass/v39y2022i2d10.1007_s00357-022-09411-1.html

Batch Self-Organizing Maps for Distributional Data with an Automatic Weighting of Variables and Components

Authors

  • Francisco de A. T. Carvalho (Universidade Federal de Pernambuco)
  • Antonio Irpino (University of Campania L. Vanvitelli)
  • Rosanna Verde (University of Campania L. Vanvitelli)
  • Antonio Balzanella (University of Campania L. Vanvitelli)

Abstract

This paper deals with a batch self-organizing map algorithm for data described by distributional-valued variables (DBSOM). Such variables take as values probability or frequency distributions on a numeric support. In keeping with the nature of the data, the loss function is based on the L2 Wasserstein distance, which is one of the most widely used metrics for comparing distributions in distributional data analysis. In addition, to account for the different contributions of the variables, four adaptive versions of the DBSOM algorithm are proposed. Relevance weights are automatically learned, one for each distributional-valued variable, in an additional step of the algorithm. Since the L2 Wasserstein metric allows the distance to be decomposed into two components, one related to the means and one related to the size and shape of the distributions, relevance weights are automatically learned for each of the two components, to emphasize the importance of the different characteristics of the distributions, related to their moments, for the distance value. The proposed algorithms are corroborated by applications to real distributional-valued data sets.
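
The two-component decomposition mentioned in the abstract can be illustrated numerically. The sketch below is not the authors' implementation. It assumes one-dimensional empirical samples, approximates each quantile function on a uniform grid of probability levels, and uses a hypothetical function name, `w2_squared_decomposed`. It splits the squared L2 Wasserstein distance into a part due to the difference in means and a part due to the difference between the centered quantile functions (size and shape).

```python
import numpy as np

def w2_squared_decomposed(x, y, n_grid=1000):
    """Approximate the squared L2 Wasserstein distance between two
    one-dimensional samples and split it into two components.

    Returns (total, mean_part, shape_part), where
    total = mean_part + shape_part up to floating-point error.
    """
    # Midpoint grid of probability levels in (0, 1); an assumption of
    # this sketch, not something specified in the abstract.
    t = (np.arange(n_grid) + 0.5) / n_grid

    # Empirical quantile functions evaluated on the grid.
    qx = np.quantile(x, t)
    qy = np.quantile(y, t)

    # Squared L2 distance between the quantile functions.
    total = np.mean((qx - qy) ** 2)

    # Component 1: squared difference between the means.
    mx, my = qx.mean(), qy.mean()
    mean_part = (mx - my) ** 2

    # Component 2: squared distance between the centered quantile
    # functions, which captures differences in size and shape.
    shape_part = np.mean(((qx - mx) - (qy - my)) ** 2)

    return total, mean_part, shape_part


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=500)
y = rng.normal(2.0, 3.0, size=500)
total, mean_part, shape_part = w2_squared_decomposed(x, y)
print(total, mean_part, shape_part)
```

Because the two parts are separated, each can be given its own weight, which is the idea behind learning relevance weights per component; the weight-update rule itself is not given in the abstract and is not sketched here.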

Suggested Citation

  • Francisco de A. T. Carvalho & Antonio Irpino & Rosanna Verde & Antonio Balzanella, 2022. "Batch Self-Organizing Maps for Distributional Data with an Automatic Weighting of Variables and Components," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 343-375, July.
  • Handle: RePEc:spr:jclass:v:39:y:2022:i:2:d:10.1007_s00357-022-09411-1
    DOI: 10.1007/s00357-022-09411-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-022-09411-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations


    Cited by:

    1. Mohammed Sabri & Rosanna Verde & Antonio Balzanella & Fabrizio Maturo & Hamid Tairi & Ali Yahyaouy & Jamal Riffi, 2024. "A Novel Classification Algorithm Based on the Synergy Between Dynamic Clustering with Adaptive Distances and K-Nearest Neighbors," Journal of Classification, Springer;The Classification Society, vol. 41(2), pages 264-288, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    2. Juan Lucio & Raúl Mínguez & Asier Minondo & Francisco Requena, 2016. "Networks and the Dynamics of Firms' Export Portfolio: Evidence for Mexico," The World Economy, Wiley Blackwell, vol. 39(5), pages 708-736, May.
    3. Aurora Torrente & Juan Romo, 2021. "Initializing k-means Clustering by Bootstrap and Data Depth," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 232-256, July.
    4. Stefano Tonellato, 2019. "Bayesian nonparametric clustering as a community detection problem," Working Papers 2019: 20, Department of Economics, University of Venice "Ca' Foscari".
    5. Luis Lorenzo & Javier Arroyo, 2022. "Analysis of the cryptocurrency market using different prototype-based clustering techniques," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-46, December.
    6. O’Hagan, Adrian & Murphy, Thomas Brendan & Gormley, Isobel Claire & McNicholas, Paul D. & Karlis, Dimitris, 2016. "Clustering with the multivariate normal inverse Gaussian distribution," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 18-30.
    7. Michael C. Thrun & Alfred Ultsch, 2021. "Using Projection-Based Clustering to Find Distance- and Density-Based Clusters in High-Dimensional Data," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 280-312, July.
    8. Julian Rossbroich & Jeffrey Durieux & Tom F. Wilderjans, 2022. "Model Selection Strategies for Determining the Optimal Number of Overlapping Clusters in Additive Overlapping Partitional Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 264-301, July.
    9. Nilsen Gro & Borgan Ørnulf & Liestøl Knut & Lingjærde Ole Christian, 2013. "Identifying clusters in genomics data by recursive partitioning," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(5), pages 637-652, October.
    10. Weinand, J.M. & McKenna, R. & Fichtner, W., 2019. "Developing a municipality typology for modelling decentralised energy systems," Utilities Policy, Elsevier, vol. 57(C), pages 75-96.
    11. Grün, Bettina & Leisch, Friedrich, 2009. "Dealing with label switching in mixture models under genuine multimodality," Journal of Multivariate Analysis, Elsevier, vol. 100(5), pages 851-861, May.
    12. Alan Lee & Bobby Willcox, 2014. "Minkowski Generalizations of Ward’s Method in Hierarchical Clustering," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 194-218, July.
    13. Isabella Morlini & Sergio Zani, 2012. "Dissimilarity and similarity measures for comparing dendrograms and their applications," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(2), pages 85-105, July.
    14. Ying Liu & Sudha Ram & Robert F. Lusch & Michael Brusco, 2010. "Multicriterion Market Segmentation: A New Model, Implementation, and Evaluation," Marketing Science, INFORMS, vol. 29(5), pages 880-894, 09-10.
    15. Kemmawadee Preedalikit & Daniel Fernández & Ivy Liu & Louise McMillan & Marta Nai Ruscone & Roy Costilla, 2024. "Row mixture-based clustering with covariates for ordinal responses," Computational Statistics, Springer, vol. 39(5), pages 2511-2555, July.
    16. Ekaterina Kovaleva & Boris Mirkin, 2015. "Bisecting K-Means and 1D Projection Divisive Clustering: A Unified Framework and Experimental Comparison," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 414-442, October.
    17. Douglas Steinley, 2007. "Validating Clusters with the Lower Bound for Sum-of-Squares Error," Psychometrika, Springer;The Psychometric Society, vol. 72(1), pages 93-106, March.
    18. Galimberti, Giuliano & Soffritti, Gabriele, 2007. "Model-based methods to identify multiple cluster structures in a data set," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 520-536, September.
    19. Christian Hennig, 2022. "An empirical comparison and characterisation of nine popular clustering methods," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 201-229, March.
    20. Tsai, Chieh-Yuan & Chiu, Chuang-Cheng, 2008. "Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4658-4672, June.
