
A Formalization of Multilabel Classification in Terms of Lattice Theory and Information Theory: Concerning Datasets

Authors

  • Francisco J. Valverde-Albacete

    (Department of Signal Theory and Communications, Telematic Systems and Computation, Universidad Rey Juan Carlos, 28942 Fuenlabrada, Madrid, Spain
    These authors contributed equally to this work.)

  • Carmen Peláez-Moreno

    (Department of Signal Theory and Communications, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
    These authors contributed equally to this work.)

Abstract

Multilabel classification is a recently conceptualized task in machine learning. Unlike most research so far, which has focused on classification machinery, we take a data-centric approach and provide an integrative framework that blends qualitative and quantitative descriptions of multilabel data sources. By combining lattice theory, in the form of formal concept analysis, with entropy triangles, obtained from information theory, we explain from first principles the fundamental issues of multilabel datasets, such as the dependencies between labels, their imbalances, and the effects of hapaxes. This allows us to provide guidelines for resampling and new data collection, and to relate them to broad modelling approaches. We have empirically validated our framework using 56 open datasets; the results challenge previous characterizations and show that our formalization brings useful insights into the task of multilabel classification. Further work will consider extending this formalization to understand the relationship between the data sources, the classification methods, and ways to assess their performance.

Suggested Citation

  • Francisco J. Valverde-Albacete & Carmen Peláez-Moreno, 2024. "A Formalization of Multilabel Classification in Terms of Lattice Theory and Information Theory: Concerning Datasets," Mathematics, MDPI, vol. 12(2), pages 1-31, January.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:2:p:346-:d:1323394

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/2/346/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/2/346/
    Download Restriction: no

    References listed on IDEAS

    1. Meila, Marina, 2007. "Comparing clusterings--an information based distance," Journal of Multivariate Analysis, Elsevier, vol. 98(5), pages 873-895, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Assaf Almog & Ferry Besamusca & Mel MacMahon & Diego Garlaschelli, 2015. "Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-16, July.
    2. Huaylla, Claudia A. & Kuperman, Marcelo N. & Garibaldi, Lucas A., 2024. "Comparison of two statistical measures of complexity applied to ecological bipartite networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 642(C).
    3. Juan Lucio & Raúl Mínguez & Asier Minondo & Francisco Requena, 2016. "Networks and the Dynamics of Firms' Export Portfolio: Evidence for Mexico," The World Economy, Wiley Blackwell, vol. 39(5), pages 708-736, May.
    4. Assaf Almog & Ferry Besamusca & Mel MacMahon & Diego Garlaschelli, 2015. "Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations," Papers 1504.00590, arXiv.org.
    5. Federico Botta & Charo I del Genio, 2017. "Analysis of the communities of an urban mobile phone network," PLOS ONE, Public Library of Science, vol. 12(3), pages 1-14, March.
    6. Stefano Tonellato, 2019. "Bayesian nonparametric clustering as a community detection problem," Working Papers 2019: 20, Department of Economics, University of Venice "Ca' Foscari".
    7. Lovro Šubelj & Nees Jan van Eck & Ludo Waltman, 2016. "Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-23, April.
    8. Damien A Fair & Alexander L Cohen & Jonathan D Power & Nico U F Dosenbach & Jessica A Church & Francis M Miezin & Bradley L Schlaggar & Steven E Petersen, 2009. "Functional Brain Networks Develop from a “Local to Distributed” Organization," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-14, May.
    9. O’Hagan, Adrian & Murphy, Thomas Brendan & Gormley, Isobel Claire & McNicholas, Paul D. & Karlis, Dimitris, 2016. "Clustering with the multivariate normal inverse Gaussian distribution," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 18-30.
    10. Daniel Straulino & Mattie Landman & Neave O'Clery, 2020. "A bi-directional approach to comparing the modular structure of networks," Papers 2010.06568, arXiv.org.
    11. Alessandro Chessa & Pierpaolo D’Urso & Livia Giovanni & Vincenzina Vitale & Alfonso Gebbia, 2023. "Complex networks for community detection of basketball players," Annals of Operations Research, Springer, vol. 325(1), pages 363-389, June.
    12. Piccardi, Carlo & Calatroni, Lisa & Bertoni, Fabio, 2010. "Communities in Italian corporate networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(22), pages 5247-5258.
    13. Luciana Crosilla & Marco Malgarini, 2011. "Behavioural models for manufacturing firms: analysing survey data," ECONOMIA E POLITICA INDUSTRIALE, FrancoAngeli Editore, vol. 2011(4), pages 139-163.
    14. Claudio Conversano & Massimo Cannas & Francesco Mola & Emiliano Sironi, 2019. "Random effects clustering in multilevel modeling: choosing a proper partition," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 279-301, March.
    15. Lou, Hao & Li, Shenghong & Zhao, Yuxin, 2013. "Detecting community structure using label propagation with weighted coherent neighborhood propinquity," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(14), pages 3095-3105.
    16. Alan Lee & Bobby Willcox, 2014. "Minkowski Generalizations of Ward’s Method in Hierarchical Clustering," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 194-218, July.
    17. Francisco de A. T. Carvalho & Antonio Irpino & Rosanna Verde & Antonio Balzanella, 2022. "Batch Self-Organizing Maps for Distributional Data with an Automatic Weighting of Variables and Components," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 343-375, July.
    18. Elena Farahbakhsh Touli & Hoang Nguyen & Olha Bodnar, 2022. "Monitoring the Dynamic Networks of Stock Returns," Papers 2210.16679, arXiv.org.
    19. David Samu & Anil K Seth & Thomas Nowotny, 2014. "Influence of Wiring Cost on the Large-Scale Architecture of Human Cortical Connectivity," PLOS Computational Biology, Public Library of Science, vol. 10(4), pages 1-24, April.
    20. Neave O'Clery & Samuel Heroy & Francois Hulot & Mariano Beguerisse-D'iaz, 2019. "Unravelling the forces underlying urban industrial agglomeration," Papers 1903.09279, arXiv.org, revised Jun 2019.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.