
A New Separation Index and Classification Techniques Based on Shannon Entropy

Authors
  • Jorge Navarro

    (Universidad de Murcia)

  • Francesco Buono

    (Università di Napoli Federico II
    RWTH Aachen University)

  • Jorge M. Arevalillo

    (UC3M-Santander Big Data Institute
    Universidad Nacional de Educación a Distancia (UNED))

Abstract

The purpose is to use Shannon entropy measures to develop classification techniques and an index that estimates the separation of the groups in a finite mixture model. These measures can be applied to machine learning techniques such as discriminant analysis, cluster analysis and exploratory data analysis. If we know the number of groups and we have training samples from each group (supervised learning), the index is used to measure the separation of the groups. In this setting, some entropy measures are also used to classify new individuals into one of these groups. If we are not sure about the number of groups (unsupervised learning), the index can be used to determine the optimal number of groups from an entropy (information/uncertainty) criterion. It can also be used to determine the best variables for separating the groups. In all cases we assume absolutely continuous random variables and use the Shannon entropy based on the probability density function. Theoretical, parametric and non-parametric techniques are proposed to approximate these entropy measures in practice. An application to gene selection in a colon cancer discrimination study with many variables is also provided.
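The abstract does not give the form of the separation index, so the following is only a minimal sketch of one standard entropy-based way to quantify group separation in a finite mixture, not the authors' index. It assumes univariate Gaussian components with known weights (summing to 1), means and standard deviations (a parametric case), and uses the decomposition H(X) = Σ p_i H(f_i) + I(X; Z), where Z is the group label. The ratio I(X; Z) / H(Z) is 0 when all components coincide and approaches 1 when they do not overlap. The function names (`normal_pdf`, `entropy_1d`, `component`, `separation_index`) and the integration range `[lo, hi]` are hypothetical choices for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def entropy_1d(pdf, lo, hi, n=20000):
    """Approximate the Shannon (differential) entropy -integral f log f
    with a midpoint rule on [lo, hi]; assumes f is negligible outside."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        fx = pdf(lo + (k + 0.5) * step)
        if fx > 0.0:  # convention: 0 * log 0 = 0
            total -= fx * math.log(fx)
    return total * step


def component(mu, sigma):
    """Return the density of one Gaussian group as a one-argument function."""
    return lambda x: normal_pdf(x, mu, sigma)


def separation_index(weights, mus, sigmas, lo, hi):
    """Hypothetical separation measure I(X; Z) / H(Z) for a Gaussian mixture,
    where Z is the discrete group label and X the observed variable.
    The weights are assumed to be positive and to sum to 1."""
    comps = [component(m, s) for m, s in zip(mus, sigmas)]

    def mixture(x):
        # Mixture density f(x) = sum_i p_i f_i(x)
        return sum(p * f(x) for p, f in zip(weights, comps))

    h_mix = entropy_1d(mixture, lo, hi)                          # H(X)
    h_within = sum(p * entropy_1d(f, lo, hi)                     # H(X | Z)
                   for p, f in zip(weights, comps))
    h_labels = -sum(p * math.log(p) for p in weights if p > 0)  # H(Z)
    return (h_mix - h_within) / h_labels                         # I(X; Z) / H(Z)
```

For example, `separation_index([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0], -12.0, 12.0)` returns a value strictly between 0 and 1 that grows as the two means move apart. A non-parametric variant could replace `normal_pdf` with a kernel density estimate fitted to each training sample; that substitution is likewise an assumption, not a description of the paper's method.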

Suggested Citation

  • Jorge Navarro & Francesco Buono & Jorge M. Arevalillo, 2023. "A New Separation Index and Classification Techniques Based on Shannon Entropy," Methodology and Computing in Applied Probability, Springer, vol. 25(4), pages 1-24, December.
  • Handle: RePEc:spr:metcap:v:25:y:2023:i:4:d:10.1007_s11009-023-10055-w
    DOI: 10.1007/s11009-023-10055-w

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11009-023-10055-w
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11009-023-10055-w?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Gilles Celeux & Gilda Soromenho, 1996. "An entropy criterion for assessing the number of clusters in a mixture model," Journal of Classification, Springer;The Classification Society, vol. 13(2), pages 195-212, September.
    2. Narayanaswamy Balakrishnan & Francesco Buono & Maria Longobardi, 2022. "On Cumulative Entropies in Terms of Moments of Order Statistics," Methodology and Computing in Applied Probability, Springer, vol. 24(1), pages 345-359, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Julian Aichholzer & Sylvia Kritzinger & Carolina Plescia, 2021. "National identity profiles and support for the European Union," European Union Politics, vol. 22(2), pages 293-315, June.
    2. Adrian Bruhin & Ernst Fehr & Daniel Schunk, 2019. "The many Faces of Human Sociality: Uncovering the Distribution and Stability of Social Preferences," Journal of the European Economic Association, European Economic Association, vol. 17(4), pages 1025-1069.
    3. Nicoleta Serban & Huijing Jiang, 2012. "Multilevel Functional Clustering Analysis," Biometrics, The International Biometric Society, vol. 68(3), pages 805-814, September.
    4. Jacky C. K. Ng & Joanne Y. H. Chong & Hilary K. Y. Ng, 2023. "The way I see the world, the way I envy others: a person-centered investigation of worldviews and the malicious and benign forms of envy among adolescents and adults," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-11, December.
    5. Gillian C. Williams & Karen A. Patte & Mark A. Ferro & Scott T. Leatherdale, 2021. "Associations between Longitudinal Patterns of Substance Use and Anxiety and Depression Symptoms among a Sample of Canadian Secondary School Students," IJERPH, MDPI, vol. 18(19), pages 1-14, October.
    6. Mélissa Lemoine & Gerhard Gmel & Simon Foster & Simon Marmet & Joseph Studer, 2020. "Multiple trajectories of alcohol use and the development of alcohol use disorder: Do Swiss men mature-out of problematic alcohol use during emerging adulthood?," PLOS ONE, Public Library of Science, vol. 15(1), pages 1-17, January.
    7. Sarstedt, Marko & Salcher, André, 2007. "Modellselektion in Finite Mixture PLS-Modellen," Discussion Papers in Business Administration 1394, University of Munich, Munich School of Management.
    8. Lebret, Rémi & Iovleff, Serge & Langrognet, Florent & Biernacki, Christophe & Celeux, Gilles & Govaert, Gérard, 2015. "Rmixmod: The R Package of the Model-Based Unsupervised, Supervised, and Semi-Supervised Classification Mixmod Library," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 67(i06).
    9. Andrew Clark & Fabien Postel-Vinay, 2009. "Job security and job protection," Oxford Economic Papers, Oxford University Press, vol. 61(2), pages 207-239, April.
    10. Ellen Bouchery & Monica Farid, "undated". "Variation in Staff Salary Costs Associated with Characteristics of Substance Use Disorder Treatment Facilities," Mathematica Policy Research Reports 65b1484724354c0ca8270d1c6, Mathematica Policy Research.
    11. Wijesundera, Isuri & Halgamuge, Malka N. & Nirmalathas, Ampalavanapillai & Nanayakkara, Thrishantha, 2016. "MFPT calculation for random walks in inhomogeneous networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 462(C), pages 986-1002.
    12. Wang, Kun & Marbut, Alexander R. & Suntai, Zainab & Zheng, Dianhan & Chen, Xiayu, 2022. "Patterns in older adults' perceived chronic stressor types and cognitive functioning trajectories: Are perceived chronic stressors always bad?," Social Science & Medicine, Elsevier, vol. 311(C).
    13. Tolu O Oyesanya & Roger L Brown & Lyn S Turkstra, 2017. "Caring for Patients with traumatic brain injury: a survey of nurses' perceptions," Journal of Clinical Nursing, John Wiley & Sons, vol. 26(11-12), pages 1562-1574, June.
    14. Omar N. Solinger & Woody van Olffen & Robert A. Roe & Joeri Hofmans, 2013. "On Becoming (Un)Committed: A Taxonomy and Test of Newcomer Onboarding Scenarios," Organization Science, INFORMS, vol. 24(6), pages 1640-1661, December.
    15. Anne Mäkikangas & Wilmar Schaufeli & Esko Leskinen & Ulla Kinnunen & Katriina Hyvönen & Taru Feldt, 2016. "Long-Term Development of Employee Well-Being: A Latent Transition Approach," Journal of Happiness Studies, Springer, vol. 17(6), pages 2325-2345, December.
    16. Mengya Xia & Caitlin M. Hudac, 2023. "Social Connection Constellations and Individual Well-Being Typologies: Using the Loglinear Modeling Approach with Latent Variables," Journal of Happiness Studies, Springer, vol. 24(6), pages 1991-2012, August.
    17. Laura Dal Corso & Alessandro De Carlo & Francesca Carluccio & Daiana Colledani & Alessandra Falco, 2020. "Employee burnout and positive dimensions of well-being: A latent workplace spirituality profile analysis," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-17, November.
    18. Yedith B. Guillén-Fernández, 2024. "Socioeconomic Factors Determining Multidimensional Child Poverty Groups in Central America: A Measurement Proposal from the Wellbeing Approach Using a Comprehensive Set of Children’s Rights," Child Indicators Research, Springer;The International Society of Child Indicators (ISCI), vol. 17(5), pages 2175-2217, October.
    19. Meng Li & Sijia Xiang & Weixin Yao, 2016. "Robust estimation of the number of components for mixtures of linear regression models," Computational Statistics, Springer, vol. 31(4), pages 1539-1555, December.
    20. Thomas Bassetti & Raul Caruso & Darwin Cortes, 2015. "Behavioral differences in violence: The case of intra-group differences of Paramilitaries and Guerrillas in Colombia," DISCE - Quaderni del Dipartimento di Politica Economica ispe0073, Università Cattolica del Sacro Cuore, Dipartimenti e Istituti di Scienze Economiche (DISCE).
