IDEAS home Printed from https://ideas.repec.org/a/gam/jsusta/v15y2023i3p1995-d1042327.html

Novel Features and Neighborhood Complexity Measures for Multiclass Classification of Hybrid Data

Author

Listed:
  • Francisco J. Camacho-Urriolagoitia

    (Centro de Investigación en Computación del Instituto Politécnico Nacional, Juan de Dios Bátiz s/n, Gustavo A. Madero, Mexico City 07738, Mexico)

  • Yenny Villuendas-Rey

    (Centro de Innovación y Desarrollo Tecnológico en Cómputo del Instituto Politécnico Nacional, Juan de Dios Bátiz s/n, Gustavo A. Madero, Mexico City 07700, Mexico)

  • Cornelio Yáñez-Márquez

    (Centro de Investigación en Computación del Instituto Politécnico Nacional, Juan de Dios Bátiz s/n, Gustavo A. Madero, Mexico City 07738, Mexico)

  • Miltiadis Lytras

    (School of Business, Deree—The American College of Greece, 6 Gravias Street, GR-153 42 Aghia Paraskevi, 15342 Athens, Greece;
    College of Engineering, Effat University, Jeddah 21478, Saudi Arabia)

Abstract

Present capabilities for collecting and storing all kinds of data exceed our collective ability to analyze, summarize, and extract knowledge from them. Knowledge management aims to organize a systematic learning process automatically. Most meta-learning strategies rely on determining data characteristics, usually by computing data complexity measures, which describe characteristics related to size, shape, density, and other factors. However, most data complexity measures in the literature assume that the classification problem is binary (just two decision classes) and that the data are numeric with no missing values. The main contribution of this paper is the extension of four data complexity measures to overcome these drawbacks, so that they can characterize multiclass, hybrid, and incomplete supervised data. We reformulate the Feature-based measures while maintaining the essence of the originals, and we use a maximum similarity graph-based approach to design the Neighborhood measures. We also use ordered weighted averaging (OWA) operators to avoid biases in the proposed measures. We included the proposed measures in the EPIC software for computational availability and computed them for publicly available multiclass, hybrid, and incomplete datasets. An analysis of their performance confirms that the proposed measures resolve some of the biases of previous ones and natively handle mixed, incomplete, and multiclass data without any preprocessing.

Suggested Citation

  • Francisco J. Camacho-Urriolagoitia & Yenny Villuendas-Rey & Cornelio Yáñez-Márquez & Miltiadis Lytras, 2023. "Novel Features and Neighborhood Complexity Measures for Multiclass Classification of Hybrid Data," Sustainability, MDPI, vol. 15(3), pages 1-18, January.
  • Handle: RePEc:gam:jsusta:v:15:y:2023:i:3:p:1995-:d:1042327

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/15/3/1995/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/15/3/1995/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Paula Ziyeh & Marco Cinelli, 2023. "A Framework to Navigate Eco-Labels in the Textile and Clothing Industry," Sustainability, MDPI, vol. 15(19), pages 1-29, September.
    2. Lorena Espina-Romero & José Gregorio Noroño Sánchez & Humberto Gutiérrez Hurtado & Helga Dworaczek Conde & Yessenia Solier Castro & Luz Emérita Cervera Cajo & Jose Rio Corredoira, 2023. "Which Industrial Sectors Are Affected by Artificial Intelligence? A Bibliometric Analysis of Trends and Perspectives," Sustainability, MDPI, vol. 15(16), pages 1-18, August.
    3. Jose Cruz & Christian Romero & Oscar Vera & Saul Huaquipaco & Norman Beltran & Wilson Mamani, 2023. "Multiparameter Regression of a Photovoltaic System by Applying Hybrid Methods with Variable Selection and Stacking Ensembles under Extreme Conditions of Altitudes Higher than 3800 Meters above Sea Lev," Energies, MDPI, vol. 16(12), pages 1-21, June.
