IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i22p4602-d1277673.html

Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production

Author

Listed:
  • Minh Hung Ho

    (Université de Technologie de Compiègne (UTC), CS 60319, CEDEX, 60203 Compiègne, France)

  • Amélie Ponchet Durupt

    (Université de Technologie de Compiègne (UTC), CS 60319, CEDEX, 60203 Compiègne, France)

  • Hai Canh Vu

    (Université de Technologie de Compiègne (UTC), CS 60319, CEDEX, 60203 Compiègne, France)

  • Nassim Boudaoud

    (Université de Technologie de Compiègne (UTC), CS 60319, CEDEX, 60203 Compiègne, France)

  • Arnaud Caracciolo

    (Centre Technique des Industries Mécaniques (CETIM), 52 Avenue Félix Louat, CEDEX, 60304 Senlis, France)

  • Sophie Sieg-Zieba

    (Centre Technique des Industries Mécaniques (CETIM), 52 Avenue Félix Louat, CEDEX, 60304 Senlis, France)

  • Yun Xu

    (ALFI ADLER, 6 Route de la Borde, 60360 Crèvecœur-Le-Grand, France)

  • Patrick Leduc

    (ALFI ADLER, 6 Route de la Borde, 60360 Crèvecœur-Le-Grand, France)

Abstract

The Industrial Internet of Things (IIoT), which integrates sensors into manufacturing systems, brings new paradigms and technologies to industry. Massive data acquisition in an industrial context raises several challenges: guaranteeing data quality and reliability, and ensuring that the results of data analysis and modelling are accurate, reliable, and reflect the real phenomena being studied. Common problems in real industrial databases are missing data, outliers, anomalies, unbalanced classes, and non-exhaustive historical data. Unlike previous work, which treats these problems separately, this article aims to address them all at once. A comprehensive data-flow framework covering data acquisition, preprocessing, and machine class classification is proposed. The challenges of missing data, outliers, and anomalies are addressed, with critical outliers and novel-class outliers distinguished. The study also tackles unbalanced class classification and evaluates the impact of missing data on classification accuracy. Several machine learning models for operating-state classification are implemented, and the proposed framework is compared with two existing methods: the Histogram Gradient Boosting Classifier and the Extreme Gradient Boosting classifier. Using “hard voting” ensemble learning to combine several classifiers is shown to make the final classifier more robust to missing data. The approach is applied to data from a real industrial dataset. This research helps narrow the theory–practice gap in leveraging IIoT technologies and offers practical insights into implementing data analytics in real industrial scenarios.
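
As a rough illustration of the "hard voting" idea mentioned in the abstract, the sketch below takes a majority vote over the labels predicted by several base classifiers. It is not the authors' implementation: the function name `hard_vote`, the convention that a classifier returns `None` when it cannot predict (for example, because an input it needs is missing), the tie-breaking rule, and the example labels are all assumptions made for this example.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def hard_vote(predictions: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the majority label among base-classifier predictions.

    Assumption for this sketch: a prediction of None means the classifier
    abstained. Ties go to the label that appears first in the input order.
    """
    votes = [p for p in predictions if p is not None]
    if not votes:
        return None  # every classifier abstained
    counts = Counter(votes)
    best = max(counts.values())
    # First label, in input order, that reaches the top vote count.
    return next(label for label in votes if counts[label] == best)


# Hypothetical operating-state labels from three base classifiers, four samples.
per_sample_votes = [
    ["normal", "normal", "fault"],  # clear majority
    ["fault", None, "fault"],       # one classifier abstains
    [None, None, "normal"],         # only one classifier can predict
    [None, None, None],             # no classifier can predict
]
ensemble_labels = [hard_vote(v) for v in per_sample_votes]
```

In this toy setup the ensemble still returns a label whenever at least one base classifier can predict, which is one simple way a voting scheme can tolerate missing inputs; the paper's own evaluation of robustness is reported in the full text.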

Suggested Citation

  • Minh Hung Ho & Amélie Ponchet Durupt & Hai Canh Vu & Nassim Boudaoud & Arnaud Caracciolo & Sophie Sieg-Zieba & Yun Xu & Patrick Leduc, 2023. "Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production," Mathematics, MDPI, vol. 11(22), pages 1-24, November.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:22:p:4602-:d:1277673

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/22/4602/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/22/4602/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jared S. Murray & Jerome P. Reiter, 2016. "Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models With Local Dependence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1466-1479, October.
    2. Jeongsub Choi & Youngdoo Son & Myong K. Jeong, 2024. "Gaussian kernel with correlated variables for incomplete data," Annals of Operations Research, Springer, vol. 341(1), pages 223-244, October.
    3. Khaled Khatab & Maruf A Raheem & Benn Sartorius & Mubarak Ismail, 2019. "Prevalence and risk factors for child labour and violence against children in Egypt using Bayesian geospatial modelling with multiple imputation," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-20, May.
    4. Ahmad R. Alsaber & Jiazhu Pan & Adeeba Al-Hurban, 2021. "Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018)," IJERPH, MDPI, vol. 18(3), pages 1-25, February.
    5. Marco Di Zio & Ugo Guarnera, 2008. "A multiple imputation method for non-Gaussian data," Metron - International Journal of Statistics, Dipartimento di Statistica, Probabilità e Statistiche Applicate - University of Rome, vol. 0(1), pages 75-90.
    6. Hanson, Rochelle F. & Saunders, Benjamin E. & Peer, Samuel O. & Ralston, Elizabeth & Moreland, Angela D. & Schoenwald, Sonja & Chapman, Jason, 2018. "Community-based learning collaboratives and participant reports of interprofessional collaboration, barriers to, and utilization of child trauma services," Children and Youth Services Review, Elsevier, vol. 94(C), pages 306-314.
    7. Hamid Heidarian Miri & Jafar Hassanzadeh & Abdolreza Rajaeefard & Majid Mirmohammadkhani & Kambiz Ahmadi Angali, 2016. "Multiple Imputation to Correct for Nonresponse Bias: Application in Non-communicable Disease Risk Factors Survey," Global Journal of Health Science, Canadian Center of Science and Education, vol. 8(1), pages 133-133, January.
    8. Wang, Wan-Lun, 2013. "Mixtures of common factor analyzers for high-dimensional data with missing information," Journal of Multivariate Analysis, Elsevier, vol. 117(C), pages 120-133.
    9. Pendharkar, Parag C., 2008. "Maximum entropy and least square error minimizing procedures for estimating missing conditional probabilities in Bayesian networks," Computational Statistics & Data Analysis, Elsevier, vol. 52(7), pages 3583-3602, March.
    10. Tzy-Chy Lin & Tsung-I Lin, 2010. "Supervised learning of multivariate skew normal mixture models with missing information," Computational Statistics, Springer, vol. 25(2), pages 183-201, June.
    11. Kamrul Islam Shahin & Christophe Simon & Philippe Weber & Aslak Johansen & Mikkel Baun Kjærgaard, 2023. "Prognostic considering missing data: An input output hidden Markov model based solution," Journal of Risk and Reliability, vol. 237(5), pages 980-993, October.
