Printed from https://ideas.repec.org/a/wut/journl/v34y2024i4p157-183id10.html

Classification with machine learning algorithms after hybrid feature selection in imbalanced data sets

Authors

  • Meryem Pulat
  • İpek Deveci Kocakoç

Abstract

The efficacy of machine learning algorithms depends significantly on the adequacy and relevance of the features in the data set, so feature selection precedes classification. This study employs a hybrid feature selection approach that integrates filter and wrapper methods. The approach improves classification accuracy beyond what filter methods achieve alone and reduces processing time compared with relying exclusively on wrapper methods. Results indicate a general improvement in algorithm performance when hybrid feature selection is applied. The study uses the Taiwanese Bankruptcy and Statlog (German Credit Data) data sets from the UCI Machine Learning Repository. Both data sets have imbalanced class distributions, so the data were preprocessed to account for the imbalance before feature selection and classification were carried out.
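
The filter-then-wrapper idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific filter score, wrapper search, classifier, or imbalance handling the authors used, so everything below is an assumption chosen for illustration: synthetic data in which only features 0 and 1 carry signal, a standardized class-mean gap as the filter score, greedy forward selection as the wrapper, a nearest-centroid classifier, and balanced accuracy as an imbalance-aware metric. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def make_data(n=400, n_features=8, minority_rate=0.15, seed=0):
    """Synthetic imbalanced data (assumption): only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < minority_rate else 0
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 2.0 * label
        row[1] -= 1.5 * label
        X.append(row)
        y.append(label)
    return X, y


def filter_stage(X, y, k):
    """Filter: rank features by standardized class-mean gap and keep the top k."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        pos = [v for v, t in zip(col, y) if t == 1]
        neg = [v for v, t in zip(col, y) if t == 0]
        scores.append((abs(mean(pos) - mean(neg)) / (pstdev(col) + 1e-12), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so the minority class weighs as much as the majority."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return mean(recalls)


def evaluate(features, X_tr, y_tr, X_va, y_va):
    """Fit a nearest-centroid classifier on `features` and score it on validation data."""
    centroids = {}
    for c in set(y_tr):
        rows = [r for r, t in zip(X_tr, y_tr) if t == c]
        centroids[c] = [mean(r[j] for r in rows) for j in features]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((row[j] - m) ** 2 for j, m in zip(features, centroids[c])),
        )

    return balanced_accuracy(y_va, [predict(r) for r in X_va])


def wrapper_stage(candidates, X_tr, y_tr, X_va, y_va):
    """Wrapper: greedy forward selection that stops when no candidate improves the score."""
    selected, best = [], 0.0
    while True:
        trials = [
            (evaluate(selected + [j], X_tr, y_tr, X_va, y_va), j)
            for j in candidates
            if j not in selected
        ]
        if not trials or max(trials)[0] <= best:
            return selected, best
        best, j = max(trials)
        selected.append(j)


X, y = make_data()
split = int(0.7 * len(X))
X_tr, y_tr, X_va, y_va = X[:split], y[:split], X[split:], y[split:]

kept = filter_stage(X_tr, y_tr, k=4)  # cheap pass over all features
selected, score = wrapper_stage(kept, X_tr, y_tr, X_va, y_va)  # costly pass over survivors only

print("filter kept:", kept)
print("wrapper selected:", selected, "balanced accuracy:", round(score, 3))
```

The design point this sketch tries to show is the division of labour: the filter scores each feature once without training a model, and the wrapper, which retrains the classifier for every candidate subset, only searches among the features the filter kept. That is one plausible reading of why a hybrid can be faster than a wrapper alone while still being model-aware.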

Suggested Citation

  • Meryem Pulat & İpek Deveci Kocakoç, 2024. "Classification with machine learning algorithms after hybrid feature selection in imbalanced data sets," Operations Research and Decisions, Wroclaw University of Science and Technology, Faculty of Management, vol. 34(4), pages 157-183.
  • Handle: RePEc:wut:journl:v:34:y:2024:i:4:p:157-183:id:10
    DOI: 10.37190/ord240410

    Download full text from publisher

    File URL: https://ord.pwr.edu.pl/assets/papers_archive/ord2024vol34no4_10.pdf
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Emmanuel Jordy Menvouta & Jolien Ponnet & Robin Van Oirbeek & Tim Verdonck, 2022. "mCube: Multinomial Micro-level reserving Model," Papers 2212.00101, arXiv.org.
    2. Fernandez Martinez, Roberto & Lostado Lorza, Ruben & Santos Delgado, Ana Alexandra & Piedra, Nelson, 2021. "Use of classification trees and rule-based models to optimize the funding assignment to research projects: A case study of UTPL," Journal of Informetrics, Elsevier, vol. 15(1).
    3. Höppner, Sebastiaan & Stripling, Eugen & Baesens, Bart & Broucke, Seppe vanden & Verdonck, Tim, 2020. "Profit driven decision trees for churn prediction," European Journal of Operational Research, Elsevier, vol. 284(3), pages 920-933.
    4. Patrick Rehill, 2024. "Distilling interpretable causal trees from causal forests," Papers 2408.01023, arXiv.org.
    5. Hajko, Vladimír, 2017. "The failure of Energy-Economy Nexus: A meta-analysis of 104 studies," Energy, Elsevier, vol. 125(C), pages 771-787.
    6. Susan Athey & Stefan Wager, 2021. "Policy Learning With Observational Data," Econometrica, Econometric Society, vol. 89(1), pages 133-161, January.
    7. Yagli, Gokhan Mert & Yang, Dazhi & Srinivasan, Dipti, 2019. "Automatic hourly solar forecasting using machine learning models," Renewable and Sustainable Energy Reviews, Elsevier, vol. 105(C), pages 487-498.
    8. Davide Natalini & Giangiacomo Bravo & Aled Wynne Jones, 2019. "Global food security and food riots – an agent-based modelling approach," Food Security: The Science, Sociology and Economics of Food Production and Access to Food, Springer;The International Society for Plant Pathology, vol. 11(5), pages 1153-1173, October.
    9. Yves Staudt & Joël Wagner, 2021. "Assessing the Performance of Random Forests for Modeling Claim Severity in Collision Car Insurance," Risks, MDPI, vol. 9(3), pages 1-28, March.
    10. Federico Divina & Aude Gilson & Francisco Goméz-Vela & Miguel García Torres & José F. Torres, 2018. "Stacking Ensemble Learning for Short-Term Electricity Consumption Forecasting," Energies, MDPI, vol. 11(4), pages 1-31, April.
    11. Max Tabord-Meehan, 2023. "Stratification Trees for Adaptive Randomisation in Randomised Controlled Trials," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 90(5), pages 2646-2673.
    12. Anja Breuer & Yves Staudt, 2022. "Equalization Reserves for Reinsurance and Non-Life Undertakings in Switzerland," Risks, MDPI, vol. 10(3), pages 1-41, March.
    13. Patrick Rehill & Nicholas Biddle, 2022. "Policy learning for many outcomes of interest: Combining optimal policy trees with multi-objective Bayesian optimisation," Papers 2212.06312, arXiv.org, revised Oct 2023.
    14. Alvarez-Iglesias, Alberto & Hinde, John & Ferguson, John & Newell, John, 2017. "An alternative pruning based approach to unbiased recursive partitioning," Computational Statistics & Data Analysis, Elsevier, vol. 106(C), pages 90-102.
    15. Vrigazova Borislava, 2021. "The Proportion for Splitting Data into Training and Test Set for the Bootstrap in Classification Problems," Business Systems Research, Sciendo, vol. 12(1), pages 228-242, May.
    16. Chi-Chang Chang & Tse-Hung Huang & Pei-Wei Shueng & Ssu-Han Chen & Chun-Chia Chen & Chi-Jie Lu & Yi-Ju Tseng, 2021. "Developing a Stacked Ensemble-Based Classification Scheme to Predict Second Primary Cancers in Head and Neck Cancer Survivors," IJERPH, MDPI, vol. 18(23), pages 1-10, November.
    17. Emilio Carrizosa & Cristina Molero-Río & Dolores Romero Morales, 2021. "Mathematical optimization in classification and regression trees," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(1), pages 5-33, April.
    18. Islam, Towhidul & Meade, Nigel & Carson, Richard T. & Louviere, Jordan J. & Wang, Juan, 2022. "The usefulness of socio-demographic variables in predicting purchase decisions: Evidence from machine learning procedures," Journal of Business Research, Elsevier, vol. 151(C), pages 324-338.
    19. Ronilo Ragodos & Tong Wang, 2022. "Disjunctive Rule Lists," INFORMS Journal on Computing, INFORMS, vol. 34(6), pages 3259-3276, November.
    20. Roberto Chiosa & Marco Savino Piscitelli & Alfonso Capozzoli, 2021. "A Data Analytics-Based Energy Information System (EIS) Tool to Perform Meter-Level Anomaly Detection and Diagnosis in Buildings," Energies, MDPI, vol. 14(1), pages 1-28, January.
