Authors
- Franklin Parrales-Bravo
(Grupo de Investigación en Inteligencia Artificial, Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Guayaquil 090514, Ecuador)
- Rosangela Caicedo-Quiroz
(Centro de Estudios para el Cuidado Integral y la Promoción de la Salud, Universidad Bolivariana del Ecuador, km 5 ½ vía Durán—Yaguachi, Durán 092405, Ecuador)
- Elena Tolozano-Benitez
(Centro de Estudios en Tecnologías Aplicadas, Universidad Bolivariana del Ecuador, km 5 ½ vía Durán—Yaguachi, Durán 092405, Ecuador)
- Víctor Gómez-Rodríguez
(Instituto Superior Tecnológico Urdesa (ITSU), Av. Pdte. Carlos Julio Arosemena Tola km 2 ½, Guayaquil 090615, Ecuador)
- Lorenzo Cevallos-Torres
(Grupo de Investigación en Inteligencia Artificial, Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Guayaquil 090514, Ecuador)
- Jorge Charco-Aguirre
(Grupo de Investigación en Inteligencia Artificial, Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Guayaquil 090514, Ecuador)
- Leonel Vasquez-Cevallos
(SIMUEES Simulation Clinic, Universidad Espíritu Santo, Samborondón 092301, Ecuador)
Abstract
Imbalanced data can affect the machine learning (ML) algorithms that build predictive models. This manuscript studies the influence of oversampling and undersampling strategies on the learning of Bayesian classification models that predict the risk of preeclampsia. Given the properties of our dataset, only oversampling and undersampling methods that operate on both numerical and categorical attributes are considered: the synthetic minority oversampling technique for nominal and continuous data (SMOTE-NC), SMOTE-Encoded Nominal and Continuous (SMOTE-ENC), random oversampling examples (ROSE), random undersampling examples (UNDER), and random oversampling techniques (OVER). According to the results, balancing the classes in the training dataset does not improve accuracy. However, on the test dataset, models built on a balanced training dataset accurately classified both positive and negative cases of preeclampsia, whereas models built on the imbalanced training dataset were poor at detecting positive cases. We conclude that, although imbalanced training datasets can be addressed with oversampling and undersampling techniques before building prediction models, an improvement in model accuracy is not guaranteed. Even so, in most cases sensitivity and specificity improve in binary classification problems such as the one addressed in this manuscript.
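As a rough illustration of the two simplest strategies named in the abstract, the sketch below shows plain random oversampling and random undersampling, together with the sensitivity and specificity measures the abstract refers to. It is not the authors' implementation: the function names, the toy data, and the 10/90 class split are assumptions made for illustration only, and SMOTE-NC, SMOTE-ENC, and ROSE are not covered. Because rows are only duplicated or dropped, the approach works unchanged with mixed numerical and categorical attributes.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


# Hypothetical toy data: 10 positive and 90 negative cases, each row mixing
# one numerical and one categorical attribute.
rows = [(float(i), "a" if i % 2 else "b") for i in range(100)]
labels = [1] * 10 + [0] * 90

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

# A model that always predicts "negative" is 90% accurate on this data,
# yet it detects no positive case at all.
always_negative = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
sens, spec = sensitivity_specificity(labels, always_negative)
```

On this toy data the always-negative predictor reaches an accuracy of 0.9 with a sensitivity of 0.0 and a specificity of 1.0, which is one way to see why accuracy alone can hide poor detection of positive cases when classes are imbalanced.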
Suggested Citation
Franklin Parrales-Bravo & Rosangela Caicedo-Quiroz & Elena Tolozano-Benitez & Víctor Gómez-Rodríguez & Lorenzo Cevallos-Torres & Jorge Charco-Aguirre & Leonel Vasquez-Cevallos, 2024.
"OUCH: Oversampling and Undersampling Cannot Help Improve Accuracy in Our Bayesian Classifiers That Predict Preeclampsia,"
Mathematics, MDPI, vol. 12(21), pages 1-14, October.
Handle:
RePEc:gam:jmathe:v:12:y:2024:i:21:p:3351-:d:1506782