OUCH: Oversampling and Undersampling Cannot Help Improve Accuracy in Our Bayesian Classifiers That Predict Preeclampsia

Author

Franklin Parrales-Bravo
(Grupo de Investigación en Inteligencia Artificial, Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Guayaquil 090514, Ecuador)
Rosangela Caicedo-Quiroz
(Centro de Estudios para el Cuidado Integral y la Promoción de la Salud, Universidad Bolivariana del Ecuador, km 5 ½ vía Durán—Yaguachi, Durán 092405, Ecuador)
Elena Tolozano-Benitez
(Centro de Estudios en Tecnologías Aplicadas, Universidad Bolivariana del Ecuador, km 5 ½ vía Durán—Yaguachi, Durán 092405, Ecuador)
Víctor Gómez-Rodríguez
(Instituto Superior Tecnológico Urdesa (ITSU), Av. Pdte. Carlos Julio Arosemena Tola km 2 ½, Guayaquil 090615, Ecuador)
Lorenzo Cevallos-Torres
(Grupo de Investigación en Inteligencia Artificial, Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Guayaquil 090514, Ecuador)
Jorge Charco-Aguirre
(Grupo de Investigación en Inteligencia Artificial, Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Guayaquil 090514, Ecuador)
Leonel Vasquez-Cevallos
(SIMUEES Simulation Clinic, Universidad Espíritu Santo, Samborondón 092301, Ecuador)

Abstract

Unbalanced data can affect the machine learning (ML) algorithms that build predictive models. This manuscript studies how oversampling and undersampling strategies influence the learning of Bayesian classification models that predict the risk of preeclampsia. Given the properties of our dataset, only oversampling and undersampling methods that handle both numerical and categorical attributes are considered: the synthetic minority oversampling technique for nominal and continuous data (SMOTE-NC), SMOTE-Encoded Nominal and Continuous (SMOTE-ENC), random oversampling examples (ROSE), random undersampling examples (UNDER), and random oversampling techniques (OVER). According to the results, balancing the classes in the training dataset does not improve the accuracy percentages. However, models built on a balanced training dataset accurately classified both positive and negative cases of preeclampsia in the test dataset, whereas models built on the imbalanced training dataset were not good at detecting positive cases. We conclude that, although imbalanced training datasets can be addressed by applying oversampling and undersampling techniques before building prediction models, an improvement in model accuracy is not guaranteed. Even so, in most cases the sensitivity and specificity percentages improve in binary classification problems such as the one addressed in this manuscript.
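The distinction the abstract draws between accuracy and sensitivity/specificity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`random_oversample`, `random_undersample`, `binary_metrics`) are hypothetical, the resampling shown is plain random duplication and random removal rather than SMOTE-NC, SMOTE-ENC, or ROSE, and the 10-positive/90-negative toy labels and both sets of predictions are invented for illustration, not taken from the paper.

```python
import random
from collections import Counter


def _group_by_label(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Invented toy data: 10 positive cases (label 1) and 90 negative cases (label 0).
rows = list(range(100))
y_true = [1] * 10 + [0] * 90

over_rows, over_labels = random_oversample(rows, y_true)
under_rows, under_labels = random_undersample(rows, y_true)
print(Counter(over_labels), Counter(under_labels))

# Hypothetical predictions from a model that always answers "negative".
pred_always_negative = [0] * 100
# Hypothetical predictions from a model that finds 8 of the 10 positives
# but also raises 8 false alarms among the 90 negatives.
pred_finds_positives = [1] * 8 + [0] * 2 + [1] * 8 + [0] * 82

# Both sets of predictions have accuracy 0.9, but sensitivity is 0.0 vs 0.8.
print(binary_metrics(y_true, pred_always_negative))
print(binary_metrics(y_true, pred_finds_positives))
```

In this toy case the two sets of predictions have identical accuracy, yet only the second detects positive cases, which is why sensitivity and specificity are reported alongside accuracy when classes are imbalanced.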

Suggested Citation

Franklin Parrales-Bravo & Rosangela Caicedo-Quiroz & Elena Tolozano-Benitez & Víctor Gómez-Rodríguez & Lorenzo Cevallos-Torres & Jorge Charco-Aguirre & Leonel Vasquez-Cevallos, 2024. "OUCH: Oversampling and Undersampling Cannot Help Improve Accuracy in Our Bayesian Classifiers That Predict Preeclampsia," Mathematics, MDPI, vol. 12(21), pages 1-14, October.

Handle: RePEc:gam:jmathe:v:12:y:2024:i:21:p:3351-:d:1506782

Keywords

preeclampsia; Bayesian network classifiers; class imbalance; oversampling; undersampling; SMOTE-NC; ROSE; SMOTE-ENC