IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i4p538-d1336221.html
   My bibliography  Save this article

Decision-Making on the Diagnosis of Oncological Diseases Using Cost-Sensitive SVM Classifiers Based on Datasets with a Variety of Features of Different Natures

Author

Listed:
  • Liliya A. Demidova

    (Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia)

Abstract

This paper discusses the problem of detecting cancer using such biomarkers as blood protein markers. The purpose of this research is to propose an approach for making decisions in the diagnosis of cancer through the creation of cost-sensitive SVM classifiers on the basis of datasets with a variety of features of different nature. Such datasets may include compositions of known features corresponding to blood protein markers and new features constructed using methods for calculating entropy and fractal dimensions, as well as using the UMAP algorithm. Based on these datasets, multiclass SVM classifiers were developed. They use cost-sensitive learning principles to overcome the class imbalance problem, which is typical for medical datasets. When implementing the UMAP algorithm, various variants of the loss function were considered. This was performed in order to select those that provide the formation of such new features that ultimately allow us to develop the best cost-sensitive SVM classifiers in terms of maximizing the mean value of the metric M a c r o F 1 − s c o r e . The experimental results proved the possibility of applying the UMAP algorithm, approximate entropy and, in addition, Higuchi and Katz fractal dimensions to construct new features using blood protein markers. It turned out that when working with the UMAP algorithm, the most promising is the application of a loss function on the basis of fuzzy cross-entropy, and the least promising is the application of a loss function on the basis of intuitionistic fuzzy cross-entropy. Augmentation of the original dataset with either features on the basis of the UMAP algorithm, features on the basis of the UMAP algorithm and approximate entropy, or features on the basis of approximate entropy provided the creation of the three best cost-sensitive SVM classifiers with mean values of the metric M a c r o F 1 − s c o r e increased by 5.359%, 5.245% and 4.675%, respectively, compared to the mean values of this metric in the case when only the original dataset was utilized for creating the base SVM classifier (without performing any manipulations to overcome the class imbalance problem, and also without introducing new features).

Suggested Citation

  • Liliya A. Demidova, 2024. "Decision-Making on the Diagnosis of Oncological Diseases Using Cost-Sensitive SVM Classifiers Based on Datasets with a Variety of Features of Different Natures," Mathematics, MDPI, vol. 12(4), pages 1-40, February.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:4:p:538-:d:1336221
    as

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/4/538/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/4/538/
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Artyom V. Gorchakov & Liliya A. Demidova & Peter N. Sovietov, 2023. "Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task," Future Internet, MDPI, vol. 15(9), pages 1-28, September.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.

      Corrections

      All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:12:y:2024:i:4:p:538-:d:1336221. See general information about how to correct material in RePEc.

      If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

      If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

      If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

      For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

      Please note that corrections may take a couple of weeks to filter through the various RePEc services.

      IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.