Printed from https://ideas.repec.org/a/gam/jmathe/v9y2021i5p570-d512368.html

Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm

Authors

Listed:
  • Jin Hee Bae

    (College of IT Convergence, Gachon University, Seongnam 13120, Korea)

  • Minwoo Kim

    (College of IT Convergence, Gachon University, Seongnam 13120, Korea)

  • J.S. Lim

    (College of IT Convergence, Gachon University, Seongnam 13120, Korea)

  • Zong Woo Geem

    (College of IT Convergence, Gachon University, Seongnam 13120, Korea)

Abstract

This paper proposes a feature selection method, based on K-means clustering and a modified harmony search algorithm, for distinguishing colorectal cancer patients from normal individuals. Because colorectal cancer arises from gene mutations, it is important to detect its presence or absence from gene information. The proposed methodology consists of four steps. First, the original data are Z-normalized during preprocessing. Second, candidate genes are selected using the Fisher score. Third, the candidate genes are grouped by K-means clustering and one representative gene is selected from each cluster. Finally, feature selection is carried out using the modified harmony search algorithm. The resulting gene combination is then applied to the classification model and verified using 5-fold cross-validation. The proposed model obtained a classification accuracy of up to 94.36%. A comparison with other methods shows that the proposed method performs well in classifying colorectal cancer. We believe that the proposed model can be applied not only to colorectal cancer but also to other gene-related diseases.
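
As a rough illustration of the steps named in the abstract, the following Python sketch covers Z-normalization, a two-class Fisher score, and a basic binary harmony search. It is not the authors' implementation: the abstract does not give the exact Fisher score formula or the modification made to harmony search, so the score used here (squared mean gap over summed class variances) and the textbook harmony search loop are assumptions. The function names, the parameters `hms`, `hmcr`, and `par`, and the stand-in fitness are hypothetical. The K-means step and the classifier are omitted.

```python
import random
import statistics


def z_normalize(column):
    """Scale one gene's expression values to zero mean and unit variance."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return [(v - mu) / sigma if sigma else 0.0 for v in column]


def fisher_score(column, labels):
    """Two-class Fisher score (assumed form): squared gap between class
    means divided by the sum of the class variances."""
    pos = [v for v, y in zip(column, labels) if y == 1]
    neg = [v for v, y in zip(column, labels) if y == 0]
    gap = (statistics.fmean(pos) - statistics.fmean(neg)) ** 2
    spread = statistics.pvariance(pos) + statistics.pvariance(neg)
    return gap / spread if spread else float("inf")


def harmony_search(n_features, fitness, hms=10, hmcr=0.9, par=0.3,
                   iters=200, seed=0):
    """Basic (unmodified) harmony search over 0/1 feature masks.

    hms: harmony memory size, hmcr: memory considering rate,
    par: pitch adjusting rate. Maximizes `fitness(mask)`.
    """
    rng = random.Random(seed)
    memory = [[rng.randint(0, 1) for _ in range(n_features)]
              for _ in range(hms)]
    scores = [fitness(h) for h in memory]
    for _ in range(iters):
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[j]   # memory consideration
                if rng.random() < par:
                    bit = 1 - bit             # pitch adjustment: flip the bit
            else:
                bit = rng.randint(0, 1)       # random selection
            new.append(bit)
        new_score = fitness(new)
        worst = min(range(hms), key=scores.__getitem__)
        if new_score > scores[worst]:         # replace the worst harmony
            memory[worst], scores[worst] = new, new_score
    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


if __name__ == "__main__":
    # Toy data: three genes measured on six samples (made up for illustration).
    labels = [0, 0, 0, 1, 1, 1]
    genes = [
        [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],   # classes well separated
        [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],   # equal class means, uninformative
        [5.0, 4.0, 6.0, 1.0, 2.0, 0.5],   # classes well separated
    ]
    scores = [fisher_score(z_normalize(g), labels) for g in genes]

    def fitness(mask):
        # Stand-in fitness: reward high Fisher scores, penalize mask size.
        return sum(s for s, m in zip(scores, mask) if m) - 0.5 * sum(mask)

    print(harmony_search(len(genes), fitness))
```

In a fuller pipeline the stand-in fitness would presumably be replaced by a classifier-based measure, such as the 5-fold cross-validated accuracy the abstract mentions, and the mask would range over the cluster representatives rather than all genes.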

Suggested Citation

  • Jin Hee Bae & Minwoo Kim & J.S. Lim & Zong Woo Geem, 2021. "Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm," Mathematics, MDPI, vol. 9(5), pages 1-14, March.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:5:p:570-:d:512368

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/5/570/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/5/570/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Meng, Ming & Mander, Sarah & Zhao, Xiaoli & Niu, Dongxiao, 2016. "Have market-oriented reforms improved the electricity generation efficiency of China's thermal power industry? An empirical analysis," Energy, Elsevier, vol. 114(C), pages 734-741.
    2. Stéphane Mussard & Fattouma Souissi-Benrejab, 2019. "Gini-PLS Regressions," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 17(3), pages 477-512, September.
