SlimPLS: A Method for Feature Selection in Gene Expression-Based Disease Classification

Authors

  • Michael Gutkin
  • Ron Shamir
  • Gideon Dror

Abstract

A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier on a training set of labeled samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve the diagnosis and treatment selection practices for several diseases. This procedure is complicated, however, by the high dimensionality of the data. While microarrays can measure the levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well in these situations, where the number of features (gene expression levels measured in these microarrays) far exceeds the number of samples. Selecting only the features that are most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. In this work we developed a novel method for multivariate feature selection based on the Partial Least Squares algorithm. We compared the method's variants with common feature selection techniques across a large number of real case-control datasets, using several classifiers. We demonstrate the advantages of the method and the preferable combinations of classifier and feature selection technique.
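
To make the idea of Partial Least Squares based feature selection concrete, here is a minimal sketch in Python. It is not the SlimPLS procedure itself, whose details are not given in this abstract. It only ranks features by the absolute value of their weight in the first PLS component, which for a single centered response is proportional to X^T y. The function names, the toy data, and the choice of k are all assumptions made for illustration.

```python
import math
import random


def first_pls_weights(X, y):
    """Unit-norm weight vector of the first PLS component (single response).

    With X (n samples x p features) and y both centered, the first PLS1
    weight vector is proportional to X^T y.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    w = [
        sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in w))
    return w if norm == 0 else [v / norm for v in w]


def select_features(X, y, k):
    """Indices of the k features with the largest absolute PLS weight."""
    w = first_pls_weights(X, y)
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    return sorted(ranked[:k])


# Toy case-control data (hypothetical): 20 samples, 200 features,
# so features far outnumber samples. Only features 3 and 7 carry signal.
random.seed(0)
n_features = 200
y = [0] * 10 + [1] * 10
X = []
for label in y:
    row = [random.gauss(0, 1) for _ in range(n_features)]
    row[3] += 6.0 * label
    row[7] -= 6.0 * label
    X.append(row)

selected = select_features(X, y, k=2)
print(selected)
```

A classifier would then be trained on the selected columns only. In a real study the selection step has to be repeated inside each cross-validation fold, since choosing features on the full dataset leaks label information into the test samples.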

Suggested Citation

  • Michael Gutkin & Ron Shamir & Gideon Dror, 2009. "SlimPLS: A Method for Feature Selection in Gene Expression-Based Disease Classification," PLOS ONE, Public Library of Science, vol. 4(7), pages 1-12, July.
  • Handle: RePEc:plo:pone00:0006416
    DOI: 10.1371/journal.pone.0006416

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0006416
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0006416&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Gift Dumedah, 2019. "Hydro Genome Mapping: An Approach for the Diagnosis, Evaluation and Improving Prediction Capability of Hydro-Meteorological Models," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 33(11), pages 3851-3872, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Daniele, Bertolozzi-Caredio & Barbara, Soriano & Isabel, Bardaji & Alberto, Garrido, 2022. "Analysis of perceived robustness, adaptability and transformability of Spanish extensive livestock farms under alternative challenging scenarios," Agricultural Systems, Elsevier, vol. 202(C).
    2. Marttinen Pekka & Gillberg Jussi & Havulinna Aki & Corander Jukka & Kaski Samuel, 2013. "Genome-wide association studies with high-dimensional phenotypes," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(4), pages 413-431, August.
    3. Minji Lee & Zhihua Su, 2020. "A Review of Envelope Models," International Statistical Review, International Statistical Institute, vol. 88(3), pages 658-676, December.
    4. Dmitry Kobak & Yves Bernaerts & Marissa A. Weis & Federico Scala & Andreas S. Tolias & Philipp Berens, 2021. "Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 980-1000, August.
    5. Cemal Erdem & Sean M. Gross & Laura M. Heiser & Marc R. Birtwistle, 2023. "MOBILE pipeline enables identification of context-specific networks and regulatory mechanisms," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    6. Feuerriegel, Stefan & Gordon, Julius, 2019. "News-based forecasts of macroeconomic indicators: A semantic path model for interpretable predictions," European Journal of Operational Research, Elsevier, vol. 272(1), pages 162-175.
    7. Hernandez Roig, Harold Antonio & Aguilera Morillo, María del Carmen & Aguilera, Ana M. & Preda, Cristian, 2023. "Penalized function-on-function partial least-squares regression," DES - Working Papers. Statistics and Econometrics. WS 37758, Universidad Carlos III de Madrid. Departamento de Estadística.
    8. Zhang Fan & Miecznikowski Jeffrey C. & Tritchler David L., 2020. "Identification of supervised and sparse functional genomic pathways," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 19(1), pages 1-27, February.
    9. Xavier Bry & Ndèye Niang & Thomas Verron & Stéphanie Bougeard, 2023. "Clusterwise elastic-net regression based on a combined information criterion," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 75-107, March.
    10. Marc Schoeler & Sandrine Ellero-Simatos & Till Birkner & Jordi Mayneris-Perxachs & Lisa Olsson & Harald Brolin & Ulrike Loeber & Jamie D. Kraft & Arnaud Polizzi & Marian Martí-Navas & Josep Puig & Ant, 2023. "The interplay between dietary fatty acids and gut microbiota influences host metabolism and hepatic steatosis," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    11. Jain Yashita & Ding Shanshan & Qiu Jing, 2019. "Sliced inverse regression for integrative multi-omics data analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(1), pages 1-13, February.
    12. Chung Dongjun & Keles Sunduz, 2010. "Sparse Partial Least Squares Classification for High Dimensional Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-32, March.
    13. Perrin, Augustine & Cristobal, Magali San & Milestad, Rebecka & Martin, Guillaume, 2020. "Identification of resilience factors of organic dairy cattle farms," Agricultural Systems, Elsevier, vol. 183(C).
