IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0212127.html

Leukemia multiclass assessment and classification from Microarray and RNA-seq technologies integration at gene expression level

Author

Listed:
  • Daniel Castillo
  • Juan Manuel Galvez
  • Luis J Herrera
  • Fernando Rojas
  • Olga Valenzuela
  • Octavio Caba
  • Jose Prados
  • Ignacio Rojas

Abstract

In recent years, the number of available biological experiments has increased significantly owing to the widespread use of massive sequencing. Furthermore, continuous developments in machine learning and high-performance computing are enabling faster and more efficient analysis and processing of this type of data. However, biological information about a given disease is usually scattered, because different sequencing technologies and manufacturers have been used in different experiments over the years and around the world. It is therefore of paramount importance to integrate biologically related data correctly in order to obtain genuine benefits from them. To this end, this work presents an integration of multiple Microarray and RNA-seq platforms, which has led to the design of a multiclass study that collects samples from the four main types of leukemia, quantified at the gene expression level. Subsequently, in order to find a set of differentially expressed genes with the highest capability to discern among different types of leukemia, a novel parameter referred to as coverage is presented. This parameter assesses the number of different pathologies that a given gene is able to discern. It was evaluated together with other widely known parameters by means of an ANOVA statistical test, which corroborated its filtering power when the identified genes are subjected to a multiclass machine learning process. The optimal tuning of the evaluated gene extraction parameters by means of this statistical test led to the selection of 42 highly relevant expressed genes. Using the minimum-Redundancy Maximum-Relevance (mRMR) feature selection algorithm, these genes were reordered and assessed with four different classification techniques. Outstanding results were achieved by considering only the first ten genes of the ranking. Finally, the specific literature on this last subset of genes was consulted, revealing that practically all of them are associated with biological processes related to leukemia. In light of these results, this study underlines the relevance of considering a new parameter that facilitates the identification of highly valid expressed genes for simultaneously discerning multiple types of leukemia.
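
The abstract describes coverage only as the number of pathologies a gene is able to discern and does not give the exact computation. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that a class counts as discerned when the gene's mean log2 expression in that class differs from the mean of every other class by at least a threshold. The function names, the `lfc_threshold` parameter, the class labels, and the toy expression values are all hypothetical and serve only as illustration.

```python
from statistics import mean


def coverage(expr_by_class, lfc_threshold=1.0):
    """Count the classes that one gene separates from all other classes.

    expr_by_class maps a class label to that gene's log2 expression values.
    ASSUMPTION: a class is "discerned" when its mean differs from the mean
    of every other class by at least lfc_threshold.
    """
    means = {label: mean(values) for label, values in expr_by_class.items()}
    covered = 0
    for label, class_mean in means.items():
        others = [m for other, m in means.items() if other != label]
        if all(abs(class_mean - m) >= lfc_threshold for m in others):
            covered += 1
    return covered


def filter_genes(dataset, min_coverage, lfc_threshold=1.0):
    """Keep the genes whose coverage reaches min_coverage."""
    return [
        gene
        for gene, by_class in dataset.items()
        if coverage(by_class, lfc_threshold) >= min_coverage
    ]


# Made-up toy values (not data from the study); labels are illustrative.
toy = {
    "GENE_A": {"ALL": [8.0, 8.2], "AML": [5.0, 5.2],
               "CLL": [5.1, 4.9], "CML": [5.0, 5.0]},
    "GENE_B": {"ALL": [6.0, 6.1], "AML": [6.0, 5.9],
               "CLL": [6.1, 6.0], "CML": [5.9, 6.0]},
    "GENE_C": {"ALL": [2.0, 2.2], "AML": [5.0, 5.2],
               "CLL": [8.0, 8.2], "CML": [11.0, 11.2]},
}

for gene, by_class in toy.items():
    print(gene, coverage(by_class))

print(filter_genes(toy, min_coverage=2))
```

Under this assumed definition, a gene that is elevated in a single class receives a coverage of 1, whereas a gene with a distinct expression level in each class receives the maximum coverage, so a minimum-coverage filter favours genes that are useful for several classes at once.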

Suggested Citation

  • Daniel Castillo & Juan Manuel Galvez & Luis J Herrera & Fernando Rojas & Olga Valenzuela & Octavio Caba & Jose Prados & Ignacio Rojas, 2019. "Leukemia multiclass assessment and classification from Microarray and RNA-seq technologies integration at gene expression level," PLOS ONE, Public Library of Science, vol. 14(2), pages 1-25, February.
  • Handle: RePEc:plo:pone00:0212127
    DOI: 10.1371/journal.pone.0212127

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0212127
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0212127&type=printable
    Download Restriction: no

