IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1008295.html
   My bibliography  Save this article

Increased comparability between RNA-Seq and microarray data by utilization of gene sets

Author

Listed:
  • Frans M van der Kloet
  • Jeroen Buurmans
  • Martijs J Jonker
  • Age K Smilde
  • Johan A Westerhuis

Abstract

The field of transcriptomics uses and measures mRNA as a proxy of gene expression. There are currently two major platforms in use for quantifying mRNA, microarray and RNA-Seq. Many comparative studies have shown that their results are not always consistent. In this study we aim to find a robust method to increase comparability of both platforms enabling data analysis of merged data from both platforms. We transformed high dimensional transcriptomics data from two different platforms into a lower dimensional, and biologically relevant dataset by calculating enrichment scores based on gene set collections for all samples. We compared the similarity between data from both platforms based on the raw data and on the enrichment scores. We show that the performed data transforms the data in a biologically relevant way and filters out noise which leads to increased platform concordance. We validate the procedure using predictive models built with microarray based enrichment scores to predict subtypes of breast cancer using enrichment scores based on sequenced data. Although microarray and RNA-Seq expression levels might appear different, transforming them into biologically relevant gene set enrichment scores significantly increases their correlation, which is a step forward in data integration of the two platforms. The gene set collections were shown to contain biologically relevant gene sets. More in-depth investigation on the effect of the composition, size, and number of gene sets that are used for the transformation is suggested for future research.Author summary: The field of transcriptomics uses and measures mRNA as a proxy of gene expression. There are currently two major platforms in use for quantifying mRNA, microarray and RNA-Seq. Many comparative studies have shown that their results are not always consistent. In this study we aim to find a robust method to increase comparability of both platforms enabling data analysis of merged data from both platforms. We transformed the high dimensional transcriptomics data from the two different platforms into lower dimensional, and biologically relevant gene set scores. These gene sets were defined a-priori as specific combination of genes (e.g. up-regulated in a certain pathway). We observed that although microarray and RNA-Seq expression levels might appear different, using these gene sets to transform the data significantly increases their correlation. This is a step forward in data integration of the two platforms. More in-depth investigation on the effect of the composition, size, and number of gene sets that are used for the transformation is suggested for future research.

Suggested Citation

  • Frans M van der Kloet & Jeroen Buurmans & Martijs J Jonker & Age K Smilde & Johan A Westerhuis, 2020. "Increased comparability between RNA-Seq and microarray data by utilization of gene sets," PLOS Computational Biology, Public Library of Science, vol. 16(9), pages 1-21, September.
  • Handle: RePEc:plo:pcbi00:1008295
    DOI: 10.1371/journal.pcbi.1008295
    as

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008295
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008295&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1008295?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1008295. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.