IDEAS home Printed from https://ideas.repec.org/a/bpj/sagmbi/v16y2017i2p83-93n1005.html

No counts, no variance: allowing for loss of degrees of freedom when assessing biological variability from RNA-seq data

Author

Listed:
  • Lun Aaron T. L.

    (The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Parade, Parkville, VIC 3052, Australia)

  • Smyth Gordon K.

    (The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Parade, Parkville, VIC 3052, Australia)

Abstract

RNA sequencing (RNA-seq) is widely used to study gene expression changes associated with treatments or biological conditions. Many popular methods for detecting differential expression (DE) from RNA-seq data use generalized linear models (GLMs) fitted to the read counts across independent replicate samples for each gene. This article shows that the standard formula for the residual degrees of freedom (d.f.) in a linear model is overstated when the model contains fitted values that are exactly zero. Such fitted values occur whenever all the counts in a treatment group are zero as well as in more complex models such as those involving paired comparisons. This misspecification results in underestimation of the genewise variances and loss of type I error control. This article proposes a formula for the reduced residual d.f. that restores error control in simulated RNA-seq data and improves detection of DE genes in a real data analysis. The new approach is implemented in the quasi-likelihood framework of the edgeR software package. The results of this article also apply to RNA-seq analyses that apply linear models to log-transformed counts, such as those in the limma software package, and more generally to any count-based GLM where exactly zero fitted values are possible.
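The abstract does not give the proposed formula, so the following is only an illustrative sketch of the idea under a simplifying assumption: a one-way layout in which each sample belongs to exactly one treatment group. In that setting a group's fitted value is exactly zero precisely when all of its counts are zero, and such a group contributes no information about variability. The function name `residual_df_one_way` and the counting rule (informative observations minus informative groups) are assumptions made for illustration, not the edgeR implementation.

```python
from collections import defaultdict


def residual_df_one_way(counts, groups):
    """Return (standard_df, reduced_df) for one gene in a one-way layout.

    counts: read counts for one gene, one entry per sample.
    groups: group label for each sample, same length as counts.

    Hypothetical helper for illustration; assumes each sample belongs to
    exactly one group, so a group's fitted value is zero if and only if
    every count in that group is zero.
    """
    if len(counts) != len(groups):
        raise ValueError("counts and groups must have the same length")

    # Collect the counts belonging to each group.
    by_group = defaultdict(list)
    for count, group in zip(counts, groups):
        by_group[group].append(count)

    # Standard residual d.f.: number of samples minus number of group means.
    standard_df = len(counts) - len(by_group)

    # Keep only groups with at least one nonzero count; all-zero groups have
    # fitted values of exactly zero and are treated as uninformative.
    informative = [v for v in by_group.values() if any(c > 0 for c in v)]

    # Reduced residual d.f.: informative samples minus informative groups.
    reduced_df = sum(len(v) for v in informative) - len(informative)
    return standard_df, reduced_df


if __name__ == "__main__":
    # Group A is entirely zero, so only group B informs the variance.
    print(residual_df_one_way([0, 0, 0, 5, 8, 3], ["A", "A", "A", "B", "B", "B"]))
```

With three all-zero samples in group A and three nonzero samples in group B, the standard formula gives 6 - 2 = 4 residual d.f., whereas the sketch counts only 3 - 1 = 2. More complex designs, such as the paired comparisons mentioned in the abstract, would need a more general rule than this one-way count.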

Suggested Citation

  • Lun Aaron T. L. & Smyth Gordon K., 2017. "No counts, no variance: allowing for loss of degrees of freedom when assessing biological variability from RNA-seq data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(2), pages 83-93.
  • Handle: RePEc:bpj:sagmbi:v:16:y:2017:i:2:p:83-93:n:1005
    DOI: 10.1515/sagmb-2017-0010

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2017-0010
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.
