Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data

Author

Listed:
  • Gu Mi
  • Yanming Di
  • Daniel W Schafer

Abstract

This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
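The idea of a simulation-based goodness-of-fit test can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a single group of counts with a positive mean and no covariates, the NB2 mean-variance relationship Var(Y) = mu + phi * mu^2, a method-of-moments fit, and a simple distance between sorted counts and simulated reference order statistics. The function names (`fit_nb_moments`, `simulate_nb`, `gof_pvalue`) are hypothetical and chosen only for this illustration.

```python
import numpy as np


def fit_nb_moments(y):
    """Method-of-moments fit assuming Var(Y) = mu + phi * mu**2 (illustrative only)."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    var = y.var(ddof=1)
    # Clamp to a tiny positive value when the sample shows no overdispersion.
    phi = max((var - mu) / mu**2, 1e-8)
    return mu, phi


def simulate_nb(mu, phi, shape, rng):
    """Draw NB counts with mean mu and dispersion phi."""
    n = 1.0 / phi      # NumPy's "number of successes" parameter
    p = n / (n + mu)   # NumPy's success probability
    return rng.negative_binomial(n, p, size=shape)


def gof_pvalue(y, n_sim=999, seed=0):
    """Monte Carlo p-value for the fit of the assumed NB model to the counts y."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    mu, phi = fit_nb_moments(y)

    # Batch 1: reference order statistics under the fitted model.
    batch1 = np.sort(simulate_nb(mu, phi, (n_sim, y.size), rng), axis=1)
    ref = np.median(batch1, axis=0)

    # Batch 2: null distribution of the distance statistic.
    batch2 = np.sort(simulate_nb(mu, phi, (n_sim, y.size), rng), axis=1)
    null_stats = np.sum((batch2 - ref) ** 2, axis=1)

    # Observed distance between sorted data and the reference order statistics.
    obs_stat = np.sum((np.sort(y) - ref) ** 2)
    return (1 + np.sum(null_stats >= obs_stat)) / (n_sim + 1)


# Usage: counts simulated from an NB model, then tested against the NB fit.
rng = np.random.default_rng(1)
y = simulate_nb(mu=20.0, phi=0.3, shape=50, rng=rng)
print(gof_pvalue(y, n_sim=199))
```

A small p-value would suggest that the sorted counts lie farther from the fitted model's order statistics than simulated data typically do. The sketch does not refit the model on each simulated dataset, so it should be read as an illustration of the general approach rather than a calibrated test.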

Suggested Citation

  • Gu Mi & Yanming Di & Daniel W Schafer, 2015. "Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data," PLOS ONE, Public Library of Science, vol. 10(3), pages 1-16, March.
  • Handle: RePEc:plo:pone00:0119254
    DOI: 10.1371/journal.pone.0119254

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0119254
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0119254&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Di Yanming & Schafer Daniel W & Cumbie Jason S & Chang Jeff H, 2011. "The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-28, May.

    Citations

    Cited by:

    1. Xie, Fang & Xiao, Zhijie, 2020. "Consistency of ℓ1 penalized negative binomial regressions," Statistics & Probability Letters, Elsevier, vol. 165(C).
    2. Xiaohong Li & Guy N Brock & Eric C Rouchka & Nigel G F Cooper & Dongfeng Wu & Timothy E O’Toole & Ryan S Gill & Abdallah M Eteleeb & Liz O’Brien & Shesh N Rai, 2017. "A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-22, May.
    3. Raydonal Ospina & Patrícia L. Espinheira & Leilo A. Arias & Cleber M. Xavier & Víctor Leiva & Cecilia Castro, 2024. "New Statistical Residuals for Regression Models in the Exponential Family: Characterization, Simulation, Computation, and Applications," Mathematics, MDPI, vol. 12(20), pages 1-44, October.
    4. Li Xiaohong & Wu Dongfeng & Cooper Nigel G.F. & Rai Shesh N., 2019. "Sample size calculations for the differential expression analysis of RNA-seq data using a negative binomial regression model," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(1), pages 1-17, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Xiongzhi, 2019. "Uniformly consistently estimating the proportion of false null hypotheses via Lebesgue–Stieltjes integral equations," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 724-744.
    2. Lund Steven P. & Nettleton Dan & McCarthy Davis J. & Smyth Gordon K., 2012. "Detecting Differential Expression in RNA-sequence Data Using Quasi-likelihood with Shrunken Dispersion Estimates," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-44, October.
    3. Gu Mi & Yanming Di, 2015. "The Level of Residual Dispersion Variation and the Power of Differential Expression Tests for RNA-Seq Data," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-25, April.
    4. Gu Mi & Yanming Di & Sarah Emerson & Jason S Cumbie & Jeff H Chang, 2012. "Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-10, October.
    5. Jungsoo Gim & Sungho Won & Taesung Park, 2016. "LPEseq: Local-Pooled-Error Test for RNA Sequencing Experiments with a Small Number of Replicates," PLOS ONE, Public Library of Science, vol. 11(8), pages 1-15, August.
    6. Kotoka Ekua & Orr Megan, 2017. "Modifying SAMseq to account for asymmetry in the distribution of effect sizes when identifying differentially expressed genes," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 333-347, December.
    7. Di Yanming & Emerson Sarah C. & Schafer Daniel W. & Kimbrel Jeffrey A. & Chang Jeff H., 2013. "Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(1), pages 49-70, March.
