The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq

Authors

  • Di Yanming
  • Schafer Daniel W
  • Cumbie Jason S
  • Chang Jeff H

Abstract

We propose a new statistical test for assessing differential gene expression using RNA sequencing (RNA-Seq) data. Commonly used probability distributions, such as binomial or Poisson, cannot appropriately model the count variability in RNA-Seq data due to overdispersion. The small sample size that is typical in this type of data also prevents the uncritical use of tools derived from large-sample asymptotic theory. The test we propose is based on the NBP parameterization of the negative binomial distribution. It extends an exact test proposed by Robinson and Smyth (2007, 2008). In one version of Robinson and Smyth’s test, a constant dispersion parameter is used to model the count variability between biological replicates. We introduce an additional parameter to allow the dispersion parameter to depend on the mean. Our parametric method complements nonparametric regression approaches for modeling the dispersion parameter. We apply the test we propose to an Arabidopsis data set and a range of simulated data sets. The results show that the test is simple, powerful and reasonably robust against departures from model assumptions.
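The abstract combines two ingredients: a dispersion that depends on the mean, and a conditional exact test in the style of Robinson and Smyth. The Python sketch below illustrates both under loudly stated assumptions, and it is not the authors' implementation. It assumes the NBP variance function Var(Y) = μ + φμ^α, where α = 2 recovers the constant-dispersion negative binomial. The paper's exact parameterization may differ, for example by scaling the mean by library size. The sketch also treats φ and α as known rather than estimated across genes, assumes integer counts on equal library sizes (no normalization), and plugs the pooled mean into the null distribution. The function names `nbp_size`, `nb_logpmf` and `exact_test` are hypothetical.

```python
import math


def nbp_size(mu, phi, alpha):
    """Return the NB 'size' r whose variance mu + mu**2 / r matches the
    assumed NBP variance mu + phi * mu**alpha (alpha = 2 gives r = 1 / phi)."""
    return mu ** (2.0 - alpha) / phi


def nb_logpmf(y, mean, size):
    """Log pmf of a negative binomial with the given mean and size."""
    p = size / (size + mean)  # success probability in the (size, p) form
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(p) + y * math.log1p(-p))


def exact_test(y1, y2, phi, alpha):
    """Two-sided conditional exact test of equal means between two groups.

    Assumes integer counts, equal library sizes, and known NBP
    parameters phi and alpha (all assumptions of this sketch).
    """
    n1, n2 = len(y1), len(y2)
    z1, z = sum(y1), sum(y1) + sum(y2)
    if z == 0:
        return 1.0  # no reads: no evidence either way
    mu = z / (n1 + n2)            # pooled mean under H0 (plug-in estimate)
    r = nbp_size(mu, phi, alpha)  # common NB size implied by the NBP variance
    # The sum of n iid NB(mu, r) counts is NB(n * mu, n * r), so condition on
    # the total z and score every split (a, z - a) between the two groups.
    logps = [nb_logpmf(a, n1 * mu, n1 * r) + nb_logpmf(z - a, n2 * mu, n2 * r)
             for a in range(z + 1)]
    top = max(logps)  # rescale before exponentiating to avoid underflow
    total = sum(math.exp(lp - top) for lp in logps)
    # Two-sided p-value: mass of splits no more likely than the observed one,
    # with a small log-scale tolerance for floating-point ties.
    tail = sum(math.exp(lp - top) for lp in logps if lp <= logps[z1] + 1e-7)
    return min(1.0, tail / total)
```

For example, `exact_test([0, 1], [50, 60], 0.1, 2.0)` returns a very small p-value, while the balanced groups in `exact_test([10, 12], [11, 11], 0.1, 2.0)` give a p-value of 1. Summing the probabilities of all splits that are no more likely than the observed split is the usual two-sided rule for exact conditional tests. Setting `alpha` to 2 reduces the sketch to a constant-dispersion test.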

Suggested Citation

  • Di Yanming & Schafer Daniel W & Cumbie Jason S & Chang Jeff H, 2011. "The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-28, May.
  • Handle: RePEc:bpj:sagmbi:v:10:y:2011:i:1:n:24
    DOI: 10.2202/1544-6115.1637

    Download full text from publisher

    File URL: https://doi.org/10.2202/1544-6115.1637
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.

    References listed on IDEAS

    1. Smyth Gordon K, 2004. "Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 3(1), pages 1-28, February.

    Citations

    Cited by:

    1. Di Yanming & Emerson Sarah C. & Schafer Daniel W. & Kimbrel Jeffrey A. & Chang Jeff H., 2013. "Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(1), pages 49-70, March.
    2. Gu Mi & Yanming Di, 2015. "The Level of Residual Dispersion Variation and the Power of Differential Expression Tests for RNA-Seq Data," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-25, April.
    3. Gu Mi & Yanming Di & Sarah Emerson & Jason S Cumbie & Jeff H Chang, 2012. "Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-10, October.
    4. Chen, Xiongzhi, 2019. "Uniformly consistently estimating the proportion of false null hypotheses via Lebesgue–Stieltjes integral equations," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 724-744.
    5. Lund Steven P. & Nettleton Dan & McCarthy Davis J. & Smyth Gordon K., 2012. "Detecting Differential Expression in RNA-sequence Data Using Quasi-likelihood with Shrunken Dispersion Estimates," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-44, October.
    6. Jungsoo Gim & Sungho Won & Taesung Park, 2016. "LPEseq: Local-Pooled-Error Test for RNA Sequencing Experiments with a Small Number of Replicates," PLOS ONE, Public Library of Science, vol. 11(8), pages 1-15, August.
    7. Kotoka Ekua & Orr Megan, 2017. "Modifying SAMseq to account for asymmetry in the distribution of effect sizes when identifying differentially expressed genes," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 333-347, December.
    8. Gu Mi & Yanming Di & Daniel W Schafer, 2015. "Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data," PLOS ONE, Public Library of Science, vol. 10(3), pages 1-16, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Aaron C Ericsson & J Wade Davis & William Spollen & Nathan Bivens & Scott Givan & Catherine E Hagan & Mark McIntosh & Craig L Franklin, 2015. "Effects of Vendor and Genetic Background on the Composition of the Fecal Microbiota of Inbred Mice," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-19, February.
    2. Hossain, Ahmed & Beyene, Joseph & Willan, Andrew R. & Hu, Pingzhao, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    3. Xiaohong Li & Guy N Brock & Eric C Rouchka & Nigel G F Cooper & Dongfeng Wu & Timothy E O’Toole & Ryan S Gill & Abdallah M Eteleeb & Liz O’Brien & Shesh N Rai, 2017. "A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-22, May.
    4. Kerr Kathleen F., 2012. "Optimality Criteria for the Design of 2-Color Microarray Studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(1), pages 1-9, January.
    5. Ambroise Jérôme & Bearzatto Bertrand & Robert Annie & Macq Benoit & Gala Jean-Luc, 2012. "Combining Multiple Laser Scans of Spotted Microarrays by Means of a Two-Way ANOVA Model," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-20, February.
    6. J. McClatchy & R. Strogantsev & E. Wolfe & H. Y. Lin & M. Mohammadhosseini & B. A. Davis & C. Eden & D. Goldman & W. H. Fleming & P. Conley & G. Wu & L. Cimmino & H. Mohammed & A. Agarwal, 2023. "Clonal hematopoiesis related TET2 loss-of-function impedes IL1β-mediated epigenetic reprogramming in hematopoietic stem and progenitor cells," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    7. Alexandra Gyurdieva & Stefan Zajic & Ya-Fang Chang & E. Andres Houseman & Shan Zhong & Jaegil Kim & Michael Nathenson & Thomas Faitg & Mary Woessner & David C. Turner & Aisha N. Hasan & John Glod & Ro, 2022. "Biomarker correlates with response to NY-ESO-1 TCR T cells in patients with synovial sarcoma," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    8. Sora Yoon & Seon-Young Kim & Dougu Nam, 2016. "Improving Gene-Set Enrichment Analysis of RNA-Seq Data with Small Replicates," PLOS ONE, Public Library of Science, vol. 11(11), pages 1-16, November.
    9. Yu Lianbo & Gulati Parul & Fernandez Soledad & Pennell Michael & Kirschner Lawrence & Jarjoura David, 2011. "Fully Moderated T-statistic for Small Sample Size Gene Expression Arrays," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-22, September.
    10. Sinan Xiong & Jianbiao Zhou & Tze King Tan & Tae-Hoon Chung & Tuan Zea Tan & Sabrina Hui-Min Toh & Nicole Xin Ning Tang & Yunlu Jia & Yi Xiang See & Melissa Jane Fullwood & Takaomi Sanda & Wee-Joo Chn, 2024. "Super enhancer acquisition drives expression of oncogenic PPP1R15B that regulates protein homeostasis in multiple myeloma," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    11. Chaofeng Yuan & Wensheng Zhu & Xuming He & Jianhua Guo, 2019. "A mixture factor model with applications to microarray data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 60-76, March.
    12. Nan Li & Matthew N. McCall & Zhijin Wu, 2017. "Establishing Informative Prior for Gene Expression Variance from Public Databases," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(1), pages 160-177, June.
    13. Brian Caffo & Liu Dongmei & Giovanni Parmigiani, 2004. "Power Conjugate Multilevel Models with Applications to Genomics," Johns Hopkins University Dept. of Biostatistics Working Paper Series 1062, Berkeley Electronic Press.
    14. Nott, David J. & Yu, Zeming & Chan, Eva & Cotsapas, Chris & Cowley, Mark J. & Pulvers, Jeremy & Williams, Rohan & Little, Peter, 2007. "Hierarchical Bayes variable selection and microarray experiments," Journal of Multivariate Analysis, Elsevier, vol. 98(4), pages 852-872, April.
    15. Santu Ghosh & Alan M. Polansky, 2022. "Large-Scale Simultaneous Testing Using Kernel Density Estimation," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 84(2), pages 808-843, August.
    16. Qianxing Mo & Faming Liang, 2010. "Bayesian Modeling of ChIP-chip Data Through a High-Order Ising Model," Biometrics, The International Biometric Society, vol. 66(4), pages 1284-1294, December.
    17. Ahmed Hossain & Hafiz T.A. Khan, 2016. "Identification of genomic markers correlated with sensitivity in solid tumors to Dasatinib using sparse principal components," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(14), pages 2538-2549, October.
    18. Alexander Kaever & Manuel Landesfeind & Kirstin Feussner & Burkhard Morgenstern & Ivo Feussner & Peter Meinicke, 2014. "Meta-Analysis of Pathway Enrichment: Combining Independent and Dependent Omics Data Sets," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-12, February.
    19. Iqbal Mahmud & Guimei Tian & Jia Wang & Tarun E. Hutchinson & Brandon J. Kim & Nikee Awasthee & Seth Hale & Chengcheng Meng & Allison Moore & Liming Zhao & Jessica E. Lewis & Aaron Waddell & Shangtao , 2023. "DAXX drives de novo lipogenesis and contributes to tumorigenesis," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    20. Nyangoma Stephen O. & Collins Stuart I. & Altman Douglas G. & Johnson Philip & Billingham Lucinda J., 2012. "Sample Size Calculations for Designing Clinical Proteomic Profiling Studies Using Mass Spectrometry," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-42, February.
