
General power and sample size calculations for high-dimensional genomic data

Author

Listed:
  • van Iterson Maarten

    (Center for Human and Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands)

  • van de Wiel Mark A.

    (Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands)

  • Boer Judith M.

    (Center for Human and Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands; Erasmus Medical Centre, Sophia Children’s Hospital, Laboratory of Pediatric Oncology/Hematology, Rotterdam, The Netherlands; Netherlands Bioinformatics Centre, Nijmegen)

  • de Menezes Renée X.

    (Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands; Netherlands Bioinformatics Centre, Nijmegen)

Abstract

In the design of microarray or next-generation sequencing experiments, it is crucial to choose the appropriate number of biological replicates. Because the number of differentially expressed genes and their effect sizes are often small, too few replicates will lead to insufficient power to detect them. On the other hand, too many replicates lead to unnecessarily high experimental costs. Power and sample size analysis can guide experimentalists in choosing the appropriate number of biological replicates. Several methods for power and sample size analysis have recently been proposed for microarray data. However, most of these are restricted to two-group comparisons and require user-defined effect sizes. Here we propose a pilot-data-based method for power and sample size analysis that can handle more general experimental designs and uses pilot data to obtain estimates of the effect sizes. The method can also handle χ²-distributed test statistics, which enables power and sample size calculations for a much wider class of models, including high-dimensional generalized linear models, which are used, e.g., for RNA-seq data analysis. The performance of the method is evaluated using simulated and experimental data from several microarray and next-generation sequencing experiments. Furthermore, we compare our proposed method for estimating the density of effect sizes from pilot data with a recently proposed method specific to two-group comparisons.
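
To make the idea of power averaged over a set of effect sizes concrete, here is a minimal Python sketch. It is not the authors' estimator: the abstract describes estimating the density of effect sizes from pilot data, whereas this sketch simply takes a hypothetical list of standardized effect sizes as given. It also assumes a two-group comparison with known unit variance and a fixed per-test significance level with no multiple-testing correction. Under those assumptions the squared z-statistic is χ² with 1 degree of freedom and noncentrality n·δ²/2, so the power can be computed from the normal distribution alone. All function names and the example effect sizes are illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_chi2_1df(effect_size, n_per_group, alpha=0.05):
    """Power of a two-sided, two-group test with known unit variance.

    Assumption (not from the source): the squared z-statistic is
    chi-square with 1 df and noncentrality n_per_group * effect_size**2 / 2,
    tested at a fixed per-test alpha without multiplicity correction.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(effect_size) * sqrt(n_per_group / 2)  # sqrt of noncentrality
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def average_power(effect_sizes, n_per_group, alpha=0.05):
    """Mean power over a collection of (hypothetical) effect sizes."""
    powers = [power_chi2_1df(d, n_per_group, alpha) for d in effect_sizes]
    return sum(powers) / len(powers)


def smallest_sample_size(effect_sizes, target_power=0.8, alpha=0.05, max_n=1000):
    """Smallest per-group n whose average power reaches target_power, else None."""
    for n in range(2, max_n + 1):
        if average_power(effect_sizes, n, alpha) >= target_power:
            return n
    return None


# Hypothetical standardized effect sizes, standing in for pilot-data estimates.
hypothetical_effects = [0.5, 0.8, 1.0, 1.5]

if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, round(average_power(hypothetical_effects, n), 3))
    print("per-group n for 80% average power:",
          smallest_sample_size(hypothetical_effects))
```

For a single effect size of 1.0 this reduces to the textbook two-group calculation. A realistic high-dimensional analysis would replace the fixed alpha with a threshold tied to a multiple-testing criterion and the hand-picked list with effect sizes estimated from pilot data.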

Suggested Citation

  • van Iterson Maarten & van de Wiel Mark A. & Boer Judith M. & de Menezes Renée X., 2013. "General power and sample size calculations for high-dimensional genomic data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(4), pages 449-467, August.
  • Handle: RePEc:bpj:sagmbi:v:12:y:2013:i:4:p:449-467:n:3
    DOI: 10.1515/sagmb-2012-0046

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2012-0046
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    References listed on IDEAS

    1. Shigeyuki Matsui & Hisashi Noma, 2011. "Estimating Effect Sizes of Differentially Expressed Genes for Power and Sample-Size Assessments in Microarray Experiments," Biometrics, The International Biometric Society, vol. 67(4), pages 1225-1235, December.
    2. Mark A. van de Wiel & Kyung In Kim, 2007. "Estimating the False Discovery Rate Using Nonparametric Deconvolution," Biometrics, The International Biometric Society, vol. 63(3), pages 806-815, September.
    3. Long Qu & Dan Nettleton & Jack C. M. Dekkers, 2012. "Improved Estimation of the Noncentrality Parameter Distribution from a Large Number of t-Statistics, with Applications to False Discovery Rate Estimation in Microarray Data Analysis," Biometrics, The International Biometric Society, vol. 68(4), pages 1178-1187, December.
    4. Mette Langaas & Bo Henry Lindqvist & Egil Ferkingstad, 2005. "Estimating the proportion of true null hypotheses, with application to DNA microarray data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(4), pages 555-572, September.
    5. Ruppert,David & Wand,M. P. & Carroll,R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521785167, October.
    6. Efron, Bradley, 2009. "Empirical Bayes Estimates for Large-Scale Prediction Problems," Journal of the American Statistical Association, American Statistical Association, vol. 104(487), pages 1015-1028.
    7. Gwowen Shieh, 2000. "On Power and Sample Size Calculations for Likelihood Ratio Tests in Generalized Linear Models," Biometrics, The International Biometric Society, vol. 56(4), pages 1192-1196, December.
    8. Ruppert,David & Wand,M. P. & Carroll,R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521780506, October.
    9. Inyoung Kim & Noah D. Cohen & Raymond J. Carroll, 2003. "Semiparametric Regression Splines in Matched Case-Control Studies," Biometrics, The International Biometric Society, vol. 59(4), pages 1158-1169, December.

    Citations

    Cited by:

    1. Zehetmayer Sonja & Graf Alexandra C. & Posch Martin, 2015. "Sample size reassessment for a two-stage design controlling the false discovery rate," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 14(5), pages 429-442, November.
    2. Katharina T. Schmid & Barbara Höllbacher & Cristiana Cruceanu & Anika Böttcher & Heiko Lickert & Elisabeth B. Binder & Fabian J. Theis & Matthias Heinig, 2021. "scPower accelerates and optimizes the design of multi-sample single cell transcriptomic studies," Nature Communications, Nature, vol. 12(1), pages 1-18, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Otto-Sobotka, Fabian & Salvati, Nicola & Ranalli, Maria Giovanna & Kneib, Thomas, 2019. "Adaptive semiparametric M-quantile regression," Econometrics and Statistics, Elsevier, vol. 11(C), pages 116-129.
    2. Timothy K.M. Beatty & Erling Røed Larsen, 2005. "Using Engel curves to estimate bias in the Canadian CPI as a cost of living index," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 38(2), pages 482-499, May.
    3. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Econométrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
    4. Hyunju Son & Youyi Fong, 2021. "Fast grid search and bootstrap‐based inference for continuous two‐phase polynomial regression models," Environmetrics, John Wiley & Sons, Ltd., vol. 32(3), May.
    5. Michael Wegener & Göran Kauermann, 2017. "Forecasting in nonlinear univariate time series using penalized splines," Statistical Papers, Springer, vol. 58(3), pages 557-576, September.
    6. Dlugosz, Stephan & Mammen, Enno & Wilke, Ralf A., 2017. "Generalized partially linear regression with misclassified data and an application to labour market transitions," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 145-159.
    7. Bernhard Baumgartner & Daniel Guhl & Thomas Kneib & Winfried J. Steiner, 2018. "Flexible estimation of time-varying effects for frequently purchased retail goods: a modeling approach based on household panel data," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 40(4), pages 837-873, October.
    8. Zi Ye & Giles Hooker & Stephen P. Ellner, 2021. "Generalized Single Index Models and Jensen Effects on Reproduction and Survival," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(3), pages 492-512, September.
    9. Ferraccioli, Federico & Sangalli, Laura M. & Finos, Livio, 2022. "Some first inferential tools for spatial regression with differential regularization," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    10. Alexander Dokumentov & Rob J. Hyndman, 2022. "STR: Seasonal-Trend Decomposition Using Regression," INFORMS Journal on Data Science, INFORMS, vol. 1(1), pages 50-62, April.
    11. Kalogridis, Ioannis & Van Aelst, Stefan, 2023. "Robust penalized estimators for functional linear regression," Journal of Multivariate Analysis, Elsevier, vol. 194(C).
    12. Krisztin, Tamás, 2018. "Semi-parametric spatial autoregressive models in freight generation modeling," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 114(C), pages 121-143.
    13. Lauren N. Berry & Nathaniel E. Helwig, 2021. "Cross-Validation, Information Theory, or Maximum Likelihood? A Comparison of Tuning Methods for Penalized Splines," Stats, MDPI, vol. 4(3), pages 1-24, September.
    14. Nagler Thomas & Schellhase Christian & Czado Claudia, 2017. "Nonparametric estimation of simplified vine copula models: comparison of methods," Dependence Modeling, De Gruyter, vol. 5(1), pages 99-120, January.
    15. Yukun Zhang & Haocheng Li & Sarah Kozey Keadle & Charles E. Matthews & Raymond J. Carroll, 2019. "A Review of Statistical Analyses on Physical Activity Data Collected from Accelerometers," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 11(2), pages 465-476, July.
    16. Wei Huang & Oliver Linton & Zheng Zhang, 2021. "A Unified Framework for Specification Tests of Continuous Treatment Effect Models," Papers 2102.08063, arXiv.org, revised Sep 2021.
    17. Massimiliano Mazzanti & Antonio Musolesi, 2020. "Modeling Green Knowledge Production and Environmental Policies with Semiparametric Panel Data Regression models," SEEDS Working Papers 1420, SEEDS, Sustainability Environmental Economics and Dynamics Studies, revised Sep 2020.
    18. Basile, Roberto & Durbán, María & Mínguez, Román & María Montero, Jose & Mur, Jesús, 2014. "Modeling regional economic dynamics: Spatial dependence, spatial heterogeneity and nonlinearities," Journal of Economic Dynamics and Control, Elsevier, vol. 48(C), pages 229-245.
    19. Morteza Amini & Mahdi Roozbeh & Nur Anisah Mohamed, 2024. "Separation of the Linear and Nonlinear Covariates in the Sparse Semi-Parametric Regression Model in the Presence of Outliers," Mathematics, MDPI, vol. 12(2), pages 1-17, January.
    20. Wahba, Jackline & Schluter, Christian, 2009. "Illegal migration, wages and remittances- semi-parametric estimation of illegality effects," Discussion Paper Series In Economics And Econometrics 913, Economics Division, School of Social Sciences, University of Southampton.
