
Covariate-modulated large-scale multiple testing under dependence

Author

Listed:
  • Wang, Jiangzhou
  • Cui, Tingting
  • Zhu, Wensheng
  • Wang, Pengfei

Abstract

Large-scale multiple testing, which involves conducting tens of thousands of hypothesis tests simultaneously, has been applied in many scientific fields. Most conventional multiple testing procedures focus on controlling the false discovery rate (FDR) and largely ignore covariate information and the dependence structure among tests. An FDR control procedure, termed the Covariate-Modulated Local Index of Significance (cmLIS) procedure, is proposed; it takes into account local correlations among tests and accommodates covariate information by leveraging a covariate-modulated hidden Markov model (HMM). In the oracle case where all parameters of the covariate-modulated HMM are known, the cmLIS procedure is shown to be valid and optimal in some sense. Two Bayesian sampling algorithms are provided for parameter estimation, one for the case where the number of mixture components in the non-null distribution is known and one for the case where it is unknown. Extensive simulations are conducted to demonstrate the effectiveness of the cmLIS procedure relative to state-of-the-art multiple testing procedures. Finally, the cmLIS procedure is applied to an RNA sequencing data set and a schizophrenia (SCZ) data set.
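
The abstract does not spell out the decision rule, so the following is only a minimal sketch under a stated assumption: that a LIS-type procedure thresholds per-test posterior null probabilities with the step-up rule of the LIS procedure of Sun and Cai (2009). The function name `lis_step_up` and the input values are hypothetical, and estimating the probabilities from the covariate-modulated HMM is not shown.

```python
from typing import List, Sequence


def lis_step_up(lis: Sequence[float], alpha: float) -> List[bool]:
    """Return one reject (True) / accept (False) decision per hypothesis.

    Assumption: lis[i] is the posterior probability that hypothesis i is
    null given all observations. How these values are obtained (here they
    would come from a fitted HMM) is outside the scope of this sketch.
    """
    # Indices of the hypotheses, ordered from smallest to largest LIS value.
    order = sorted(range(len(lis)), key=lambda i: lis[i])

    running_sum = 0.0
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        running_sum += lis[idx]
        # The mean of the `rank` smallest values estimates the FDR incurred
        # by rejecting exactly those hypotheses; keep the largest such rank
        # whose estimate stays at or below alpha.
        if running_sum / rank <= alpha:
            n_reject = rank

    rejected = [False] * len(lis)
    for idx in order[:n_reject]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical posterior null probabilities for five tests.
    decisions = lis_step_up([0.01, 0.9, 0.03, 0.2, 0.6], alpha=0.1)
    print(decisions)
```

Because the values are sorted in increasing order, the running mean never decreases, so the rule amounts to rejecting the hypotheses with the smallest values until the estimated FDR would exceed `alpha`.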

Suggested Citation

  • Wang, Jiangzhou & Cui, Tingting & Zhu, Wensheng & Wang, Pengfei, 2023. "Covariate-modulated large-scale multiple testing under dependence," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
  • Handle: RePEc:eee:csdana:v:180:y:2023:i:c:s0167947322002444
    DOI: 10.1016/j.csda.2022.107664

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322002444
    Download Restriction: Full text for ScienceDirect subscribers only.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Xia & Shojaie, Ali & Zou, Jian, 2019. "Bayesian hidden Markov models for dependent large-scale multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 123-136.
    2. Tingting Cui & Pengfei Wang & Wensheng Zhu, 2021. "Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(3), pages 737-757, September.
    3. Sairam Rayaprolu & Zhiyi Chi, 2021. "False Discovery Variance Reduction in Large Scale Simultaneous Hypothesis Tests," Methodology and Computing in Applied Probability, Springer, vol. 23(3), pages 711-733, September.
    4. Noirrit Kiran Chandra & Sourabh Bhattacharya, 2021. "Asymptotic theory of dependent Bayesian multiple testing procedures under possible model misspecification," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(5), pages 891-920, October.
    5. Pengfei Wang & Wensheng Zhu, 2022. "Large‐scale covariate‐assisted two‐sample inference under dependence," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(4), pages 1421-1447, December.
    6. Jianqing Fan & Xu Han, 2017. "Estimation of the false discovery proportion with unknown dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(4), pages 1143-1164, September.
    7. Chen, Xiongzhi & Doerge, R.W., 2020. "A strong law of large numbers related to multiple testing normal means," Statistics & Probability Letters, Elsevier, vol. 159(C).
    8. Wenguang Sun & T. Tony Cai, 2009. "Large‐scale multiple testing under dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 393-424, April.
    9. Ghosh Debashis, 2012. "Incorporating the Empirical Null Hypothesis into the Benjamini-Hochberg Procedure," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-21, July.
    10. Jelle J Goeman & Aldo Solari, 2024. "On selection and conditioning in multiple testing and selective inference," Biometrika, Biometrika Trust, vol. 111(2), pages 393-416.
    11. Jeong Hwan Kook & Michele Guindani & Linlin Zhang & Marina Vannucci, 2019. "NPBayes-fMRI: Non-parametric Bayesian General Linear Models for Single- and Multi-Subject fMRI Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 11(1), pages 3-21, April.
    12. Gordon, Alexander & Chen, Linlin & Glazko, Galina & Yakovlev, Andrei, 2009. "Balancing type one and two errors in multiple testing for differential expression of genes," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1622-1629, March.
    13. Lee, Donghwan & Lee, Youngjo, 2016. "Extended likelihood approach to multiple testing with directional error control under a hidden Markov random field model," Journal of Multivariate Analysis, Elsevier, vol. 151(C), pages 1-13.
    14. Lim Johan & Kim Jayoun & Kim Sang-cheol & Yu Donghyeon & Kim Kyunga & Kim Byung Soo, 2012. "Detection of Differentially Expressed Gene Sets in a Partially Paired Microarray Data Set," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-30, February.
    15. Michele Guindani & Wesley O. Johnson, 2018. "More nonparametric Bayesian inference in applications," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(2), pages 239-251, June.
    16. Hai Shu & Bin Nan & Robert Koeppe, 2015. "Multiple testing for neuroimaging via hidden Markov random field," Biometrics, The International Biometric Society, vol. 71(3), pages 741-750, September.
    17. Elisa C. J. Maria & Isabel Salazar & Luis Sanz & Miguel A. Gómez-Villegas, 2020. "Using Copula to Model Dependence When Testing Multiple Hypotheses in DNA Microarray Experiments: A Bayesian Approximation," Mathematics, MDPI, vol. 8(9), pages 1-22, September.
    18. Yin Xia, 2017. "Testing and support recovery of multiple high-dimensional covariance matrices with false discovery rate control," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(4), pages 782-801, December.
    19. Joungyoun Kim & Donghyeon Yu & Johan Lim & Joong-Ho Won, 2018. "A peeling algorithm for multiple testing on a random field," Computational Statistics, Springer, vol. 33(1), pages 503-525, March.
    20. T. Tony Cai & Weidong Liu, 2016. "Large-Scale Multiple Testing of Correlations," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 229-240, March.
