Printed from https://ideas.repec.org/a/bla/scjsta/v49y2022i4p1421-1447.html

Large‐scale covariate‐assisted two‐sample inference under dependence

Author

Listed:
  • Pengfei Wang
  • Wensheng Zhu

Abstract

The problems of large‐scale two‐sample inference often arise from the statistical analysis of “high throughput” data. Conventional multiple testing procedures usually suffer from loss of testing efficiency when conducting two‐sample $t$‐tests directly. To some extent, this is because of the ignorance of sparsity information. Moreover, the two‐sample tests commonly have local correlations, and neglecting the dependence structure may decrease the statistical accuracy. Therefore, it is imperative to develop a procedure that considers both sparsity information and dependence structure among the tests. We start by introducing a novel dependence model to allow for sparsity information and dependence structure. Based on the dependence model, we propose a covariate‐assisted local index of significance (COALIS) procedure and show that it is valid and optimal. Then a data‐driven procedure is developed to mimic the oracle procedure. Both simulations and real data analysis show that the COALIS procedure outperforms its competitors.

Suggested Citation

  • Pengfei Wang & Wensheng Zhu, 2022. "Large‐scale covariate‐assisted two‐sample inference under dependence," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(4), pages 1421-1447, December.
  • Handle: RePEc:bla:scjsta:v:49:y:2022:i:4:p:1421-1447
    DOI: 10.1111/sjos.12608

    Download full text from publisher

    File URL: https://doi.org/10.1111/sjos.12608
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Jiangzhou & Cui, Tingting & Zhu, Wensheng & Wang, Pengfei, 2023. "Covariate-modulated large-scale multiple testing under dependence," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    2. Tingting Cui & Pengfei Wang & Wensheng Zhu, 2021. "Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(3), pages 737-757, September.
    3. Shiyun Chen & Ery Arias-Castro, 2021. "On the power of some sequential multiple testing procedures," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(2), pages 311-336, April.
    4. Wang, Xia & Shojaie, Ali & Zou, Jian, 2019. "Bayesian hidden Markov models for dependent large-scale multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 123-136.
    5. Damian Kozbur, 2020. "Analysis of Testing‐Based Forward Model Selection," Econometrica, Econometric Society, vol. 88(5), pages 2147-2173, September.
    6. Zhigen Zhao, 2022. "Where to find needles in a haystack?," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 148-174, March.
    7. Joungyoun Kim & Donghyeon Yu & Johan Lim & Joong-Ho Won, 2018. "A peeling algorithm for multiple testing on a random field," Computational Statistics, Springer, vol. 33(1), pages 503-525, March.
    8. Wenguang Sun & T. Tony Cai, 2009. "Large‐scale multiple testing under dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 393-424, April.
    9. Jeng, X. Jessie & Chen, Xiongzhi, 2019. "Predictor ranking and false discovery proportion control in high-dimensional regression," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 163-175.
    10. Shonosuke Sugasawa & Hisashi Noma, 2021. "Efficient screening of predictive biomarkers for individual treatment selection," Biometrics, The International Biometric Society, vol. 77(1), pages 249-257, March.
    11. Liang, Weijuan & Zhang, Qingzhao & Ma, Shuangge, 2024. "Hierarchical false discovery rate control for high-dimensional survival analysis with interactions," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    12. X. Jessie Jeng & Huimin Peng & Wenbin Lu, 2021. "Model Selection With Mixed Variables on the Lasso Path," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 170-184, May.
    13. Daniel Yekutieli, 2015. "Bayesian tests for composite alternative hypotheses in cross-tabulated data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(2), pages 287-301, June.
    14. Ghosh Debashis, 2012. "Incorporating the Empirical Null Hypothesis into the Benjamini-Hochberg Procedure," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-21, July.
    15. Ruth Heller & Saharon Rosset, 2021. "Optimal control of false discovery criteria in the two‐group model," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(1), pages 133-155, February.
    16. Won, Joong-Ho & Lim, Johan & Yu, Donghyeon & Kim, Byung Soo & Kim, Kyunga, 2014. "Monotone false discovery rate," Statistics & Probability Letters, Elsevier, vol. 87(C), pages 86-93.
    17. Gómez-Villegas Miguel A. & Salazar Isabel & Sanz Luis, 2014. "A Bayesian decision procedure for testing multiple hypotheses in DNA microarray experiments," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(1), pages 49-65, February.
    18. Chen, Yunxiao & Li, Xiaoou, 2023. "Compound sequential change-point detection in parallel data streams," LSE Research Online Documents on Economics 111010, London School of Economics and Political Science, LSE Library.
    19. Edsel Peña & Joshua Habiger & Wensong Wu, 2015. "Classes of multiple decision functions strongly controlling FWER and FDR," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 78(5), pages 563-595, July.
    20. Xu Zhao & Zhongxian Zhang & Weihu Cheng & Pengyue Zhang, 2019. "A New Parameter Estimator for the Generalized Pareto Distribution under the Peaks over Threshold Framework," Mathematics, MDPI, vol. 7(5), pages 1-18, May.
