Printed from https://ideas.repec.org/a/bla/jorssb/v81y2019i2p187-234.html

Covariate‐assisted ranking and screening for large‐scale two‐sample inference

Authors

  • T. Tony Cai
  • Wenguang Sun
  • Weinan Wang

Abstract

Two‐sample multiple testing has a wide range of applications. The conventional practice first reduces the original observations to a vector of p‐values and then chooses a cut‐off to adjust for multiplicity. However, this data reduction step could cause significant loss of information and thus lead to suboptimal testing procedures. We introduce a new framework for two‐sample multiple testing by incorporating a carefully constructed auxiliary variable in inference to improve the power. A data‐driven multiple‐testing procedure is developed by employing a covariate‐assisted ranking and screening (CARS) approach that optimally combines the information from both the primary and the auxiliary variables. The proposed CARS procedure is shown to be asymptotically valid and optimal for false discovery rate control. The procedure is implemented in the R package CARS. Numerical results confirm the effectiveness of CARS in false discovery rate control and show that it achieves substantial power gain over existing methods. CARS is also illustrated through an application to the analysis of a satellite imaging data set for supernova detection.
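
To make the "conventional practice" in the abstract concrete, here is a minimal sketch of that baseline pipeline: each two-sample comparison is reduced to a single p-value, and a cut-off is then chosen to adjust for multiplicity. It is not the CARS procedure and does not use the CARS R package. The Benjamini-Hochberg step-up rule stands in for the cut-off step, and the normal approximation, the function names, and the simulated data are all assumptions made for illustration.

```python
import math
import random
from statistics import mean, variance


def two_sample_p_value(x, y):
    """Two-sided p-value for one two-sample comparison.

    Assumption: a Welch-type z statistic with a normal approximation.
    """
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = (mean(x) - mean(y)) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose ordered p-value is below its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


def demo(m=200, n=30, n_signals=20, shift=1.5, seed=0):
    """Hypothetical simulation: the first n_signals units have a mean shift."""
    rng = random.Random(seed)
    p_values = []
    for j in range(m):
        mu = shift if j < n_signals else 0.0
        x = [rng.gauss(mu, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Data reduction step: only the p-value is kept for unit j.
        p_values.append(two_sample_p_value(x, y))
    return benjamini_hochberg(p_values, alpha=0.05)


if __name__ == "__main__":
    rejected = demo()
    print(f"rejected {len(rejected)} of 200 hypotheses")
```

Once `two_sample_p_value` has run, everything else in the two samples is discarded. That discarded information is what the abstract proposes to recover through an auxiliary variable.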

Suggested Citation

  • T. Tony Cai & Wenguang Sun & Weinan Wang, 2019. "Covariate‐assisted ranking and screening for large‐scale two‐sample inference," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 187-234, April.
  • Handle: RePEc:bla:jorssb:v:81:y:2019:i:2:p:187-234
    DOI: 10.1111/rssb.12304

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssb.12304
    Download Restriction: no


    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Ruth Heller & Saharon Rosset, 2021. "Optimal control of false discovery criteria in the two‐group model," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(1), pages 133-155, February.
    2. Tingting Cui & Pengfei Wang & Wensheng Zhu, 2021. "Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(3), pages 737-757, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhao, Haibing & Fung, Wing Kam, 2016. "A powerful FDR control procedure for multiple hypotheses," Computational Statistics & Data Analysis, Elsevier, vol. 98(C), pages 60-70.
    2. Habiger, Joshua D. & Peña, Edsel A., 2014. "Compound p-value statistics for multiple testing procedures," Journal of Multivariate Analysis, Elsevier, vol. 126(C), pages 153-166.
    3. Nikolaos Ignatiadis & Wolfgang Huber, 2021. "Covariate powered cross‐weighted multiple testing," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(4), pages 720-751, September.
    4. Ruth Heller & Saharon Rosset, 2021. "Optimal control of false discovery criteria in the two‐group model," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(1), pages 133-155, February.
    5. Alejandro Ochoa & John D Storey & Manuel Llinás & Mona Singh, 2015. "Beyond the E-Value: Stratified Statistics for Protein Domain Prediction," PLOS Computational Biology, Public Library of Science, vol. 11(11), pages 1-21, November.
    6. Chen, Xiongzhi, 2019. "Uniformly consistently estimating the proportion of false null hypotheses via Lebesgue–Stieltjes integral equations," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 724-744.
    7. Cipolli III, William & Hanson, Timothy & McLain, Alexander C., 2016. "Bayesian nonparametric multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 101(C), pages 64-79.
    8. T. Tony Cai & Wenguang Sun, 2017. "Optimal screening and discovery of sparse signals with applications to multistage high throughput studies," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(1), pages 197-223, January.
    9. Jiaying Gu & Roger Koenker, 2020. "Invidious Comparisons: Ranking and Selection as Compound Decisions," Papers 2012.12550, arXiv.org, revised Sep 2021.
    10. Haibing Zhao & Xinping Cui, 2020. "Constructing confidence intervals for selected parameters," Biometrics, The International Biometric Society, vol. 76(4), pages 1098-1108, December.
    11. Li Wang, 2019. "Weighted multiple testing procedure for grouped hypotheses with k-FWER control," Computational Statistics, Springer, vol. 34(2), pages 885-909, June.
    12. Wen Shi & Xi Chen & Jennifer Shang, 2019. "An Efficient Morris Method-Based Framework for Simulation Factor Screening," INFORMS Journal on Computing, INFORMS, vol. 31(4), pages 745-770, October.
    13. Long Qu & Dan Nettleton & Jack C. M. Dekkers, 2012. "Improved Estimation of the Noncentrality Parameter Distribution from a Large Number of t-Statistics, with Applications to False Discovery Rate Estimation in Microarray Data Analysis," Biometrics, The International Biometric Society, vol. 68(4), pages 1178-1187, December.
    14. Zhigen Zhao, 2022. "Where to find needles in a haystack?," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 148-174, March.
    15. Izmirlian, Grant, 2020. "Strong consistency and asymptotic normality for quantities related to the Benjamini–Hochberg false discovery rate procedure," Statistics & Probability Letters, Elsevier, vol. 160(C).
    16. Dennis Leung & Wenguang Sun, 2022. "ZAP: Z‐value adaptive procedures for false discovery rate control with side information," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 1886-1946, November.
    17. Hai Shu & Bin Nan & Robert Koeppe, 2015. "Multiple testing for neuroimaging via hidden Markov random field," Biometrics, The International Biometric Society, vol. 71(3), pages 741-750, September.
    18. Haibing Zhao & Wing Kam Fung, 2018. "Controlling mixed directional false discovery rate in multidimensional decisions with applications to microarray studies," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 27(2), pages 316-337, June.
    19. Chang Yu & Daniel Zelterman, 2020. "Distributions associated with simultaneous multiple hypothesis testing," Journal of Statistical Distributions and Applications, Springer, vol. 7(1), pages 1-17, December.
    20. Wenguang Sun & T. Tony Cai, 2009. "Large‐scale multiple testing under dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 393-424, April.
