
Fast and powerful conditional randomization testing via distillation
[Controlling the false discovery rate via knockoffs]

Authors

  • Molei Liu
  • Eugene Katsevich
  • Lucas Janson
  • Aaditya Ramdas

Abstract

We consider the problem of conditional independence testing: given a response $Y$ and covariates $(X, Z)$, we test the null hypothesis that $Y \perp\!\!\!\perp X \mid Z$. The conditional randomization test was recently proposed as a way to use distributional information about $X \mid Z$ to exactly and nonasymptotically control Type-I error using any test statistic in any dimensionality without assuming anything about $Y \mid (X, Z)$. This flexibility, in principle, allows one to derive powerful test statistics from complex prediction algorithms while maintaining statistical validity. Yet the direct use of such advanced test statistics in the conditional randomization test is prohibitively computationally expensive, especially with multiple testing, due to the requirement to recompute the test statistic many times on resampled data. We propose the distilled conditional randomization test, a novel approach to using state-of-the-art machine learning algorithms in the conditional randomization test while drastically reducing the number of times those algorithms need to be run, thereby taking advantage of their power and the conditional randomization test’s statistical guarantees without suffering the usual computational expense. In addition to distillation, we propose a number of other tricks, like screening and recycling computations, to further speed up the conditional randomization test without sacrificing its high power and exact validity. Indeed, we show in simulations that all our proposals combined lead to a test that has similar power to the most powerful existing conditional randomization test implementations, but requires orders of magnitude less computation, making it a practical tool even for large datasets. We demonstrate these benefits on a breast cancer dataset by identifying biomarkers related to cancer stage.
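The idea behind distillation can be illustrated with a minimal, hypothetical Python sketch (standard library only); it is not the authors' implementation. It assumes a single scalar $Z$, a known conditional law $X \mid Z \sim N(\rho Z, 1)$ (the model-X assumption the conditional randomization test relies on), and uses ordinary least squares as a stand-in for an expensive machine learning model. The names `ols_fit`, `dcrt_pvalue` and `simulate` and the parameter `rho` are illustrative only. The point is structural: the regression of $Y$ on $Z$ is fitted once, outside the resampling loop, so each resample of $X$ only recomputes a cheap sum, whereas a plain conditional randomization test built on a learned statistic would refit the model for every resample.

```python
import random


def ols_fit(z, y):
    """Fit y ~ a + b*z by least squares (scalar z); return the fitted predictor."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    b = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / s_zz
    a = y_bar - b * z_bar
    return lambda zi: a + b * zi


def dcrt_pvalue(x, y, z, rho, n_resamples=999, seed=0):
    """Distilled CRT-style p-value for H0: Y independent of X given Z.

    Hypothetical sketch: assumes X | Z ~ N(rho * Z, 1) with rho known.
    """
    rng = random.Random(seed)
    # Distillation, run ONCE: learn E[Y | Z] from (Y, Z) only, never from X.
    # OLS is a stand-in for whatever (possibly expensive) learner one would use.
    y_hat = ols_fit(z, y)
    y_res = [yi - y_hat(zi) for yi, zi in zip(y, z)]
    # E[X | Z] comes from the assumed-known conditional distribution of X.
    x_mean = [rho * zi for zi in z]

    def stat(x_vals):
        # Cheap statistic: |sum_i (Y_i - y_hat(Z_i)) * (X_i - E[X_i | Z_i])|.
        return abs(sum(r * (xi - m) for r, xi, m in zip(y_res, x_vals, x_mean)))

    t_obs = stat(x)
    exceed = 0
    for _ in range(n_resamples):
        # Resample X from X | Z; only the cheap statistic is recomputed.
        x_tilde = [m + rng.gauss(0.0, 1.0) for m in x_mean]
        if stat(x_tilde) >= t_obs:
            exceed += 1
    # Randomization p-value: (1 + number of exceedances) / (1 + resamples).
    return (1 + exceed) / (1 + n_resamples)


def simulate(n, rho, beta, seed):
    """Toy data: Z ~ N(0,1), X = rho*Z + noise, Y = beta*X + 2*Z + noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [rho * zi + rng.gauss(0.0, 1.0) for zi in z]
    y = [beta * xi + 2.0 * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate(n=200, rho=0.5, beta=1.0, seed=1)
    print("signal p-value:", dcrt_pvalue(x, y, z, rho=0.5))
    x0, y0, z0 = simulate(n=200, rho=0.5, beta=0.0, seed=3)
    print("null p-value:", dcrt_pvalue(x0, y0, z0, rho=0.5))
```

Because the fitted model never sees $X$, the observed and resampled statistics are exchangeable under the null given $(Y, Z)$, which is what keeps the $(1 + \text{exceedances})/(1 + K)$ p-value valid in this toy setting, provided the assumed law of $X \mid Z$ is correct.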

Suggested Citation

  • Molei Liu & Eugene Katsevich & Lucas Janson & Aaditya Ramdas, 2022. "Fast and powerful conditional randomization testing via distillation [Controlling the false discovery rate via knockoffs]," Biometrika, Biometrika Trust, vol. 109(2), pages 277-293.
  • Handle: RePEc:oup:biomet:v:109:y:2022:i:2:p:277-293.

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1093/biomet/asab039
    Download Restriction: Access to full text is restricted to subscribers.