IDEAS home Printed from https://ideas.repec.org/a/taf/jnlasa/v115y2020i529p362-379.html

RANK: Large-Scale Inference With Graphical Nonlinear Knockoffs

Author

Listed:
  • Yingying Fan
  • Emre Demirkaya
  • Gaorong Li
  • Jinchi Lv

Abstract

Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this article, we provide theoretical foundations on the power and robustness of the model-X knockoffs procedure, introduced recently by Candès, Fan, Janson, and Lv, in the high-dimensional setting where the covariate distribution is characterized by a Gaussian graphical model. We establish that, under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as the sample size goes to infinity. Moving away from this ideal case, we suggest a modified model-X knockoffs method, called graphical nonlinear knockoffs (RANK), to accommodate an unknown covariate distribution. We provide theoretical justification for the robustness of our modified procedure by showing that the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one with the estimated covariate distribution. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure. Simulation results demonstrate that, compared to existing approaches, our method performs competitively in both FDR control and power. A real dataset is analyzed to further assess the performance of the suggested knockoffs procedure. Supplementary materials for this article are available online.
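
As a rough illustration of how a knockoffs procedure turns feature statistics into a selected set at a target FDR level, the sketch below applies the knockoff+ threshold rule from the knockoff filter literature. The abstract does not state which threshold rule the article uses, so this choice is an assumption. The function names `knockoff_plus_threshold` and `select_features`, the input vector `W` (one precomputed statistic per feature, where large positive values favor the original feature over its knockoff copy), and the example numbers are all hypothetical. Constructing the knockoff copies and the statistics themselves is outside the scope of this sketch.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for statistics W at target FDR level q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns np.inf when no candidate satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, ascending.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Negative statistics act as a proxy count of false positives.
        fdp_estimate = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_estimate <= q:
            return t
    return np.inf


def select_features(W, q):
    """Indices of features whose statistic meets the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


# Hypothetical statistics for ten features (assumed already computed).
W_example = [5, 4, 3, 2.5, 2, -1, 0.5, 1.5, -0.2, 6]
selected = select_features(W_example, q=0.2)
```

Under this rule, a feature is selected only when its statistic is positive and at least as large as the threshold, so features with negative or near-zero statistics are never reported.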

Suggested Citation

  • Yingying Fan & Emre Demirkaya & Gaorong Li & Jinchi Lv, 2020. "RANK: Large-Scale Inference With Graphical Nonlinear Knockoffs," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(529), pages 362-379, January.
  • Handle: RePEc:taf:jnlasa:v:115:y:2020:i:529:p:362-379
    DOI: 10.1080/01621459.2018.1546589

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2018.1546589
    Download Restriction: Access to full text is restricted to subscribers.

    File URL: https://libkey.io/10.1080/01621459.2018.1546589?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations


    Cited by:

    1. Emre Demirkaya & Yang Feng & Pallavi Basu & Jinchi Lv, 2022. "Large-scale model selection in misspecified generalized linear models [Information theory and an extension of the maximum likelihood principle]," Biometrika, Biometrika Trust, vol. 109(1), pages 123-136.
    2. Dong, Ruipeng & Zhou, Jia & Zheng, Zemin, 2021. "Controlling the false discovery rate for latent factors via unit-rank deflation," Statistics & Probability Letters, Elsevier, vol. 178(C).
    3. Guo, Xu & Li, Runze & Liu, Jingyuan & Zeng, Mudong, 2023. "Statistical inference for linear mediation models with high-dimensional mediators and application to studying stock reaction to COVID-19 pandemic," Journal of Econometrics, Elsevier, vol. 235(1), pages 166-179.
    4. Jinzhou Li & Marloes H. Maathuis, 2021. "GGM knockoff filter: False discovery rate control for Gaussian graphical models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 534-558, July.
    5. Dong, Yan & Li, Daoji & Zheng, Zemin & Zhou, Jia, 2022. "Reproducible feature selection in high-dimensional accelerated failure time models," Statistics & Probability Letters, Elsevier, vol. 181(C).
    6. Panxu Yuan & Yinfei Kong & Gaorong Li, 2024. "FDR control and power analysis for high-dimensional logistic regression via StabKoff," Statistical Papers, Springer, vol. 65(5), pages 2719-2749, July.
    7. Pan, Yingli, 2022. "Feature screening and FDR control with knockoff features for ultrahigh-dimensional right-censored data," Computational Statistics & Data Analysis, Elsevier, vol. 173(C).
    8. Guo, Xu & Li, Runze & Liu, Jingyuan & Zeng, Mudong, 2024. "Reprint: Statistical inference for linear mediation models with high-dimensional mediators and application to studying stock reaction to COVID-19 pandemic," Journal of Econometrics, Elsevier, vol. 239(2).
    9. Wen, Xin & Li, Yang & Zheng, Zemin, 2024. "Scalable efficient reproducible multi-task learning via data splitting," Statistics & Probability Letters, Elsevier, vol. 208(C).
    10. Zemin Zheng & Jinchi Lv & Wei Lin, 2021. "Nonsparse Learning with Latent Variables," Operations Research, INFORMS, vol. 69(1), pages 346-359, January.
    11. Zhou, Jia & Li, Yang & Zheng, Zemin & Li, Daoji, 2022. "Reproducible learning in large-scale graphical models," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    12. Zhang, Yaowu & Zhou, Yeqing & Zhu, Liping, 2024. "A post-screening diagnostic study for ultrahigh dimensional data," Journal of Econometrics, Elsevier, vol. 239(2).
