IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2408.10825.html

Conditional nonparametric variable screening by neural factor regression

Author

Listed:
  • Jianqing Fan

    (Princeton University)

  • Weining Wang

    (University of Groningen)

  • Yue Zhao

    (University of York)

Abstract

High-dimensional covariates often admit a linear factor structure. To effectively screen correlated covariates in high dimensions, we propose a conditional variable screening test based on non-parametric regression using neural networks, chosen for their representation power. We ask whether individual covariates make additional contributions given the latent factors or, more generally, a set of variables. Our test statistics are based on the estimated partial derivative of the regression function of the candidate variable for screening and an observable proxy for the latent factors. Hence, our test reveals how much predictors contribute additionally to the non-parametric regression after accounting for the latent factors. Our derivative estimator is the convolution of a deep neural network regression estimator and a smoothing kernel. We demonstrate that when the neural network size diverges with the sample size, it is necessary, unlike when estimating the regression function itself, to smooth the partial derivative of the neural network estimator in order to recover the desired convergence rate for the derivative. Moreover, our screening test achieves asymptotic normality under the null after finely centering the test statistics, which makes the biases negligible, and it is consistent against local alternatives under mild conditions. We demonstrate the performance of our test in a simulation study and two real-world applications.
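
The smoothing step described in the abstract can be illustrated with a one-dimensional toy sketch. This is not the authors' estimator: the function `f_hat` is a hypothetical stand-in for a fitted network (a smooth signal plus a small, fast wiggle), and the Gaussian kernel, the bandwidth `h` and the grid size are assumptions chosen for illustration only. The sketch uses the identity that the derivative of the convolution of f with a Gaussian kernel K_h at x equals the integral of f(x + s) (s / h^2) K_h(s) ds.

```python
import numpy as np


def f_hat(x):
    # Hypothetical stand-in for a fitted regression estimator:
    # the target sin(x) plus a tiny, rapidly oscillating error.
    return np.sin(x) + 0.01 * np.sin(200.0 * x)


def raw_derivative(x, eps=1e-6):
    # Central finite difference of the unsmoothed estimator.
    return (f_hat(x + eps) - f_hat(x - eps)) / (2.0 * eps)


def smoothed_derivative(x, h=0.1, n_grid=4001):
    # Derivative of (f_hat * K_h)(x) for a Gaussian kernel K_h,
    # computed as the integral of f_hat(x + s) * (s / h^2) * K_h(s) ds
    # with a Riemann sum on [-6h, 6h].
    s = np.linspace(-6.0 * h, 6.0 * h, n_grid)
    ds = s[1] - s[0]
    kernel = np.exp(-0.5 * (s / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    integrand = f_hat(x + s) * (s / h**2) * kernel
    return float(np.sum(integrand) * ds)


x0 = 0.3
true_slope = np.cos(x0)  # derivative of the target sin(x)
raw_error = abs(raw_derivative(x0) - true_slope)
smooth_error = abs(smoothed_derivative(x0) - true_slope)
print(f"raw error: {raw_error:.3f}, smoothed error: {smooth_error:.4f}")
```

In this toy setting the wiggle changes the fitted values by at most 0.01 but dominates the raw derivative, while the kernel-smoothed derivative stays close to the true slope at the cost of a small smoothing bias. The bandwidth choice here is arbitrary and is not taken from the paper.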

Suggested Citation

  • Jianqing Fan & Weining Wang & Yue Zhao, 2024. "Conditional nonparametric variable screening by neural factor regression," Papers 2408.10825, arXiv.org.
  • Handle: RePEc:arx:papers:2408.10825

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2408.10825
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jingxuan Luo & Lili Yue & Gaorong Li, 2023. "Overview of High-Dimensional Measurement Error Regression Models," Mathematics, MDPI, vol. 11(14), pages 1-22, July.
    2. Li, Degui & Linton, Oliver & Lu, Zudi, 2015. "A flexible semiparametric forecasting model for time series," Journal of Econometrics, Elsevier, vol. 187(1), pages 345-357.
    3. Xiaochao Xia & Hao Ming, 2022. "A Flexibly Conditional Screening Approach via a Nonparametric Quantile Partial Correlation," Mathematics, MDPI, vol. 10(24), pages 1-32, December.
    4. Jing Zhang & Haibo Zhou & Yanyan Liu & Jianwen Cai, 2021. "Conditional screening for ultrahigh-dimensional survival data in case-cohort studies," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 27(4), pages 632-661, October.
    5. Yuan, Qingcong & Chen, Xianyan & Ke, Chenlu & Yin, Xiangrong, 2022. "Independence index sufficient variable screening for categorical responses," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    6. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    7. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    8. Laurent Ferrara & Anna Simoni, 2023. "When are Google Data Useful to Nowcast GDP? An Approach via Preselection and Shrinkage," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 41(4), pages 1188-1202, October.
    9. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
    10. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    11. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    12. Jingyuan Liu & Runze Li & Rongling Wu, 2014. "Feature Selection for Varying Coefficient Models With Ultrahigh-Dimensional Covariates," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 266-274, March.
    13. Jianqing Fan & Yang Feng & Jiancheng Jiang & Xin Tong, 2016. "Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 275-287, March.
    14. Zhao, Bangxin & Liu, Xin & He, Wenqing & Yi, Grace Y., 2021. "Dynamic tilted current correlation for high dimensional variable screening," Journal of Multivariate Analysis, Elsevier, vol. 182(C).
    15. Zhenghui Feng & Lu Lin & Ruoqing Zhu & Lixing Zhu, 2020. "Nonparametric variable selection and its application to additive models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(3), pages 827-854, June.
    16. He, Yong & Zhang, Liang & Ji, Jiadong & Zhang, Xinsheng, 2019. "Robust feature screening for elliptical copula regression model," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 568-582.
    17. Karim M Abadir & Michel Lubrano, 2024. "Explicit solutions for the asymptotically optimal bandwidth in cross-validation," Post-Print hal-04678541, HAL.
    18. Min Chen & Yimin Lian & Zhao Chen & Zhengjun Zhang, 2017. "Sure explained variability and independence screening," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 29(4), pages 849-883, October.
    19. Zhang, Shucong & Zhou, Yong, 2018. "Variable screening for ultrahigh dimensional heterogeneous data via conditional quantile correlations," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 1-13.
    20. Haofeng Wang & Hongxia Jin & Xuejun Jiang & Jingzhi Li, 2022. "Model Selection for High Dimensional Nonparametric Additive Models via Ridge Estimation," Mathematics, MDPI, vol. 10(23), pages 1-22, December.
