Printed from https://ideas.repec.org/p/arx/papers/2411.16666.html

CatNet: Effective FDR Control in LSTM with Gaussian Mirrors and SHAP Feature Importance

Author

Listed:
  • Jiaan Han
  • Junxiao Chen
  • Yanzhe Fu

Abstract

We introduce CatNet, an algorithm that controls the False Discovery Rate (FDR) and selects significant features in LSTM models using the Gaussian Mirror (GM) method. To evaluate feature importance for LSTM models on time series, we introduce a vector of derivatives of SHapley Additive exPlanations (SHAP) values as the importance measure. We also propose a new kernel-based dependence measure to avoid multicollinearity in the GM algorithm, which makes feature selection robust while keeping the FDR controlled. We use simulated data to evaluate CatNet's performance in both linear models and LSTM models with different link functions. The algorithm controls the FDR while maintaining high statistical power in all cases. We also evaluate the algorithm in low-dimensional and high-dimensional settings, demonstrating its robustness across input dimensions. To evaluate CatNet in real-world applications, we construct a multi-factor investment portfolio to forecast the prices of S&P 500 index components. The results show that our model achieves higher predictive accuracy than traditional LSTM models without feature selection and FDR control. Additionally, CatNet captures common market-driving features, which supports informed decision-making in financial markets by making predictions more interpretable. Our study integrates the Gaussian Mirror algorithm with LSTM models for the first time and introduces SHAP values as a new feature importance metric for FDR control methods, marking a significant advance in feature selection and error control for neural networks.
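
As a rough illustration of the FDR-control step the abstract refers to, the sketch below applies the generic data-driven cutoff used with mirror statistics: choose the smallest t > 0 for which the estimated false discovery proportion, #{j : M_j <= -t} / max(#{j : M_j >= t}, 1), is at most the target level q. This is not CatNet's implementation. The function names, the sample values, and the assumption that the mirror statistics M_j have already been computed (for example from SHAP-based importances) are all hypothetical.

```python
import numpy as np

def mirror_threshold(mirror_stats, q=0.1):
    """Smallest t > 0 whose estimated FDP is at most q (inf if none exists)."""
    m = np.asarray(mirror_stats, dtype=float)
    # Candidate cutoffs: the distinct nonzero magnitudes, in ascending order.
    candidates = np.unique(np.abs(m[m != 0]))
    for t in candidates:
        # Large negative statistics serve as a proxy for false positives.
        false_est = np.sum(m <= -t)
        selected = max(np.sum(m >= t), 1)
        if false_est / selected <= q:
            return float(t)
    return float("inf")

def select_features(mirror_stats, q=0.1):
    """Indices of features whose mirror statistic reaches the cutoff."""
    m = np.asarray(mirror_stats, dtype=float)
    return np.flatnonzero(m >= mirror_threshold(m, q))

# Hypothetical mirror statistics for eight features (illustrative values only).
stats = [5.0, 4.2, 3.9, -0.3, 0.2, -0.1, 0.4, 3.1]
print(mirror_threshold(stats, q=0.1))
print(select_features(stats, q=0.1))
```

A smaller q pushes the cutoff upward, so fewer features are selected; when no cutoff meets the target, the sketch returns an infinite threshold and selects nothing.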

Suggested Citation

  • Jiaan Han & Junxiao Chen & Yanzhe Fu, 2024. "CatNet: Effective FDR Control in LSTM with Gaussian Mirrors and SHAP Feature Importance," Papers 2411.16666, arXiv.org, revised Nov 2024.
  • Handle: RePEc:arx:papers:2411.16666

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2411.16666
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Panxu Yuan & Yinfei Kong & Gaorong Li, 2024. "FDR control and power analysis for high-dimensional logistic regression via StabKoff," Statistical Papers, Springer, vol. 65(5), pages 2719-2749, July.
    2. Liang, Weijuan & Zhang, Qingzhao & Ma, Shuangge, 2024. "Hierarchical false discovery rate control for high-dimensional survival analysis with interactions," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    3. Alexander Silbersdorff & Kai Sebastian Schneider, 2019. "Distributional Regression Techniques in Socioeconomic Research on the Inequality of Health with an Application on the Relationship between Mental Health and Income," IJERPH, MDPI, vol. 16(20), pages 1-28, October.
    4. Julien Hambuckers & Marie Kratz & Antoine Usseglio-Carleve, 2023. "Efficient Estimation In Extreme Value Regression Models Of Hedge Fund Tail Risks," Working Papers hal-04090916, HAL.
    5. Yuanhua Feng & Wolfgang Karl Härdle, 2021. "Uni- and multivariate extensions of the sinh-arcsinh normal distribution applied to distributional regression," Working Papers CIE 142, Paderborn University, CIE Center for International Economics.
    6. Miguel A Delgado & Andrés García-Suaza & Pedro H C Sant’Anna, 2022. "Distribution regression in duration analysis: an application to unemployment spells [Lecture notes in statistics: Proceedings]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 675-698.
    7. Kneib, Thomas & Silbersdorff, Alexander & Säfken, Benjamin, 2023. "Rage Against the Mean – A Review of Distributional Regression Approaches," Econometrics and Statistics, Elsevier, vol. 26(C), pages 99-123.
    8. Alexander Silbersdorff & Julia Lynch & Stephan Klasen & Thomas Kneib, 2017. "Reconsidering the Income-Illness Relationship using Distributional Regression: An Application to Germany," Courant Research Centre: Poverty, Equity and Growth - Discussion Papers 231, Courant Research Centre PEG.
    9. Alexander Sohn, 2015. "Beyond Conventional Wage Discrimination Analysis: Assessing Comprehensive Wage Distributions of Males and Females Using Structured Additive Distributional Regression," SOEPpapers on Multidisciplinary Panel Data Research 802, DIW Berlin, The German Socio-Economic Panel (SOEP).
    10. Srinivasan, Arun & Xue, Lingzhou & Zhan, Xiang, 2023. "Identification of microbial features in multivariate regression under false discovery rate control," Computational Statistics & Data Analysis, Elsevier, vol. 181(C).
    11. Alexander Henzi & Johanna F. Ziegel & Tilmann Gneiting, 2021. "Isotonic distributional regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(5), pages 963-993, November.
    12. Souhaib Ben Taieb & James W. Taylor & Rob J. Hyndman, 2017. "Coherent Probabilistic Forecasts for Hierarchical Time Series," Monash Econometrics and Business Statistics Working Papers 3/17, Monash University, Department of Econometrics and Business Statistics.
    13. Alina Schenk & Moritz Berger & Matthias Schmid, 2024. "Pseudo-value regression trees," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 30(2), pages 439-471, April.
    14. Cheng Peng & Stanislav Uryasev, 2023. "Factor Model of Mixtures," Papers 2301.13843, arXiv.org, revised Mar 2023.
    15. Silius M. Vandeskog & Thordis L. Thorarinsdottir & Ingelin Steinsland & Finn Lindgren, 2022. "Quantile based modeling of diurnal temperature range with the five‐parameter lambda distribution," Environmetrics, John Wiley & Sons, Ltd., vol. 33(4), June.
    16. Wiemann, Paul F.V. & Klein, Nadja & Kneib, Thomas, 2022. "Correcting for sample selection bias in Bayesian distributional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    17. Nadja Klein & Torsten Hothorn & Luisa Barbanti & Thomas Kneib, 2022. "Multivariate conditional transformation models," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(1), pages 116-142, March.
    18. Julien Hambuckers & Marie Kratz & Antoine Usseglio-Carleve, 2023. "Efficient Estimation in Extreme Value Regression Models of Hedge Fund Tail Risks," Papers 2304.06950, arXiv.org.
    19. Anatolyev, Stanislav & Baruník, Jozef, 2019. "Forecasting dynamic return distributions based on ordered binary choice," International Journal of Forecasting, Elsevier, vol. 35(3), pages 823-835.
    20. Samantha Leorato & Franco Peracchi, 2015. "Comparing Distribution and Quantile Regression," EIEF Working Papers Series 1511, Einaudi Institute for Economics and Finance (EIEF), revised Oct 2015.
