IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2501.00034.html

Time Series Feature Redundancy Paradox: An Empirical Study Based on Mortgage Default Prediction

Author

Listed:
  • Chengyue Huang
  • Yahe Yang

Abstract

With the widespread application of machine learning in financial risk management, conventional wisdom holds that longer training periods and more feature variables improve model performance. Focusing on mortgage default prediction, this paper documents a phenomenon that contradicts this view: in time series prediction, a longer training timespan and additional non-critical features actually lead to significant deterioration in predictive performance. Using Fannie Mae's mortgage data from 2012-2022, the study compares predictive performance across different time window lengths and feature combinations, and finds that shorter time windows (such as single-year periods) paired with carefully selected key features yield superior prediction results. The results indicate that extended time spans may introduce noise from historical data and outdated market patterns, while excessive non-critical features interfere with the model's learning of core default factors. This research challenges the traditional "more is better" approach to data modeling and offers practical guidance for feature selection and time window optimization in financial risk prediction.
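The comparison described in the abstract, training windows of different lengths crossed with different feature combinations, could be laid out roughly as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the trailing-window scheme, the feature names (`credit_score`, `ltv`, `seller_id`), and the toy records are all hypothetical, and no model is fitted here.

```python
from itertools import product


def trailing_windows(test_year, lengths, earliest_year):
    """Map each window length to an inclusive (start, end) year range
    ending the year before test_year. Windows that would reach back
    before earliest_year are skipped."""
    windows = {}
    for n in lengths:
        start, end = test_year - n, test_year - 1
        if start >= earliest_year:
            windows[n] = (start, end)
    return windows


def select_rows(records, window, features):
    """Keep records whose year lies in the window, projected onto the
    chosen feature names. Returns (feature_dict, label) pairs."""
    start, end = window
    return [
        ({name: rec["x"][name] for name in features}, rec["default"])
        for rec in records
        if start <= rec["year"] <= end
    ]


def comparison_grid(records, test_year, lengths, feature_sets, earliest_year=2012):
    """Build one training set per (window length, feature set) pair."""
    windows = trailing_windows(test_year, lengths, earliest_year)
    grid = {}
    for (n, window), (label, features) in product(windows.items(), feature_sets.items()):
        grid[(n, label)] = select_rows(records, window, features)
    return grid


# Toy records with made-up values, one per year, purely for illustration.
records = [
    {"year": y, "x": {"credit_score": 700 + y % 5, "ltv": 80, "seller_id": y}, "default": y % 2}
    for y in range(2012, 2023)
]

# Hypothetical feature groupings: a small "key" set versus everything.
feature_sets = {
    "key": ["credit_score", "ltv"],
    "all": ["credit_score", "ltv", "seller_id"],
}

grid = comparison_grid(records, test_year=2022, lengths=[1, 5, 10], feature_sets=feature_sets)
for (n, label), rows in sorted(grid.items()):
    print(f"window={n}y features={label} training_rows={len(rows)}")
```

Each entry of `grid` would then be passed to the same model and scored on the held-out test year, so that differences in performance can be attributed to window length and feature choice rather than to the model.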

Suggested Citation

  • Chengyue Huang & Yahe Yang, 2024. "Time Series Feature Redundancy Paradox: An Empirical Study Based on Mortgage Default Prediction," Papers 2501.00034, arXiv.org.
  • Handle: RePEc:arx:papers:2501.00034

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2501.00034
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Chen, Shunqin & Guo, Zhengfeng & Zhao, Xinlei, 2021. "Predicting mortgage early delinquency with machine learning methods," European Journal of Operational Research, Elsevier, vol. 290(1), pages 358-372.
    2. Fitzpatrick, Trevor & Mues, Christophe, 2016. "An empirical comparison of classification algorithms for mortgage default prediction: evidence from a distressed mortgage market," European Journal of Operational Research, Elsevier, vol. 249(2), pages 427-439.
    3. Salah Bouktif & Ali Fiaz & Ali Ouni & Mohamed Adel Serhani, 2018. "Optimal Deep Learning LSTM Model for Electric Load Forecasting using Feature Selection and Genetic Algorithm: Comparison with Machine Learning Approaches," Energies, MDPI, vol. 11(7), pages 1-20, June.
    4. Florian Huber & Gary Koop & Luca Onorante, 2021. "Inducing Sparsity and Shrinkage in Time-Varying Parameter Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 39(3), pages 669-683, July.
    5. Boik, Robert J., 2013. "Model-based principal components of correlation matrices," Journal of Multivariate Analysis, Elsevier, vol. 116(C), pages 310-331.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    2. Monika Zimmermann & Florian Ziel, 2024. "Efficient mid-term forecasting of hourly electricity load using generalized additive models," Papers 2405.17070, arXiv.org, revised Feb 2025.
    3. Donglin Wang & Don Hong & Qiang Wu, 2023. "Prediction of Loan Rate for Mortgage Data: Deep Learning Versus Robust Regression," Computational Economics, Springer;Society for Computational Economics, vol. 61(3), pages 1137-1150, March.
    4. Martin Feldkircher & Florian Huber & Gary Koop & Michael Pfarrhofer, 2022. "APPROXIMATE BAYESIAN INFERENCE AND FORECASTING IN HUGE‐DIMENSIONAL MULTICOUNTRY VARs," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 63(4), pages 1625-1658, November.
    5. Bingjie Jin & Guihua Zeng & Zhilin Lu & Hongqiao Peng & Shuxin Luo & Xinhe Yang & Haojun Zhu & Mingbo Liu, 2022. "Hybrid LSTM–BPNN-to-BPNN Model Considering Multi-Source Information for Forecasting Medium- and Long-Term Electricity Peak Load," Energies, MDPI, vol. 15(20), pages 1-20, October.
    6. Pfarrhofer, Michael, 2022. "Modeling tail risks of inflation using unobserved component quantile regressions," Journal of Economic Dynamics and Control, Elsevier, vol. 143(C).
    7. Suriyan Jomthanachai & Wai Peng Wong & Khai Wah Khaw, 2024. "An Application of Machine Learning to Logistics Performance Prediction: An Economics Attribute-Based of Collective Instance," Computational Economics, Springer;Society for Computational Economics, vol. 63(2), pages 741-792, February.
    8. Gianluca Anese & Marco Corazza & Michele Costola & Loriana Pelizzon, 2023. "Impact of public news sentiment on stock market index return and volatility," Computational Management Science, Springer, vol. 20(1), pages 1-36, December.
    9. Mst. Shapna Akter & Hossain Shahriar & Reaz Chowdhury & M. R. C. Mahdy, 2022. "Forecasting the Risk Factor of Frontier Markets: A Novel Stacking Ensemble of Neural Network Approach," Future Internet, MDPI, vol. 14(9), pages 1-23, August.
    10. Shree Krishna Acharya & Young-Min Wi & Jaehee Lee, 2019. "Short-Term Load Forecasting for a Single Household Based on Convolution Neural Networks Using Data Augmentation," Energies, MDPI, vol. 12(18), pages 1-19, September.
    11. Aneta Dzik-Walczak & Mateusz Heba, 2019. "A comparison of credit scoring techniques in Peer-to-Peer lending," Working Papers 2019-16, Faculty of Economic Sciences, University of Warsaw.
    12. Gupta, Mukul & Kumar, Pradeep, 2020. "Recommendation generation using personalized weight of meta-paths in heterogeneous information networks," European Journal of Operational Research, Elsevier, vol. 284(2), pages 660-674.
    13. Balcilar, Mehmet & Berisha, Edmond & Gupta, Rangan & Pierdzioch, Christian, 2021. "Time-varying evidence of predictability of financial stress in the United States over a century: The role of inequality," Structural Change and Economic Dynamics, Elsevier, vol. 57(C), pages 87-92.
    14. Gael M. Martin & David T. Frazier & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2023. "Bayesian Forecasting in the 21st Century: A Modern Review," Monash Econometrics and Business Statistics Working Papers 1/23, Monash University, Department of Econometrics and Business Statistics.
    15. Florian Huber & Luca Rossini, 2020. "Inference in Bayesian Additive Vector Autoregressive Tree Models," Papers 2006.16333, arXiv.org, revised Mar 2021.
    16. Hao Wang & Chen Peng & Bolin Liao & Xinwei Cao & Shuai Li, 2023. "Wind Power Forecasting Based on WaveNet and Multitask Learning," Sustainability, MDPI, vol. 15(14), pages 1-22, July.
    17. Niko Hauzenberger & Florian Huber & Gary Koop & James Mitchell, 2020. "Bayesian Modelling of TVP-VARs Using Regression Trees," Working Papers 2308, University of Strathclyde Business School, Department of Economics, revised Aug 2023.
    18. Mahsa Tavakoli & Rohitash Chandra & Fengrui Tian & Cristián Bravo, 2023. "Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams," Papers 2304.10740, arXiv.org, revised Nov 2024.
    19. Michael Bucker & Gero Szepannek & Alicja Gosiewska & Przemyslaw Biecek, 2020. "Transparency, Auditability and eXplainability of Machine Learning Models in Credit Scoring," Papers 2009.13384, arXiv.org.
    20. Meng, Qinglong & Wei, Ying'an & Fan, Jingjing & Li, Yanbo & Zhao, Fan & Lei, Yu & Sun, Hang & Jiang, Le & Yu, Lingli, 2024. "Peak regulation strategies for ground source heat pump demand response of based on load forecasting: A case study of rural building in China," Renewable Energy, Elsevier, vol. 224(C).
