Printed from https://ideas.repec.org/a/spr/fininn/v11y2025i1d10.1186_s40854-024-00689-1.html

A dimension reduction assisted credit scoring method for big data with categorical features

Author

Listed:
  • Tatjana Miljkovic

    (Miami University, Department of Statistics)

  • Pei Wang

    (Bowling Green State University)

Abstract

In the past decade, financial institutions have invested significant effort in developing accurate analytical credit scoring models. The evidence suggests that even small improvements in the accuracy of existing credit scoring models may optimize profits while effectively managing risk exposure. Despite continuing efforts, the majority of existing credit scoring models still include some judgment-based assumptions that are sometimes supported by the significant findings of previous studies but are not validated using the institution’s internal data. We argue that current studies on the development of credit scoring models have largely ignored recent developments in statistical methods for sufficient dimension reduction. To contribute to the field of financial innovation, this study proposes a Dimension Reduction Assisted Credit Scoring (DRA-CS) method via distance covariance-based sufficient dimension reduction (DCOV-SDR) in a Majorization-Minimization (MM) algorithm. First, in the presence of a large number of variables, the DRA-CS method achieves greater dimension reduction and better prediction accuracy than the other dimension reduction methods considered. Second, when the DRA-CS method is combined with logistic regression, it outperforms existing methods based on different variable selection techniques. This study argues that financial institutions should use the DRA-CS method as a financial innovation tool to analyze high-dimensional customer datasets and improve the accuracy of existing credit scoring methods.
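
The abstract names distance covariance as the dependence measure behind DCOV-SDR. As a rough illustration only, and not the authors' DRA-CS or MM implementation, the sketch below computes the standard sample (V-statistic) squared distance covariance with NumPy and compares two candidate projection directions on synthetic data. The function names, the simulated data, and the two directions are all assumptions made for this example; an actual DCOV-SDR fit would search over directions under a scale constraint rather than evaluate two fixed ones.

```python
import numpy as np


def _centered_distances(z):
    """Double-centered pairwise Euclidean distance matrix of the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D input as n observations of one variable
    diff = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_covariance_sq(x, y):
    """Sample squared distance covariance between x and y (same n rows)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    return float((a * b).mean())


# Hypothetical data: only the first of five predictors drives the binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)

# Two hypothetical candidate directions for a one-dimensional projection.
beta_informative = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
beta_noise = np.array([0.0, 1.0, 0.0, 0.0, 0.0])

informative = distance_covariance_sq(X @ beta_informative, y)
noise = distance_covariance_sq(X @ beta_noise, y)
print(f"dcov^2 along informative direction: {informative:.4f}")
print(f"dcov^2 along noise direction:       {noise:.4f}")
```

In this toy setting the projection that carries the signal yields the larger value. In a full pipeline, the reduced projection of the predictors would then be passed to a classifier such as logistic regression.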

Suggested Citation

  • Tatjana Miljkovic & Pei Wang, 2025. "A dimension reduction assisted credit scoring method for big data with categorical features," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 11(1), pages 1-30, December.
  • Handle: RePEc:spr:fininn:v:11:y:2025:i:1:d:10.1186_s40854-024-00689-1
    DOI: 10.1186/s40854-024-00689-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1186/s40854-024-00689-1
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Y Liu & M Schumann, 2005. "Data mining feature selection for credit scoring models," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 56(9), pages 1099-1108, September.
    3. Gunnarsson, Björn Rafn & vanden Broucke, Seppe & Baesens, Bart & Óskarsdóttir, María & Lemahieu, Wilfried, 2021. "Deep learning for credit scoring: Do or don’t?," European Journal of Operational Research, Elsevier, vol. 295(1), pages 292-305.
    4. Luigi Guiso & Paola Sapienza & Luigi Zingales, 2013. "The Determinants of Attitudes toward Strategic Default on Mortgages," Journal of Finance, American Finance Association, vol. 68(4), pages 1473-1515, August.
    5. Wang, Pei & Yin, Xiangrong & Yuan, Qingcong & Kryscio, Richard, 2021. "Feature filter for estimating central mean subspace and its sparse solution," Computational Statistics & Data Analysis, Elsevier, vol. 163(C).
    6. Viaene, Stijn & Dedene, Guido, 2005. "Cost-sensitive learning and decision making revisited," European Journal of Operational Research, Elsevier, vol. 166(1), pages 212-220, October.
    7. Pranith Kumar Roy & Krishnendu Shaw, 2021. "A multicriteria credit scoring model for SMEs using hybrid BWM and TOPSIS," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 7(1), pages 1-27, December.
    8. D. J. Hand & W. E. Henley, 1997. "Statistical Classification Methods in Consumer Credit Scoring: a Review," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 160(3), pages 523-541, September.
    9. Khandani, Amir E. & Kim, Adlar J. & Lo, Andrew W., 2010. "Consumer credit-risk models via machine-learning algorithms," Journal of Banking & Finance, Elsevier, vol. 34(11), pages 2767-2787, November.
    10. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    11. Juan Laborda & Seyong Ryoo, 2021. "Feature Selection in a Credit Scoring Model," Mathematics, MDPI, vol. 9(7), pages 1-22, March.
    12. Tatjana Miljkovic & Bettina Grün, 2021. "Using Model Averaging to Determine Suitable Risk Measure Estimates," North American Actuarial Journal, Taylor & Francis Journals, vol. 25(4), pages 562-579, November.
    13. Qin Wang & Yuan Xue, 2023. "A structured covariance ensemble for sufficient dimension reduction," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 777-800, September.
    14. Sheng, Wenhui & Yin, Xiangrong, 2013. "Direction estimation in single-index models via distance covariance," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 148-161.
    15. Hyunwoo Woo & So Young Sohn, 2022. "Publisher Correction: A credit scoring model based on the Myers–Briggs type indicator in online peer-to-peer lending," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-1, December.
    16. Wu, Runxiong & Chen, Xin, 2021. "MM algorithms for distance covariance based sufficient dimension reduction and sufficient variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 155(C).
    17. Trivedi, Shrawan Kumar, 2020. "A study on credit scoring modeling with different feature selection and machine learning approaches," Technology in Society, Elsevier, vol. 63(C).
    18. Wang, Qin & Yin, Xiangrong, 2008. "A nonlinear multi-dimensional variable selection method for high dimensional data: Sparse MAVE," Computational Statistics & Data Analysis, Elsevier, vol. 52(9), pages 4512-4520, May.
    19. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    20. Riza Emekter & Yanbin Tu & Benjamas Jirasakuldech & Min Lu, 2015. "Evaluating credit risk and loan performance in online Peer-to-Peer (P2P) lending," Applied Economics, Taylor & Francis Journals, vol. 47(1), pages 54-70, January.
    21. Hyunwoo Woo & So Young Sohn, 2022. "A credit scoring model based on the Myers–Briggs type indicator in online peer-to-peer lending," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-19, December.
    22. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peter Martey Addo & Dominique Guegan & Bertrand Hassani, 2018. "Credit Risk Analysis Using Machine and Deep Learning Models," Risks, MDPI, vol. 6(2), pages 1-20, April.
    2. Yunquan Song & Zitong Li & Minglu Fang, 2022. "Robust Variable Selection Based on Penalized Composite Quantile Regression for High-Dimensional Single-Index Models," Mathematics, MDPI, vol. 10(12), pages 1-17, June.
    3. Elena Ivona Dumitrescu & Sullivan Hue & Christophe Hurlin & Sessi Tokpavi, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    4. Liu, Yi & Yang, Menglong & Wang, Yudong & Li, Yongshan & Xiong, Tiancheng & Li, Anzhe, 2022. "Applying machine learning algorithms to predict default probability in the online credit market: Evidence from China," International Review of Financial Analysis, Elsevier, vol. 79(C).
    5. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    6. Margherita Giuzio, 2017. "Genetic algorithm versus classical methods in sparse index tracking," Decisions in Economics and Finance, Springer;Associazione per la Matematica, vol. 40(1), pages 243-256, November.
    7. Mkhadri, Abdallah & Ouhourane, Mohamed, 2013. "An extended variable inclusion and shrinkage algorithm for correlated variables," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 631-644.
    8. Lucian Belascu & Alexandra Horobet & Georgiana Vrinceanu & Consuela Popescu, 2021. "Performance Dissimilarities in European Union Manufacturing: The Effect of Ownership and Technological Intensity," Sustainability, MDPI, vol. 13(18), pages 1-19, September.
    9. Yize Zhao & Matthias Chung & Brent A. Johnson & Carlos S. Moreno & Qi Long, 2016. "Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1427-1439, October.
    10. Chuliá, Helena & Garrón, Ignacio & Uribe, Jorge M., 2024. "Daily growth at risk: Financial or real drivers? The answer is not always the same," International Journal of Forecasting, Elsevier, vol. 40(2), pages 762-776.
    11. Changrong Yan & Dixin Zhang, 2013. "Sparse dimension reduction for survival data," Computational Statistics, Springer, vol. 28(4), pages 1835-1852, August.
    12. Christopher J Greenwood & George J Youssef & Primrose Letcher & Jacqui A Macdonald & Lauryn J Hagg & Ann Sanson & Jenn Mcintosh & Delyse M Hutchinson & John W Toumbourou & Matthew Fuller-Tyszkiewicz &, 2020. "A comparison of penalised regression methods for informing the selection of predictive markers," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-14, November.
    13. Norman R. Swanson & Weiqi Xiong, 2018. "Big data analytics in economics: What have we learned so far, and where should we go from here?," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 51(3), pages 695-746, August.
    14. Gareth M. James & Peter Radchenko & Jinchi Lv, 2009. "DASSO: connections between the Dantzig selector and lasso," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(1), pages 127-142, January.
    15. Mostafa Rezaei & Ivor Cribben & Michele Samorani, 2021. "A clustering-based feature selection method for automatically generated relational attributes," Annals of Operations Research, Springer, vol. 303(1), pages 233-263, August.
    16. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    17. Camila Epprecht & Dominique Guegan & Álvaro Veiga & Joel Correa da Rosa, 2017. "Variable selection and forecasting via automated methods for linear models: LASSO/adaLASSO and Autometrics," Post-Print halshs-00917797, HAL.
    18. Wang Zhu & Wang C.Y., 2010. "Buckley-James Boosting for Survival Analysis with High-Dimensional Biomarker Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-33, June.
    19. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    20. Štefan Lyócsa & Petra Vašaničová & Branka Hadji Misheva & Marko Dávid Vateha, 2022. "Default or profit scoring credit systems? Evidence from European and US peer-to-peer lending markets," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-21, December.
