
Sample selection bias in non-traditional lending: A copula-based approach for imbalanced data

Author

Listed:
  • Calabrese, Raffaella
  • Osmetti, Silvia Angela
  • Zanin, Luca

Abstract

Credit scoring models for non-traditional lending channels, such as peer-to-peer (P2P) lending platforms, are usually estimated only on the sample of accepted applicants, which may bias the estimates of the risk drivers. This issue can be addressed with a reject inference technique that includes the characteristics of rejected applicants in the model. Because accepted applicants and default records are few, credit scoring models also usually face a class imbalance problem. However, the previous literature on sample selection models for credit scoring does not address class imbalance. To fill this gap, we extend the Generalised Extreme Value (GEV) regression model for binary data to the sample selection framework. We use the quantile function of the GEV distribution as the link function in both the selection and outcome equations, and we model the dependence structure between the two equations with a copula function because of its flexibility. The proposed model, called the Sample Selection Generalised Extreme Value (SSGEV) model, is implemented in the R package BivGEV. We apply it to a comprehensive dataset provided by Lending Club and show that parameter estimates obtained only on accepted P2P applicants are biased, consistent with the literature. The SSGEV model achieves higher predictive accuracy than univariate approaches or a sample selection probit model. It also provides more conservative estimates of the Value-at-Risk and the Expected Shortfall.
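
The two ingredients named in the abstract, a GEV quantile link and a copula joining the selection and outcome equations, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the BivGEV implementation: the function names are hypothetical, the Clayton copula is just one possible choice, and the coding of the outcome (y = 1 as the modelled event) is assumed rather than taken from the paper.

```python
import math


def gev_inverse_link(eta: float, tau: float) -> float:
    """Map a linear predictor to a probability with the GEV cdf.

    p = exp(-(1 + tau * eta) ** (-1 / tau)), with the Gumbel limit at tau = 0.
    """
    if tau == 0.0:
        return math.exp(-math.exp(-eta))
    base = 1.0 + tau * eta
    if base <= 0.0:
        # Outside the GEV support: cdf is 0 below the lower bound (tau > 0)
        # and 1 above the upper bound (tau < 0).
        return 0.0 if tau > 0.0 else 1.0
    return math.exp(-(base ** (-1.0 / tau)))


def gev_link(p: float, tau: float) -> float:
    """GEV quantile function used as a link: probability -> linear predictor."""
    if tau == 0.0:
        return -math.log(-math.log(p))
    return ((-math.log(p)) ** (-tau) - 1.0) / tau


def clayton_copula(u: float, v: float, theta: float) -> float:
    """Clayton copula C(u, v) for theta > 0 and u, v strictly inside (0, 1).

    Assumption: chosen for illustration only; other copula families would
    slot in the same way.
    """
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def loglik_term(s: int, y: int, eta_s: float, eta_y: float,
                tau_s: float, tau_y: float, theta: float) -> float:
    """Log-likelihood contribution of one applicant in a binary sample
    selection model (hypothetical helper).

    s = 1 if the applicant was accepted (outcome observed), 0 otherwise.
    y is only used when s = 1.
    """
    p_s = gev_inverse_link(eta_s, tau_s)   # P(S = 1)
    p_y = gev_inverse_link(eta_y, tau_y)   # P(Y = 1)
    if s == 0:
        return math.log(1.0 - p_s)         # rejected: outcome unobserved
    joint = clayton_copula(p_s, p_y, theta)  # P(S = 1, Y = 1)
    if y == 1:
        return math.log(joint)
    return math.log(p_s - joint)           # P(S = 1, Y = 0)
```

Summing `loglik_term` over all applicants, accepted and rejected, would give the objective to maximise over the regression coefficients, the two shape parameters and the copula parameter. Because the three cases (rejected, accepted with y = 1, accepted with y = 0) are exhaustive, their probabilities add up to one for any parameter values inside the support.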

Suggested Citation

  • Calabrese, Raffaella & Osmetti, Silvia Angela & Zanin, Luca, 2024. "Sample selection bias in non-traditional lending: A copula-based approach for imbalanced data," Socio-Economic Planning Sciences, Elsevier, vol. 95(C).
  • Handle: RePEc:eee:soceps:v:95:y:2024:i:c:s0038012124002441
    DOI: 10.1016/j.seps.2024.102045

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0038012124002441
    Download Restriction: Full text for ScienceDirect subscribers only

    References listed on IDEAS

    1. W. Breymann & A. Dias & P. Embrechts, 2003. "Dependence structures for multivariate high-frequency data in finance," Quantitative Finance, Taylor & Francis Journals, vol. 3(1), pages 1-14.
    2. Trivedi, Pravin K. & Zimmer, David M., 2007. "Copula Modeling: An Introduction for Practitioners," Foundations and Trends(R) in Econometrics, now publishers, vol. 1(1), pages 1-111, April.
    3. Andreeva, Galina & Calabrese, Raffaella & Osmetti, Silvia Angela, 2016. "A comparative analysis of the UK and Italian small businesses using Generalised Extreme Value models," European Journal of Operational Research, Elsevier, vol. 249(2), pages 506-516.
    4. Panagiotelis, Anastasios & Czado, Claudia & Joe, Harry & Stöber, Jakob, 2017. "Model selection for discrete regular vine copulas," Computational Statistics & Data Analysis, Elsevier, vol. 106(C), pages 138-152.
    5. Chen, Xiao & Huang, Bihong & Ye, Dezhu, 2020. "Gender gap in peer-to-peer lending: Evidence from China," Journal of Banking & Finance, Elsevier, vol. 112(C).
    6. Marshall, Andrew & Tang, Leilei & Milne, Alistair, 2010. "Variable reduction, sample selection bias and bank retail credit scoring," Journal of Empirical Finance, Elsevier, vol. 17(3), pages 501-512, June.
    7. G Verstraeten & D Van den Poel, 2005. "The impact of sample bias on consumer credit scoring performance and profitability," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 56(8), pages 981-992, August.
    8. Raffaella Calabrese & Silvia Angela Osmetti, 2013. "Modelling small and medium enterprise loan defaults as rare events: the generalized extreme value regression model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 40(6), pages 1172-1188, June.
    9. Patrick Puhani, 2000. "The Heckman Correction for Sample Selection and Its Critique," Journal of Economic Surveys, Wiley Blackwell, vol. 14(1), pages 53-68, February.
    10. Xuchen Lin & Xiaolong Li & Zhong Zheng, 2017. "Evaluating borrower’s default risk in peer-to-peer lending: evidence from a lending platform in China," Applied Economics, Taylor & Francis Journals, vol. 49(35), pages 3538-3545, July.
    11. Clarke, Kevin A., 2007. "A Simple Distribution-Free Test for Nonnested Model Selection," Political Analysis, Cambridge University Press, vol. 15(3), pages 347-363, July.
    12. Raffaella Calabrese & Giampiero Marra & Silvia Angela Osmetti, 2016. "Bankruptcy prediction of small and medium enterprises using a flexible binary generalized extreme value model," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 67(4), pages 604-615, April.
    13. Zhiyong Li & Xinyi Hu & Ke Li & Fanyin Zhou & Feng Shen, 2020. "Inferring the outcomes of rejected loans: an application of semisupervised clustering," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(2), pages 631-654, February.
    14. Seth Freedman & Ginger Zhe Jin, 2008. "Do Social Networks Solve Information Problems for Peer-to-Peer Lending? Evidence from Prosper.com," Working Papers 08-43, NET Institute.
    15. Doriana Cucinelli & Lorenzo Gai & Federica Ielasi & Arturo Patarnello, 2021. "Preventing the deterioration of bank loan portfolio quality: a focus on unlikely-to-pay loans," The European Journal of Finance, Taylor & Francis Journals, vol. 27(7), pages 613-634, May.
    16. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    17. Giampiero Marra & Rosalba Radice & Till Bärnighausen & Simon N. Wood & Mark E. McGovern, 2017. "A Simultaneous Equation Approach to Estimating HIV Prevalence With Nonignorable Missing Responses," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 484-496, April.
    18. Banasik, John & Crook, Jonathan, 2007. "Reject inference, augmentation, and sample selection," European Journal of Operational Research, Elsevier, vol. 183(3), pages 1582-1594, December.
    19. Raffaella Calabrese & Paolo Giudici, 2015. "Estimating bank default with generalised extreme value regression models," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 66(11), pages 1783-1792, November.
    20. J Banasik & J Crook & L Thomas, 2003. "Sample selection bias in credit scoring models," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 54(8), pages 822-832, August.
    21. Dorfleitner, Gregor & Priberny, Christopher & Schuster, Stephanie & Stoiber, Johannes & Weber, Martina & de Castro, Ivan & Kammler, Julia, 2016. "Description-text related soft information in peer-to-peer lending – Evidence from two leading European platforms," Journal of Banking & Finance, Elsevier, vol. 64(C), pages 169-187.
    22. Monir El Annas & Badreddine Benyacoub & Mohamed Ouzineb, 2023. "Semi-supervised adapted HMMs for P2P credit scoring systems with reject inference," Computational Statistics, Springer, vol. 38(1), pages 149-169, March.
    23. Zanin, Luca, 2020. "Combining multiple probability predictions in the presence of class imbalance to discriminate between potential bad and good borrowers in the peer-to-peer lending market," Journal of Behavioral and Experimental Finance, Elsevier, vol. 25(C).
    24. Marra, Giampiero & Radice, Rosalba, 2013. "Estimation of a regression spline sample selection model," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 158-173.
    25. Carlos Serrano-Cinca & Begoña Gutiérrez-Nieto & Luz López-Palacios, 2015. "Determinants of Default in P2P Lending," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-22, October.
    26. Raffaella Calabrese & Silvia Angela Osmetti & Luca Zanin, 2019. "A joint scoring model for peer‐to‐peer and traditional lending: a bivariate model with copula dependence," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 182(4), pages 1163-1188, October.
    27. Karol Wyszynski & Giampiero Marra, 2018. "Sample selection models for count data in R," Computational Statistics, Springer, vol. 33(3), pages 1385-1412, September.
    28. J Banasik & J Crook, 2010. "Reject inference in survival analysis by augmentation," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 61(3), pages 473-485, March.
    29. Marc Cowling & Weixi Liu & Raffaella Calabrese, 2022. "Has previous loan rejection scarred firms from applying for loans during Covid-19?," Small Business Economics, Springer, vol. 59(4), pages 1327-1350, December.
    30. Crook, Jonathan & Banasik, John, 2004. "Does reject inference really improve the performance of application scoring models?," Journal of Banking & Finance, Elsevier, vol. 28(4), pages 857-874, April.
    31. Marra, Giampiero & Radice, Rosalba, 2017. "Bivariate copula additive models for location, scale and shape," Computational Statistics & Data Analysis, Elsevier, vol. 112(C), pages 99-113.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rogelio A. Mancisidor & Michael Kampffmeyer & Kjersti Aas & Robert Jenssen, 2019. "Deep Generative Models for Reject Inference in Credit Scoring," Papers 1904.11376, arXiv.org, revised Sep 2021.
    2. Ha-Thu Nguyen, 2016. "Reject inference in application scorecards: evidence from France," EconomiX Working Papers 2016-10, University of Paris Nanterre, EconomiX.
    3. Ha Thu Nguyen, 2016. "Reject inference in application scorecards: evidence from France," Working Papers hal-04141601, HAL.
    4. Wiemann, Paul F.V. & Klein, Nadja & Kneib, Thomas, 2022. "Correcting for sample selection bias in Bayesian distributional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    5. Davaadorj, Zagdbazar & Enkhtaivan, Bolortuya & Lu, Wenling, 2024. "The role of job titles in online peer-to-peer lending: An empirical investigation on skilled borrowers," Journal of Behavioral and Experimental Finance, Elsevier, vol. 41(C).
    6. Thi Mai Luong, 2020. "Selection Effects of Lender and Borrower Choices on Risk Measurement, Management and Prudential Regulation," PhD Thesis, Finance Discipline Group, UTS Business School, University of Technology, Sydney, number 3-2020, January-A.
    7. Monir El Annas & Badreddine Benyacoub & Mohamed Ouzineb, 2023. "Semi-supervised adapted HMMs for P2P credit scoring systems with reject inference," Computational Statistics, Springer, vol. 38(1), pages 149-169, March.
    8. Y Kim & S Y Sohn, 2007. "Technology scoring model considering rejected applicants and effect of reject inference," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 58(10), pages 1341-1347, October.
    9. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    10. Raffaella Calabrese & Silvia Osmetti, 2014. "Modelling cross-border systemic risk in the European banking sector: a copula approach," Papers 1411.1348, arXiv.org.
    11. Calabrese, Raffaella & Degl’Innocenti, Marta & Osmetti, Silvia Angela, 2017. "The effectiveness of TARP-CPP on the US banking industry: A new copula-based approach," European Journal of Operational Research, Elsevier, vol. 256(3), pages 1029-1037.
    12. Andreeva, Galina & Calabrese, Raffaella & Osmetti, Silvia Angela, 2016. "A comparative analysis of the UK and Italian small businesses using Generalised Extreme Value models," European Journal of Operational Research, Elsevier, vol. 249(2), pages 506-516.
    13. Marra, Giampiero & Wyszynski, Karol, 2016. "Semi-parametric copula sample selection models for count responses," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 110-129.
    14. Calabrese, Raffaella & Osmetti, Silvia Angela, 2019. "A new approach to measure systemic risk: A bivariate copula model for dependent censored data," European Journal of Operational Research, Elsevier, vol. 279(3), pages 1053-1064.
    15. Mengnan Song & Jiasong Wang & Suisui Su, 2022. "Towards a Better Microcredit Decision," Papers 2209.07574, arXiv.org.
    16. Xueru Chen & Xiaoji Hu & Shenglin Ben, 2021. "How do reputation, structure design and FinTech ecosystem affect the net cash inflow of P2P lending platforms? Evidence from China," Electronic Commerce Research, Springer, vol. 21(4), pages 1055-1082, December.
    17. Ligang Zhou & Chao Ma, 2023. "A Comparison of Different Rules on Loans Evaluation in Peer-to-Peer Lending by Gradient Boosting Models Under Moving Windows with Two Timestamps," Computational Economics, Springer;Society for Computational Economics, vol. 62(4), pages 1481-1504, December.
    18. Jing Peng, 2023. "Identification of Causal Mechanisms from Randomized Experiments: A Framework for Endogenous Mediation Analysis," Information Systems Research, INFORMS, vol. 34(1), pages 67-84, March.
    19. Crook, Jonathan N. & Edelman, David B. & Thomas, Lyn C., 2007. "Recent developments in consumer credit risk assessment," European Journal of Operational Research, Elsevier, vol. 183(3), pages 1447-1465, December.
    20. Giampiero Marra & Rosalba Radice & Till Bärnighausen & Simon N. Wood & Mark E. McGovern, 2017. "A Simultaneous Equation Approach to Estimating HIV Prevalence With Nonignorable Missing Responses," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 484-496, April.
