
Sample selection bias in credit scoring models

Authors

  • J Banasik (University of Edinburgh)
  • J Crook (University of Edinburgh)
  • L Thomas (University of Southampton)

Abstract

One of the aims of credit scoring models is to predict the probability of repayment of any applicant and yet such models are usually parameterised using a sample of accepted applicants only. This may lead to biased estimates of the parameters. In this paper we examine two issues. First, we compare the classification accuracy of a model based only on accepted applicants, relative to one based on a sample of all applicants. We find only a minimal difference, given the cutoff scores for the old model used by the data supplier. Using a simulated model we examine the predictive performance of models estimated from bands of applicants, ranked by predicted creditworthiness. We find that the lower the risk band of the training sample, the less accurate the predictions for all applicants. We also find that the lower the risk band of the training sample, the greater the overestimate of the true performance of the model, when tested on a sample of applicants within the same risk band — as a financial institution would do. The overestimation may be very large. Second, we examine the predictive accuracy of a bivariate probit model with selection (BVP). This parameterises the accept–reject model allowing for (unknown) omitted variables to be correlated with those of the original good–bad model. The BVP model may improve accuracy if the loan officer has overridden a scoring rule. We find that a small improvement when using the BVP model is sometimes possible.
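
The selection effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or data. It assumes a single observed characteristic `x`, a hypothetical factor `z` seen only by the loan officer, and an accept rule `x + z > 0`; every name and parameter value here is an illustrative assumption. It fits a one-variable logistic model by Newton's method, once on accepted applicants only and once on all applicants, and compares classification accuracy at a 0.5 cutoff.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=25):
    """Fit P(good | x) = sigmoid(a + b*x) by Newton's method."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * x      # gradient w.r.t. slope
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def accuracy(model, xs, ys):
    """Share of cases classified correctly at a 0.5 probability cutoff."""
    a, b = model
    hits = sum((sigmoid(a + b * x) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


rng = random.Random(0)
all_x, all_y, acc_x, acc_y = [], [], [], []
for _ in range(20000):
    x = rng.gauss(0, 1)    # characteristic available to the scorecard
    z = rng.gauss(0, 1)    # assumed information seen only by the loan officer
    good = 1 if x + z + rng.gauss(0, 1) > 0 else 0
    all_x.append(x)
    all_y.append(good)
    if x + z > 0:          # assumed accept rule; outcome known only if accepted
        acc_x.append(x)
        acc_y.append(good)

model_all = fit_logistic(all_x, all_y)   # trained on every applicant
model_acc = fit_logistic(acc_x, acc_y)   # trained on accepted applicants only

acc_all_on_all = accuracy(model_all, all_x, all_y)
acc_acc_on_all = accuracy(model_acc, all_x, all_y)
acc_acc_on_accepted = accuracy(model_acc, acc_x, acc_y)

print("slope, all applicants:     ", round(model_all[1], 3))
print("slope, accepted only:      ", round(model_acc[1], 3))
print("all-trained, tested on all:", round(acc_all_on_all, 3))
print("accepted-trained, on all:  ", round(acc_acc_on_all, 3))
print("accepted-trained, accepted:", round(acc_acc_on_accepted, 3))
```

Under these assumptions the accepted-only model has a flatter slope on `x`, because accepted applicants with low `x` were let through on the strength of `z`. It also looks more accurate when tested on accepted applicants than it is on the full applicant population, which is the kind of overestimate the abstract reports. The size of the gap depends entirely on the assumed accept rule and should not be read as an estimate of the paper's results.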

Suggested Citation

  • J Banasik & J Crook & L Thomas, 2003. "Sample selection bias in credit scoring models," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 54(8), pages 822-832, August.
  • Handle: RePEc:pal:jorsoc:v:54:y:2003:i:8:d:10.1057_palgrave.jors.2601578
    DOI: 10.1057/palgrave.jors.2601578

    References listed on IDEAS

    1. Meng, Chun-Lo & Schmidt, Peter, 1985. "On the Cost of Partial Observability in the Bivariate Probit Model," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 26(1), pages 71-85, February.
    2. James J. Heckman, 1976. "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," NBER Chapters, in: Annals of Economic and Social Measurement, Volume 5, number 4, pages 475-492, National Bureau of Economic Research, Inc.
    3. Greene, William, 1998. "Sample selection in credit-scoring models," Japan and the World Economy, Elsevier, vol. 10(3), pages 299-316, July.
    4. Van de Ven, Wynand P. M. M. & Van Praag, Bernard M. S., 1981. "The demand for deductibles in private health insurance : A probit model with sample selection," Journal of Econometrics, Elsevier, vol. 17(2), pages 229-252, November.
    5. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    6. William H. Greene, 1992. "A Statistical Model for Credit Scoring," Working Papers 92-29, New York University, Leonard N. Stern School of Business, Department of Economics.
    7. Boyes, William J. & Hoffman, Dennis L. & Low, Stuart A., 1989. "An econometric analysis of the bank credit scoring problem," Journal of Econometrics, Elsevier, vol. 40(1), pages 3-14, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maksym Obrizan, 2011. "A Bayesian Model of Sample Selection with a Discrete Outcome Variable: Detecting Depression in Older Adults," Discussion Papers 41, Kyiv School of Economics.
    2. Filiz Garip, 2012. "An Integrated Analysis of Migration and Remittances: Modeling Migration as a Mechanism for Selection," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 31(5), pages 637-663, October.
    3. González-Cabán, Armando & Loomis, John B. & Rodriguez, Andrea & Hesseln, Hayley, 2007. "A comparison of CVM survey response rates, protests and willingness-to-pay of Native Americans and general population for fuels reduction policies," Journal of Forest Economics, Elsevier, vol. 13(1), pages 49-71, May.
    4. Haan, Michael, 2005. "Summary Of: Are Immigrants Buying to Get In?: The Role of Ethnic Clustering on the Homeownership Propensities of 12 Toronto Immigrant Groups, 1996-2001," Analytical Studies Branch Research Paper Series 2005253e, Statistics Canada, Analytical Studies Branch.
    5. Glenn W. Harrison & Morten I. Lau & Hong Il Yoo, 2020. "Risk Attitudes, Sample Selection, and Attrition in a Longitudinal Field Experiment," The Review of Economics and Statistics, MIT Press, vol. 102(3), pages 552-568, July.
    6. Ha-Thu Nguyen, 2016. "Reject inference in application scorecards: evidence from France," EconomiX Working Papers 2016-10, University of Paris Nanterre, EconomiX.
    7. Y Kim & S Y Sohn, 2007. "Technology scoring model considering rejected applicants and effect of reject inference," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 58(10), pages 1341-1347, October.
    8. Ha Thu Nguyen, 2016. "Reject inference in application scorecards: evidence from France," Working Papers hal-04141601, HAL.
    9. Maksym, Obrizan, 2010. "A Bayesian Model of Sample Selection with a Discrete Outcome Variable," MPRA Paper 28577, University Library of Munich, Germany.
    10. Adelchi Azzalini & Hyoung-Moon Kim & Hea-Jung Kim, 2019. "Sample selection models for discrete and other non-Gaussian response variables," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(1), pages 27-56, March.
    11. Eun-Ju Lee & David Eastwood & Jinkook Lee, 2004. "A Sample Selection Model of Consumer Adoption of Computer Banking," Journal of Financial Services Research, Springer;Western Finance Association, vol. 26(3), pages 263-275, December.
    12. Rocholl, Jörg & Puri, Manju & Steffen, Sascha, 2011. "On the importance of prior relationships in bank loans to retail customers," Working Paper Series 1395, European Central Bank.
    13. Wolter, Marcus & Rösch, Daniel, 2014. "Cure events in default prediction," European Journal of Operational Research, Elsevier, vol. 238(3), pages 846-857.
    14. Rogelio A. Mancisidor & Michael Kampffmeyer & Kjersti Aas & Robert Jenssen, 2019. "Deep Generative Models for Reject Inference in Credit Scoring," Papers 1904.11376, arXiv.org, revised Sep 2021.
    15. Schwiebert, Jörg, 2012. "Semiparametric Estimation of a Binary Choice Model with Sample Selection," Hannover Economic Papers (HEP) dp-505, Leibniz Universität Hannover, Wirtschaftswissenschaftliche Fakultät.
    16. William Greene, 2006. "A General Approach to Incorporating Selectivity in a Model," Working Papers 06-10, New York University, Leonard N. Stern School of Business, Department of Economics.
    17. Watanabe, Hajime & Maruyama, Takuya, 2024. "A Bayesian sample selection model with a binary outcome for handling residential self-selection in individual car ownership," Journal of choice modelling, Elsevier, vol. 51(C).
    18. Céline Bignebat & Fabian Gouret, 2008. "Determinants and consequences of soft budget constraints. An empirical analysis using enterprise-level data in transition countries," Post-Print halshs-00308719, HAL.
    19. Creamer, Selmin F. & Blatner, Keith A. & Butler, Brett J., 2012. "Certification of family forests: What influences owners’ awareness and participation?," Journal of Forest Economics, Elsevier, vol. 18(2), pages 131-144.
    20. Fabrice Le Guel & Thierry Pénard & Raphaël Suire, 2005. "Adoption et usage marchand de l'Internet : une étude économétrique sur données bretonnes," Economie & Prévision, La Documentation Française, vol. 167(1), pages 67-84.
