
Data representativeness problem in credit scoring

Author

Listed:
  • Josef Ditrich

Abstract

When building models, it is common to split the whole dataset into a development sample and a validation sample. In some cases, using random sampling instead of stratified sampling can make the resulting samples unrepresentative. A model built on such data then gives different or unexpected results when its performance is measured on the validation sample. In business practice, a lack of representativeness can cause problems of interpretation and can have a large financial impact when a biased model is used in the credit granting process. The aim of this paper is to examine and explain why representativeness should be checked before modelling starts. The paper deals with methods for identifying selection bias over time and recommends including three tests as a standard part of the data preparation process.
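The difference between the two sampling schemes named in the abstract can be illustrated with a small sketch. It is not taken from the paper: the portfolio of 1,000 loans, the 5% bad rate, the 70/30 split, and the helper names `random_split`, `stratified_split`, and `bad_rate` are all assumptions made for illustration, and the sketch does not reproduce the three tests the paper recommends.

```python
import random
from collections import defaultdict


def random_split(records, dev_fraction, rng):
    """Simple random split: shuffle all records, then cut once."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


def stratified_split(records, dev_fraction, rng, key):
    """Stratified split: shuffle and cut within each stratum separately."""
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    dev, val = [], []
    for group in strata.values():
        group = group[:]
        rng.shuffle(group)
        cut = round(len(group) * dev_fraction)
        dev.extend(group[:cut])
        val.extend(group[cut:])
    return dev, val


def bad_rate(sample):
    """Share of records flagged as bad (defaulted) in a sample."""
    return sum(r["bad"] for r in sample) / len(sample)


# Hypothetical portfolio (assumption): 1,000 loans, 50 of them bad.
rng = random.Random(0)
loans = [{"id": i, "bad": 1 if i < 50 else 0} for i in range(1000)]

dev_r, val_r = random_split(loans, 0.7, rng)
dev_s, val_s = stratified_split(loans, 0.7, rng, key=lambda r: r["bad"])

print("random     dev/val bad rate:", bad_rate(dev_r), bad_rate(val_r))
print("stratified dev/val bad rate:", bad_rate(dev_s), bad_rate(val_s))
```

With stratification on the bad flag, both samples keep the portfolio's bad rate by construction. Under simple random sampling the two rates can drift apart by chance, which is the kind of discrepancy a representativeness check before modelling is meant to catch.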

Suggested Citation

  • Josef Ditrich, 2015. "Data representativeness problem in credit scoring," Acta Oeconomica Pragensia, Prague University of Economics and Business, vol. 2015(3), pages 3-17.
  • Handle: RePEc:prg:jnlaop:v:2015:y:2015:i:3:id:472:p:3-17
    DOI: 10.18267/j.aop.472

    Download full text from publisher

    File URL: http://aop.vse.cz/doi/10.18267/j.aop.472.html
    Download Restriction: free of charge

    File URL: http://aop.vse.cz/doi/10.18267/j.aop.472.pdf
    Download Restriction: free of charge



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Young-Joo Kim & Myung Hwan Seo, 2017. "Is There a Jump in the Transition?," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 35(2), pages 241-249, April.
    2. Khalaf, Lynda & Saphores, Jean-Daniel & Bilodeau, Jean-Francois, 2003. "Simulation-based exact jump tests in models with conditional heteroskedasticity," Journal of Economic Dynamics and Control, Elsevier, vol. 28(3), pages 531-553, December.
    3. Jean-Thomas Bernard & Ba Chu & Lynda Khalaf & Marcel Voia, 2019. "Non-Standard Confidence Sets for Ratios and Tipping Points with Applications to Dynamic Panel Data," Annals of Economics and Statistics, GENES, issue 134, pages 79-108.
    4. Iglesias Emma M., 2011. "Constrained k-class Estimators in the Presence of Weak Instruments," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 15(4), pages 1-13, September.
    5. Greg Hannsgen, 2011. "Infinite-variance, Alpha-stable Shocks in Monetary SVAR: Final Working Paper Version," Economics Working Paper Archive wp_682, Levy Economics Institute.
    6. Chunlin Wang & Paul Marriott & Pengfei Li, 2022. "A note on the coverage behaviour of bootstrap percentile confidence intervals for constrained parameters," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 85(7), pages 809-831, October.
    7. Ekaterina Oparina & Sorawoot Srisuma, 2022. "Analyzing Subjective Well-Being Data with Misclassification," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(2), pages 730-743, April.
    8. Sauer, J., 2007. "Monotonicity and Curvature – A Bootstrapping Approach," Proceedings “Schriften der Gesellschaft für Wirtschafts- und Sozialwissenschaften des Landbaues e.V.”, German Association of Agricultural Economists (GEWISOLA), vol. 42, March.
    9. Boswijk, H. Peter & Cavaliere, Giuseppe & Georgiev, Iliyan & Rahbek, Anders, 2021. "Bootstrapping non-stationary stochastic volatility," Journal of Econometrics, Elsevier, vol. 224(1), pages 161-180.
    10. Ian W. McKeague & Min Qian, 2015. "An Adaptive Resampling Test for Detecting the Presence of Significant Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1422-1433, December.
    11. Frazis, Harley & Loewenstein, Mark A., 2003. "Estimating linear regressions with mismeasured, possibly endogenous, binary explanatory variables," Journal of Econometrics, Elsevier, vol. 117(1), pages 151-178, November.
    12. Irene Botosaru & Chris Muris & Krishna Pendakur, 2020. "Intertemporal Collective Household Models: Identification in Short Panels with Unobserved Heterogeneity in Resource Shares," Department of Economics Working Papers 2020-09, McMaster University.
    13. Centorrino, Samuele & Pérez-Urdiales, María, 2023. "Maximum likelihood estimation of stochastic frontier models with endogeneity," Journal of Econometrics, Elsevier, vol. 234(1), pages 82-105.
    14. Dufour, Jean-Marie, 2006. "Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics," Journal of Econometrics, Elsevier, vol. 133(2), pages 443-477, August.
    15. Guy P. Nason & Ben Powell & Duncan Elliott & Paul A. Smith, 2017. "Should we sample a time series more frequently?: decision support via multirate spectrum estimation," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 180(2), pages 353-407, February.
    16. Thomas DeLeire & Shakeeb Khan & Christopher Timmins, 2013. "Roy Model Sorting And Nonrandom Selection In The Valuation Of A Statistical Life," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 54(1), pages 279-306, February.
    17. Kitagawa, Toru & Montiel Olea, José Luis & Payne, Jonathan & Velez, Amilcar, 2020. "Posterior distribution of nondifferentiable functions," Journal of Econometrics, Elsevier, vol. 217(1), pages 161-175.
    18. Djebbari, Habiba & Smith, Jeffrey, 2008. "Heterogeneous impacts in PROGRESA," Journal of Econometrics, Elsevier, vol. 145(1-2), pages 64-80, July.
    19. Galichon, Alfred & Henry, Marc, 2009. "A test of non-identifying restrictions and confidence regions for partially identified parameters," Journal of Econometrics, Elsevier, vol. 152(2), pages 186-196, October.
    20. Daniel L. Millimet & Hao Li & Punarjit Roychowdhury, 2020. "Partial Identification of Economic Mobility: With an Application to the United States," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 38(4), pages 732-753, October.

    More about this item

    Keywords

    credit scoring; credit risk models; selection bias; random sampling; stratified sampling; data splitting;

    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C80 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - General
    • C83 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Survey Methods; Sampling Methods
