
Using Predicted Outcome Stratified Sampling to Reduce the Variability in Predictive Performance of a One-Shot Train-and-Test Split for Individual Customer Predictions

Authors
  • G. VERSTRAETEN
  • D. VAN DEN POEL

Abstract

Since it is generally recognized that models evaluated on the data that was used for constructing them are overly optimistic, in predictive modeling practice, the assessment of a model’s predictive performance frequently relies on a one-shot train-and-test split between observations used for estimating a model, and those used for validating it. Previous research has indicated the usefulness of stratified sampling for reducing the variation in predictive performance in a linear regression application. In this paper, we validate the previous findings on six real-life European predictive modeling applications for marketing and credit scoring using a dichotomous outcome variable. We find confirmation for the reduction in variability using a procedure we describe as predicted outcome stratified sampling in a logistic regression model, and we find that the gain in variation reduction is – also in large data sets – almost always significant, and in certain applications markedly high.
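The abstract does not spell out the mechanics of predicted outcome stratified sampling, so the following is only a minimal sketch under stated assumptions: a preliminary model (assumed here to be a logistic regression fitted beforehand) has already produced one predicted score per customer, observations are ranked by that score, cut into equal-sized strata, and each stratum is split at random between the estimation and validation sets. The function name, the `n_strata` parameter, and the equal-size strata are illustrative choices, not details taken from the paper.

```python
import random


def predicted_outcome_stratified_split(scores, test_fraction=0.5, n_strata=10, seed=0):
    """Split observation indices into train and test sets, stratified on a predicted score.

    scores: one predicted outcome per observation (assumed to come from a
            preliminary model, e.g. a logistic regression; hypothetical input).
    Returns two sorted lists of indices: (train, test).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    rng = random.Random(seed)
    n = len(scores)
    # Rank observations from lowest to highest predicted outcome.
    order = sorted(range(n), key=lambda i: scores[i])
    n_strata = max(1, min(n_strata, n))

    train, test = [], []
    for s in range(n_strata):
        # Each stratum is a contiguous block of the ranking.
        stratum = order[s * n // n_strata:(s + 1) * n // n_strata]
        rng.shuffle(stratum)
        n_test = round(len(stratum) * test_fraction)
        test.extend(stratum[:n_test])
        train.extend(stratum[n_test:])
    return sorted(train), sorted(test)


# Toy usage: 100 observations whose scores increase with their index.
scores = [i / 100 for i in range(100)]
train, test = predicted_outcome_stratified_split(scores, test_fraction=0.5, n_strata=10, seed=1)
```

Because every stratum contributes the same share of observations to each side, the train and test sets cover the range of predicted outcomes in similar proportions, which is the intuition behind the reduced variability in measured performance across repeated splits.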

Suggested Citation

  • G. Verstraeten & D. Van Den Poel, 2006. "Using Predicted Outcome Stratified Sampling to Reduce the Variability in Predictive Performance of a One-Shot Train-and-Test Split for Individual Customer Predictions," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 06/360, Ghent University, Faculty of Economics and Business Administration.
  • Handle: RePEc:rug:rugwps:06/360

    Download full text from publisher

    File URL: http://wps-feb.ugent.be/Papers/wp_06_360.pdf
    Download Restriction: no

    Citations

    Cited by:

    1. J. Burez & D. Van Den Poel, 2008. "Handling class imbalance in customer churn prediction," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 08/517, Ghent University, Faculty of Economics and Business Administration.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. J. Burez & D. Van Den Poel, 2008. "Handling class imbalance in customer churn prediction," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 08/517, Ghent University, Faculty of Economics and Business Administration.
    2. Fraisse, Henri & Laporte, Matthias, 2022. "Return on investment on artificial intelligence: The case of bank capital requirement," Journal of Banking & Finance, Elsevier, vol. 138(C).
    3. Crone, Sven F. & Finlay, Steven, 2012. "Instance sampling in credit scoring: An empirical study of sample size and balancing," International Journal of Forecasting, Elsevier, vol. 28(1), pages 224-238.
    4. Dinh, Thi Huyen Thanh & Kleimeier, Stefanie, 2007. "A credit scoring model for Vietnam's retail banking market," International Review of Financial Analysis, Elsevier, vol. 16(5), pages 471-495.
    5. Nadia Ayed & Khemaies Bougatef, 2024. "Performance Assessment of Logistic Regression (LR), Artificial Neural Network (ANN), Fuzzy Inference System (FIS) and Adaptive Neuro-Fuzzy System (ANFIS) in Predicting Default Probability: The Case of," Computational Economics, Springer;Society for Computational Economics, vol. 64(3), pages 1803-1835, September.
    6. Lizhen Xu & Jason A. Duan & Andrew Whinston, 2014. "Path to Purchase: A Mutually Exciting Point Process Model for Online Advertising and Conversion," Management Science, INFORMS, vol. 60(6), pages 1392-1412, June.
    7. Carlos Serrano-Cinca & Begoña Gutiérrez-Nieto & Nydia M. Reyes, 2013. "A Social Approach to Microfinance Credit Scoring," Working Papers CEB 13-013, ULB -- Universite Libre de Bruxelles.
    8. R Fildes & K Nikolopoulos & S F Crone & A A Syntetos, 2008. "Forecasting and operational research: a review," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 59(9), pages 1150-1172, September.
    9. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    10. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    11. Rais Ahmad Itoo & A. Selvarasu & José António Filipe, 2015. "Loan Products and Credit Scoring by Commercial Banks (India)," International Journal of Finance, Insurance and Risk Management, International Journal of Finance, Insurance and Risk Management, vol. 5(1), pages 851-851.
    12. Sahar Karimi, 2021. "Cross-visiting Behaviour of Online Consumers Across Retailers’ and Comparison Sites, a Macro-Study," Information Systems Frontiers, Springer, vol. 23(3), pages 531-542, June.
    13. Henri Fraisse & Matthias Laporte, 2021. "Return on Investment on AI: The Case of Capital Requirement," Working papers 809, Banque de France.
    14. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    15. Linhui Wang & Jianping Zhu & Chenlu Zheng & Zhiyuan Zhang, 2024. "Incorporating Digital Footprints into Credit-Scoring Models through Model Averaging," Mathematics, MDPI, vol. 12(18), pages 1-15, September.
    16. Ahmed Almustfa Hussin Adam Khatir & Marco Bee, 2022. "Machine Learning Models and Data-Balancing Techniques for Credit Scoring: What Is the Best Combination?," Risks, MDPI, vol. 10(9), pages 1-22, August.
    17. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    18. Michael Doumpos & Constantin Zopounidis, 2007. "Model combination for credit risk assessment: A stacked generalization approach," Annals of Operations Research, Springer, vol. 151(1), pages 289-306, April.
    19. Patrick Mair & Marcus Hudec, 2009. "Multivariate Weibull mixtures with proportional hazard restrictions for dwell‐time‐based session clustering with incomplete data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 58(5), pages 619-639, December.
    20. Andreea Costea, 2017. "A Quantitative Approach to Credit Risk Management in the Underwriting Process for the Retail Portfolio," Romanian Economic Journal, Department of International Business and Economics from the Academy of Economic Studies Bucharest, vol. 20(63), pages 157-186, March.
