
The Proportion for Splitting Data into Training and Test Set for the Bootstrap in Classification Problems

Author

Listed:
  • Vrigazova Borislava

    (Sofia University, Faculty of Economics and Business Administration, Bulgaria)

Abstract

Background: The bootstrap can be an alternative to cross-validation as a training/test set splitting method, since it reduces computing time in classification problems compared with tenfold cross-validation. Objectives: This research investigates what proportion should be used to split the dataset into training and test sets so that the bootstrap is competitive with other resampling methods in terms of accuracy. Methods/Approach: Different train/test split proportions are used with the following resampling methods: the bootstrap, leave-one-out cross-validation, tenfold cross-validation, and repeated random train/test splitting, to test their performance on several classification methods. The classification methods used include logistic regression, the decision tree, and k-nearest neighbours. Results: The findings suggest that using a different structure of the test set (e.g. 30/70, 20/80) can further improve the performance of the bootstrap when applied to logistic regression and the decision tree. For k-nearest neighbours, tenfold cross-validation with a 70/30 train/test splitting ratio is recommended. Conclusions: Depending on the characteristics and the preliminary transformations of the variables, the bootstrap can improve classification accuracy.
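
The resampling schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not spell out the exact bootstrap procedure, so the sketch assumes the training set is drawn with replacement at a chosen proportion and the model is tested on the rows that were never drawn (out-of-bag). All function names are hypothetical, the data are synthetic, and a small k-nearest-neighbour classifier stands in for the classifiers studied in the paper. Leave-one-out cross-validation corresponds to calling `kfold_splits` with `k` equal to the number of rows.

```python
import random
from collections import Counter


def bootstrap_split(n, train_frac, rng):
    """Assumed scheme: sample training rows WITH replacement; test on out-of-bag rows."""
    n_train = round(train_frac * n)
    train = [rng.randrange(n) for _ in range(n_train)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test


def random_split(n, train_frac, rng):
    """One random train/test split WITHOUT replacement (repeat it for the repeated variant)."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(train_frac * n)
    return idx[:n_train], idx[n_train:]


def kfold_splits(n, k, rng):
    """Yield k (train, test) index pairs; k=10 gives tenfold cross-validation."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training rows closest to x (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def accuracy(X, y, train, test, k=3):
    """Fraction of test rows classified correctly by k-NN fitted on the training rows."""
    if not test:
        raise ValueError("empty test set")
    X_tr = [X[i] for i in train]
    y_tr = [y[i] for i in train]
    hits = sum(knn_predict(X_tr, y_tr, X[i], k) == y[i] for i in test)
    return hits / len(test)


# Synthetic two-class data (an assumption for illustration, not the paper's datasets).
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(100)]
y = [0] * 100 + [1] * 100
n = len(X)

boot_scores = [accuracy(X, y, *bootstrap_split(n, 0.7, rng)) for _ in range(20)]
cv_scores = [accuracy(X, y, tr, te) for tr, te in kfold_splits(n, 10, rng)]
rep_scores = [accuracy(X, y, *random_split(n, 0.7, rng)) for _ in range(20)]

for name, scores in [("bootstrap", boot_scores),
                     ("tenfold CV", cv_scores),
                     ("repeated split", rep_scores)]:
    print(f"{name}: mean accuracy {sum(scores) / len(scores):.3f}")
```

Changing the `train_frac` argument (for example 0.8 or 0.3) is how one would vary the train/test proportion in this sketch; the mean accuracies it prints depend on the synthetic data and say nothing about the paper's results.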

Suggested Citation

  • Vrigazova Borislava, 2021. "The Proportion for Splitting Data into Training and Test Set for the Bootstrap in Classification Problems," Business Systems Research, Sciendo, vol. 12(1), pages 228-242, May.
  • Handle: RePEc:bit:bsrysr:v:12:y:2021:i:1:p:228-242:n:9
    DOI: 10.2478/bsrj-2021-0015

    Download full text from publisher

    File URL: https://doi.org/10.2478/bsrj-2021-0015
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. M.L. Nores & M.P. Díaz, 2016. "Bootstrap hypothesis testing in generalized additive models for comparing curves of treatments in longitudinal studies," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(5), pages 810-826, April.
    2. Francesca Di Iorio & Umberto Triacca, 2022. "A comparison between VAR processes jointly modeling GDP and Unemployment rate in France and Germany," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 31(3), pages 617-635, September.
    3. Dean V. Williamson, 2010. "Financial-Market Contracting," Chapters, in: Peter G. Klein & Michael E. Sykuta (ed.), The Elgar Companion to Transaction Cost Economics, chapter 24, Edward Elgar Publishing.
    4. Jean-Marie Dufour, 2003. "Identification, weak instruments, and statistical inference in econometrics," Canadian Journal of Economics, Canadian Economics Association, vol. 36(4), pages 767-808, November.
    5. Li, Ying & Liu, Zhen & Qin, Kuiyuan & Cui, Jiayu & Zeng, Xiaoyu & Ji, Ming & Lan, Jijun & You, Xuqun & Li, Yuan, 2021. "Organizational trust and safety operation behavior in airline pilots: The mediating effects of organizational identification and organizational commitment," Journal of Air Transport Management, Elsevier, vol. 92(C).
    6. Kanybek Nur-tegin, 2007. "Do Transition Economies and Developing Countries Have Similar Destinies?," Atlantic Economic Journal, Springer;International Atlantic Economic Society, vol. 35(3), pages 327-342, September.
    7. A. Talha Yalta, 2013. "Small Sample Bootstrap Inference of Level Relationships in the Presence of Autocorrelated Errors: A Large Scale Simulation Study and an Application in Energy Demand," Working Papers 1301, TOBB University of Economics and Technology, Department of Economics.
    8. Emmanuel Flachaire, 2005. "More Efficient Tests Robust to Heteroskedasticity of Unknown Form," Econometric Reviews, Taylor & Francis Journals, vol. 24(2), pages 219-241.
    9. repec:ebl:ecbull:v:30:y:2010:i:1:p:55-66 is not listed on IDEAS
    10. C F Elliott & R Simmons, 2007. "Determinants of UK box office success: the impact of quality signals," Working Papers 584026, Lancaster University Management School, Economics Department.
    11. Dong Ding & Axel Gandy & Georg Hahn, 2020. "A simple method for implementing Monte Carlo tests," Computational Statistics, Springer, vol. 35(3), pages 1373-1392, September.
    12. Amélie Charles & Olivier Darné, 2009. "Variance‐Ratio Tests Of Random Walk: An Overview," Journal of Economic Surveys, Wiley Blackwell, vol. 23(3), pages 503-527, July.
    13. Zhenlin Yang, 2013. "LM Tests of Spatial Dependence Based on Bootstrap Critical Values," Working Papers 03-2013, Singapore Management University, School of Economics.
    14. Shackman, Joshua D., 2006. "The equity premium and market integration: Evidence from international data," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 16(2), pages 155-179, April.
    15. Federica Alberti & Werner Güth & Kei Tsutsui, 2023. "Experimental Effects of Institutionalizing Co-determination by a Procedurally Fair Bidding Rule," Journal of Business Ethics, Springer, vol. 184(2), pages 445-458, May.
    16. Emmanuel Jordy Menvouta & Jolien Ponnet & Robin Van Oirbeek & Tim Verdonck, 2022. "mCube: Multinomial Micro-level reserving Model," Papers 2212.00101, arXiv.org.
    17. Grammig, Joachim G. & Peter, Franziska J., 2008. "International price discovery in the presence of market microstructure effects," CFR Working Papers 08-10, University of Cologne, Centre for Financial Research (CFR).
    18. Kim, Jae H., 2017. "Stock returns and investors' mood: Good day sunshine or spurious correlation?," International Review of Financial Analysis, Elsevier, vol. 52(C), pages 94-103.
    19. Iqbal, Javed & Brooks, Robert & Galagedera, Don UA, 2007. "Robust Tests of the Lower Partial Moment Asset Pricing Model in Emerging Markets," MPRA Paper 25349, University Library of Munich, Germany, revised May 2007.
    20. Fernandez Martinez, Roberto & Lostado Lorza, Ruben & Santos Delgado, Ana Alexandra & Piedra, Nelson, 2021. "Use of classification trees and rule-based models to optimize the funding assignment to research projects: A case study of UTPL," Journal of Informetrics, Elsevier, vol. 15(1).
    21. Höppner, Sebastiaan & Stripling, Eugen & Baesens, Bart & Broucke, Seppe vanden & Verdonck, Tim, 2020. "Profit driven decision trees for churn prediction," European Journal of Operational Research, Elsevier, vol. 284(3), pages 920-933.

    More about this item

    Keywords

    the bootstrap; classification; cross-validation; repeated train/test splitting;

    JEL classification:

    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C52 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Evaluation, Validation, and Selection
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
