Printed from https://ideas.repec.org/a/bit/bsrysr/v12y2021i1p228-242n9.html

The Proportion for Splitting Data into Training and Test Set for the Bootstrap in Classification Problems

Author

  • Vrigazova Borislava (Sofia University, Faculty of Economics and Business Administration, Bulgaria)

Abstract

Background: The bootstrap can be an alternative to cross-validation as a training/test set splitting method because it reduces computing time in classification problems compared with tenfold cross-validation.

Objectives: This research investigates what proportion should be used to split the dataset into training and test sets so that the bootstrap is competitive in accuracy with other resampling methods.

Methods/Approach: Different train/test split proportions are used with the following resampling methods: the bootstrap, leave-one-out cross-validation, tenfold cross-validation, and repeated random train/test splitting. Their performance is tested on several classification methods: logistic regression, the decision tree, and k-nearest neighbours.

Results: The findings suggest that using a different structure of the test set (e.g. 30/70, 20/80) can further optimize the performance of the bootstrap when applied to logistic regression and the decision tree. For k-nearest neighbours, tenfold cross-validation with a 70/30 train/test splitting ratio is recommended.

Conclusions: Depending on the characteristics and the preliminary transformations of the variables, the bootstrap can improve classification accuracy.
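
The resampling schemes named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's procedure: the function names (`random_split`, `bootstrap_split`, `kfold_splits`), the toy two-cluster data, the 1-nearest-neighbour classifier, and the specific bootstrap variant (training rows drawn with replacement, test rows drawn with replacement from the rows that were left out) are all chosen here for the example.

```python
import random


def random_split(n, train_frac, rng):
    """One random train/test split without replacement."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(train_frac * n)
    return idx[:cut], idx[cut:]


def bootstrap_split(n, train_frac, rng):
    """Bootstrap split (assumed variant, requires train_frac < 1).

    Training rows are drawn WITH replacement; test rows are drawn with
    replacement from the rows never selected for training, so the two
    sets respect the requested proportion and never overlap.
    """
    size = round(train_frac * n)
    train = [rng.randrange(n) for _ in range(size)]
    drawn = set(train)
    out_of_bag = [i for i in range(n) if i not in drawn]
    test = [rng.choice(out_of_bag) for _ in range(n - size)]
    return train, test


def kfold_splits(n, k, rng):
    """Yield k (train, test) pairs; each row is tested exactly once.

    With k == n this is leave-one-out cross-validation.
    """
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def one_nn_accuracy(X, y, train, test):
    """Accuracy of a 1-nearest-neighbour classifier (squared Euclidean)."""
    correct = 0
    for t in test:
        nearest = min(
            train,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[t], X[j])),
        )
        correct += y[nearest] == y[t]
    return correct / len(test)


def mean_accuracy(X, y, splits):
    """Average test accuracy over a collection of (train, test) splits."""
    scores = [one_nn_accuracy(X, y, tr, te) for tr, te in splits if te]
    return sum(scores) / len(scores)


# Toy data (an assumption for the demo): two Gaussian clusters, 50 rows each.
rng = random.Random(0)
X = [(rng.gauss(c, 1.0), rng.gauss(c, 1.0)) for c in (0.0, 3.0) for _ in range(50)]
y = [label for label in (0, 1) for _ in range(50)]
n = len(X)

results = {
    "bootstrap, 70/30": mean_accuracy(
        X, y, [bootstrap_split(n, 0.7, rng) for _ in range(20)]),
    "repeated random split, 70/30": mean_accuracy(
        X, y, [random_split(n, 0.7, rng) for _ in range(20)]),
    "tenfold cross-validation": mean_accuracy(X, y, kfold_splits(n, 10, rng)),
    "leave-one-out": mean_accuracy(X, y, kfold_splits(n, n, rng)),
}

for name, acc in results.items():
    print(f"{name}: mean accuracy {acc:.3f}")
```

Changing the `0.7` argument to `0.8` or `0.3` gives other train/test proportions, which is the kind of comparison the abstract describes. The numbers from this toy setup say nothing about the paper's findings.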

Suggested Citation

  • Vrigazova Borislava, 2021. "The Proportion for Splitting Data into Training and Test Set for the Bootstrap in Classification Problems," Business Systems Research, Sciendo, vol. 12(1), pages 228-242, May.
  • Handle: RePEc:bit:bsrysr:v:12:y:2021:i:1:p:228-242:n:9
    DOI: 10.2478/bsrj-2021-0015

    Download full text from publisher

    File URL: https://doi.org/10.2478/bsrj-2021-0015
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. M.L. Nores & M.P. Díaz, 2016. "Bootstrap hypothesis testing in generalized additive models for comparing curves of treatments in longitudinal studies," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(5), pages 810-826, April.
    2. Dean V. Williamson, 2010. "Financial-Market Contracting," Chapters, in: Peter G. Klein & Michael E. Sykuta (ed.), The Elgar Companion to Transaction Cost Economics, chapter 24, Edward Elgar Publishing.
    3. Li, Ying & Liu, Zhen & Qin, Kuiyuan & Cui, Jiayu & Zeng, Xiaoyu & Ji, Ming & Lan, Jijun & You, Xuqun & Li, Yuan, 2021. "Organizational trust and safety operation behavior in airline pilots: The mediating effects of organizational identification and organizational commitment," Journal of Air Transport Management, Elsevier, vol. 92(C).
    4. repec:ebl:ecbull:v:30:y:2010:i:1:p:55-66 is not listed on IDEAS
    5. C F Elliott & R Simmons, 2007. "Determinants of UK box office success: the impact of quality signals," Working Papers 584026, Lancaster University Management School, Economics Department.
    6. Dong Ding & Axel Gandy & Georg Hahn, 2020. "A simple method for implementing Monte Carlo tests," Computational Statistics, Springer, vol. 35(3), pages 1373-1392, September.
    7. Shackman, Joshua D., 2006. "The equity premium and market integration: Evidence from international data," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 16(2), pages 155-179, April.
    8. Federica Alberti & Werner Güth & Kei Tsutsui, 2023. "Experimental Effects of Institutionalizing Co-determination by a Procedurally Fair Bidding Rule," Journal of Business Ethics, Springer, vol. 184(2), pages 445-458, May.
    9. Emmanuel Jordy Menvouta & Jolien Ponnet & Robin Van Oirbeek & Tim Verdonck, 2022. "mCube: Multinomial Micro-level reserving Model," Papers 2212.00101, arXiv.org.
    10. Grammig, Joachim G. & Peter, Franziska J., 2008. "International price discovery in the presence of market microstructure effects," CFR Working Papers 08-10, University of Cologne, Centre for Financial Research (CFR).
    11. Kim, Jae H., 2017. "Stock returns and investors' mood: Good day sunshine or spurious correlation?," International Review of Financial Analysis, Elsevier, vol. 52(C), pages 94-103.
    12. Fernandez Martinez, Roberto & Lostado Lorza, Ruben & Santos Delgado, Ana Alexandra & Piedra, Nelson, 2021. "Use of classification trees and rule-based models to optimize the funding assignment to research projects: A case study of UTPL," Journal of Informetrics, Elsevier, vol. 15(1).
    13. Höppner, Sebastiaan & Stripling, Eugen & Baesens, Bart & Broucke, Seppe vanden & Verdonck, Tim, 2020. "Profit driven decision trees for churn prediction," European Journal of Operational Research, Elsevier, vol. 284(3), pages 920-933.
    14. Aye, Goodness C. & Gil-Alana, Luis A. & Gupta, Rangan & Wohar, Mark E., 2017. "The efficiency of the art market: Evidence from variance ratio tests, linear and nonlinear fractional integration approaches," International Review of Economics & Finance, Elsevier, vol. 51(C), pages 283-294.
    15. Kenneth S. Rogoff & Vania Stavrakeva, 2008. "The Continuing Puzzle of Short Horizon Exchange Rate Forecasting," NBER Working Papers 14071, National Bureau of Economic Research, Inc.
    16. repec:lan:wpaper:1090 is not listed on IDEAS
    17. Arghyrou, Michael G. & Gregoriou, Andros & Kontonikas, Alexandros, 2009. "Do real interest rates converge? Evidence from the European union," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 19(3), pages 447-460, July.
    18. Silvio John Camilleri & Christopher J. Green, 2009. "The impact of the suspension of opening and closing call auctions: evidence from the National Stock Exchange of India," International Journal of Banking, Accounting and Finance, Inderscience Enterprises Ltd, vol. 1(3), pages 257-284.
    19. Jean-Marie Dufour, 2003. "Identification, weak instruments, and statistical inference in econometrics," Canadian Journal of Economics, Canadian Economics Association, vol. 36(4), pages 767-808, November.
    20. Aqil Khan & Mumtaz Ahmed & Salma Bibi, 2019. "Financial development and economic growth nexus for Pakistan: a revisit using maximum entropy bootstrap approach," Empirical Economics, Springer, vol. 57(4), pages 1157-1169, October.
    21. Charles, Amélie & Darné, Olivier, 2009. "The efficiency of the crude oil markets: Evidence from variance ratio tests," Energy Policy, Elsevier, vol. 37(11), pages 4267-4272, November.
    22. Keaton Miller & Boyoung Seo, 2021. "The Effect of Cannabis Legalization on Substance Demand and Tax Revenues," National Tax Journal, University of Chicago Press, vol. 74(1), pages 107-145.

    More about this item

    Keywords

    the bootstrap; classification; cross-validation; repeated train/test splitting;

    JEL classification:

    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C52 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Evaluation, Validation, and Selection
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis

