The Proportion for Splitting Data into Training and Test Set for the Bootstrap in Classification Problems

Author

Vrigazova Borislava
(Sofia University, Faculty of Economics and Business Administration, Bulgaria)

Abstract

Background: The bootstrap can be an alternative to cross-validation as a training/test set splitting method, since it reduces the computing time in classification problems compared with tenfold cross-validation. Objectives: This research investigates what proportion should be used to split the dataset into the training and the test set so that the bootstrap is competitive in accuracy with other resampling methods. Methods/Approach: Different train/test split proportions are used with the following resampling methods: the bootstrap, leave-one-out cross-validation, tenfold cross-validation, and random repeated train/test splitting, and their performance is tested on several classification methods. The classification methods used are logistic regression, the decision tree, and k-nearest neighbours. Results: The findings suggest that using a different structure of the test set (e.g. 30/70, 20/80) can further optimize the performance of the bootstrap when applied to logistic regression and the decision tree. For k-nearest neighbours, tenfold cross-validation with a 70/30 train/test splitting ratio is recommended. Conclusions: Depending on the characteristics and the preliminary transformations of the variables, the bootstrap can improve the accuracy of the classification problem.
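The kind of comparison the abstract describes can be illustrated with a minimal sketch. The abstract does not say how the bootstrap is combined with a split proportion, so the scheme below is an assumption: a fixed fraction of rows is held out as the test set and the training set is drawn with replacement from the remaining rows. The function names (`make_data`, `bootstrap_split`, `predict_1nn`, `mean_accuracy`), the synthetic data, and the use of a 1-nearest-neighbour classifier as a stand-in for the paper's classifiers are all hypothetical choices made for illustration; the numbers it prints are not results from the paper.

```python
import random


def make_data(n, rng):
    """Synthetic two-class data: class 0 centred at (0, 0), class 1 at (3, 3)."""
    data = []
    for i in range(n):
        label = i % 2
        centre = 3.0 * label
        point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((point, label))
    return data


def bootstrap_split(n, test_fraction, rng):
    """Assumed scheme: hold out test_fraction of the row indices as the test
    set, then draw the training indices with replacement from the rest."""
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = max(1, round(n * test_fraction))
    test = indices[:n_test]
    pool = indices[n_test:]
    train = [rng.choice(pool) for _ in pool]  # bootstrap resample of the pool
    return train, test


def predict_1nn(train_rows, point):
    """Return the label of the training row closest to point (squared distance)."""
    nearest = min(
        train_rows,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )
    return nearest[1]


def mean_accuracy(data, test_fraction, repeats, rng):
    """Average test accuracy over repeated bootstrap splits."""
    scores = []
    for _ in range(repeats):
        train, test = bootstrap_split(len(data), test_fraction, rng)
        train_rows = [data[i] for i in train]
        hits = sum(predict_1nn(train_rows, data[i][0]) == data[i][1] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
data = make_data(200, rng)
results = {f: mean_accuracy(data, f, 20, rng) for f in (0.2, 0.3, 0.5)}
for fraction, acc in results.items():
    print(f"test fraction {fraction:.1f}: mean accuracy {acc:.3f}")
```

Changing the tuple of test fractions, or swapping `predict_1nn` for another classifier, gives a rough way to see how sensitive the estimated accuracy is to the split proportion on a given dataset.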

Suggested Citation

Vrigazova Borislava, 2021. "The Proportion for Splitting Data into Training and Test Set for the Bootstrap in Classification Problems," Business Systems Research, Sciendo, vol. 12(1), pages 228-242, May.

Handle: RePEc:bit:bsrysr:v:12:y:2021:i:1:p:228-242:n:9
DOI: 10.2478/bsrj-2021-0015

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

M.L. Nores & M.P. Díaz, 2016. "Bootstrap hypothesis testing in generalized additive models for comparing curves of treatments in longitudinal studies," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(5), pages 810-826, April.
Francesca Di Iorio & Umberto Triacca, 2022. "A comparison between VAR processes jointly modeling GDP and Unemployment rate in France and Germany," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 31(3), pages 617-635, September.
Dean V. Williamson, 2010. "Financial-Market Contracting," Chapters, in: Peter G. Klein & Michael E. Sykuta (ed.), The Elgar Companion to Transaction Cost Economics, chapter 24, Edward Elgar Publishing.
Jean-Marie Dufour, 2003. "Identification, weak instruments, and statistical inference in econometrics," Canadian Journal of Economics, Canadian Economics Association, vol. 36(4), pages 767-808, November.
- DUFOUR, Jean-Marie, 2003. "Identification, Weak Instruments and Statistical Inference in Econometrics," Cahiers de recherche 2003-12, Universite de Montreal, Departement de sciences economiques.
- Jean-Marie Dufour, 2003. "Identification, Weak Instruments and Statistical Inference in Econometrics," CIRANO Working Papers 2003s-49, CIRANO.
- DUFOUR, Jean-Marie, 2003. "Identification, Weak Instruments and Statistical Inference in Econometrics," Cahiers de recherche 10-2003, Centre interuniversitaire de recherche en économie quantitative, CIREQ.
Li, Ying & Liu, Zhen & Qin, Kuiyuan & Cui, Jiayu & Zeng, Xiaoyu & Ji, Ming & Lan, Jijun & You, Xuqun & Li, Yuan, 2021. "Organizational trust and safety operation behavior in airline pilots: The mediating effects of organizational identification and organizational commitment," Journal of Air Transport Management, Elsevier, vol. 92(C).
Kanybek Nur-tegin, 2007. "Do Transition Economies and Developing Countries Have Similar Destinies?," Atlantic Economic Journal, Springer;International Atlantic Economic Society, vol. 35(3), pages 327-342, September.
A. Talha Yalta, 2013. "Small Sample Bootstrap Inference of Level Relationships in the Presence of Autocorrelated Errors: A Large Scale Simulation Study and an Application in Energy Demand," Working Papers 1301, TOBB University of Economics and Technology, Department of Economics.
Emmanuel Flachaire, 2005. "More Efficient Tests Robust to Heteroskedasticity of Unknown Form," Econometric Reviews, Taylor & Francis Journals, vol. 24(2), pages 219-241.
- Emmanuel Flachaire, 2005. "More efficient tests robust to heteroskedasticity of unknown form," Post-Print halshs-00175914, HAL.
- Emmanuel Flachaire, 2005. "More efficient tests robust to heteroskedasticity of unknown form," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-00175914, HAL.
C F Elliott & R Simmons, 2007. "Determinants of UK box office success: the impact of quality signals," Working Papers 584026, Lancaster University Management School, Economics Department.
Dong Ding & Axel Gandy & Georg Hahn, 2020. "A simple method for implementing Monte Carlo tests," Computational Statistics, Springer, vol. 35(3), pages 1373-1392, September.
Amélie Charles & Olivier Darné, 2009. "Variance‐Ratio Tests Of Random Walk: An Overview," Journal of Economic Surveys, Wiley Blackwell, vol. 23(3), pages 503-527, July.
- Amélie Charles & Olivier Darné, 2009. "Variance ratio tests of random walk: An overview," Post-Print hal-00771078, HAL.
Zhenlin Yang, 2013. "LM Tests of Spatial Dependence Based on Bootstrap Critical Values," Working Papers 03-2013, Singapore Management University, School of Economics.
Shackman, Joshua D., 2006. "The equity premium and market integration: Evidence from international data," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 16(2), pages 155-179, April.
Federica Alberti & Werner Güth & Kei Tsutsui, 2023. "Experimental Effects of Institutionalizing Co-determination by a Procedurally Fair Bidding Rule," Journal of Business Ethics, Springer, vol. 184(2), pages 445-458, May.
Emmanuel Jordy Menvouta & Jolien Ponnet & Robin Van Oirbeek & Tim Verdonck, 2022. "mCube: Multinomial Micro-level reserving Model," Papers 2212.00101, arXiv.org.
Grammig, Joachim G. & Peter, Franziska J., 2008. "International price discovery in the presence of market microstructure effects," CFR Working Papers 08-10, University of Cologne, Centre for Financial Research (CFR).
Kim, Jae H., 2017. "Stock returns and investors' mood: Good day sunshine or spurious correlation?," International Review of Financial Analysis, Elsevier, vol. 52(C), pages 94-103.
- Kim, Jae, 2016. "Stock Returns and Investors’ Mood: Good Day Sunshine or Spurious Correlation?," MPRA Paper 70692, University Library of Munich, Germany.
Iqbal, Javed & Brooks, Robert & Galagedera, Don UA, 2007. "Robust Tests of the Lower Partial Moment Asset Pricing Model in Emerging Markets," MPRA Paper 25349, University Library of Munich, Germany, revised May 2007.
Fernandez Martinez, Roberto & Lostado Lorza, Ruben & Santos Delgado, Ana Alexandra & Piedra, Nelson, 2021. "Use of classification trees and rule-based models to optimize the funding assignment to research projects: A case study of UTPL," Journal of Informetrics, Elsevier, vol. 15(1).
Höppner, Sebastiaan & Stripling, Eugen & Baesens, Bart & Broucke, Seppe vanden & Verdonck, Tim, 2020. "Profit driven decision trees for churn prediction," European Journal of Operational Research, Elsevier, vol. 284(3), pages 920-933.

More about this item

Keywords

the bootstrap; classification; cross-validation; repeated train/test splitting

JEL classification:

C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
C52 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Evaluation, Validation, and Selection
C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
