
Classification with decision trees from a nonparametric predictive inference perspective

Author

Listed:
  • Abellán, Joaquín
  • Baker, Rebecca M.
  • Coolen, Frank P.A.
  • Crossman, Richard J.
  • Masegosa, Andrés R.

Abstract

An application of nonparametric predictive inference for multinomial data (NPI) to classification tasks is presented. This model is applied to an established procedure for building classification trees using imprecise probabilities and uncertainty measures. Thus far, that procedure has been used only with the imprecise Dirichlet model (IDM), which is defined through a parameter expressing previous knowledge. When the IDM is applied, the accuracy of the classification procedure depends significantly on the value chosen for that parameter. A detailed study involving 40 data sets shows that the procedure using the NPI model (which has no parameter dependence) obtains a better trade-off between accuracy and tree size than the procedure using the IDM, whatever the choice of parameter. In a bias-variance study of the errors, it is proved that the procedure with the NPI model has a lower variance than the one with the IDM, implying a lower level of over-fitting.
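
The contrast the abstract draws between the two models can be illustrated with a minimal sketch of the probability intervals each assigns to a single class value. It is not taken from the paper. It assumes the usual IDM bounds, n_j/(N+s) and (n_j+s)/(N+s), and the singleton bounds commonly stated for the NPI model for multinomial data, max(0, (n_j-1)/N) and min(1, (n_j+1)/N). The function names `idm_intervals` and `npi_intervals` are hypothetical.

```python
from fractions import Fraction


def idm_intervals(counts, s):
    """Lower/upper probability per class under the IDM (assumed standard form).

    counts: observed frequency of each class value.
    s: the IDM parameter; the intervals widen as s grows.
    """
    n = sum(counts)
    s = Fraction(s)
    return [(Fraction(c) / (n + s), (Fraction(c) + s) / (n + s)) for c in counts]


def npi_intervals(counts):
    """Lower/upper probability per class under the assumed NPI-M singleton bounds.

    No parameter is involved: the bounds depend only on the counts.
    """
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one observation is required")
    return [
        (max(Fraction(0), Fraction(c - 1, n)), min(Fraction(1), Fraction(c + 1, n)))
        for c in counts
    ]


if __name__ == "__main__":
    counts = [3, 1, 0]  # illustrative class frequencies, not data from the paper
    for s in (1, 2):
        print("IDM, s =", s, idm_intervals(counts, s))
    print("NPI", npi_intervals(counts))
```

With these assumed formulas, changing s changes every IDM interval, while the NPI intervals are fixed once the counts are known, which is the parameter dependence the abstract refers to.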

Suggested Citation

  • Abellán, Joaquín & Baker, Rebecca M. & Coolen, Frank P.A. & Crossman, Richard J. & Masegosa, Andrés R., 2014. "Classification with decision trees from a nonparametric predictive inference perspective," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 789-802.
  • Handle: RePEc:eee:csdana:v:71:y:2014:i:c:p:789-802
    DOI: 10.1016/j.csda.2013.02.009

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947313000534
    Download Restriction: Full text for ScienceDirect subscribers only.


    Citations

    Cited by:

    1. Frank PA Coolen & Tahani Coolen-Maturi & Abdullah H Al-nefaiee, 2014. "Nonparametric predictive inference for system reliability using the survival signature," Journal of Risk and Reliability, vol. 228(5), pages 437-448, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Weijun Wang & Dan Zhao & Liguo Fan & Yulong Jia, 2019. "Study on Icing Prediction of Power Transmission Lines Based on Ensemble Empirical Mode Decomposition and Feature Selection Optimized Extreme Learning Machine," Energies, MDPI, vol. 12(11), pages 1-21, June.
    2. Coolen-Maturi, Tahani & Elkhafifi, Faiza F. & Coolen, Frank P.A., 2014. "Three-group ROC analysis: A nonparametric predictive approach," Computational Statistics & Data Analysis, Elsevier, vol. 78(C), pages 69-81.
    3. Lkhagvadorj Munkhdalai & Tsendsuren Munkhdalai & Oyun-Erdene Namsrai & Jong Yun Lee & Keun Ho Ryu, 2019. "An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments," Sustainability, MDPI, vol. 11(3), pages 1-23, January.
    4. Houlding, B. & Coolen, F.P.A., 2012. "Nonparametric predictive utility inference," European Journal of Operational Research, Elsevier, vol. 221(1), pages 222-230.
    5. Cang, Shuang & Yu, Hongnian, 2014. "A combination selection algorithm on forecasting," European Journal of Operational Research, Elsevier, vol. 234(1), pages 127-139.
    6. Zardad Khan & Asma Gul & Aris Perperoglou & Miftahuddin Miftahuddin & Osama Mahmoud & Werner Adler & Berthold Lausen, 2020. "Ensemble of optimal trees, random forest and random projection ensemble classification," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(1), pages 97-116, March.
    7. Fellinghauer, Bernd & Bühlmann, Peter & Ryffel, Martin & von Rhein, Michael & Reinhardt, Jan D., 2013. "Stable graphical model estimation with Random Forests for discrete, continuous, and mixed variables," Computational Statistics & Data Analysis, Elsevier, vol. 64(C), pages 132-152.
    8. Hermel Homburger & Manuel K Schneider & Sandra Hilfiker & Andreas Lüscher, 2014. "Inferring Behavioral States of Grazing Livestock from High-Frequency Position Data Alone," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-22, December.
    9. Ingrida Vaiciulyte & Zivile Kalsyte & Leonidas Sakalauskas & Darius Plikynas, 2017. "Assessment of market reaction on the share performance on the basis of its visualization in 2D space," Journal of Business Economics and Management, Taylor & Francis Journals, vol. 18(2), pages 309-318, March.
    10. Fernández, Arturo J., 2012. "Minimizing the area of a Pareto confidence region," European Journal of Operational Research, Elsevier, vol. 221(1), pages 205-212.
    11. Hapfelmeier, Alexander & Hornung, Roman & Haller, Bernhard, 2023. "Efficient permutation testing of variable importance measures by the example of random forests," Computational Statistics & Data Analysis, Elsevier, vol. 181(C).
    12. Dogah, Kingsley E. & Premaratne, Gamini, 2018. "Sectoral exposure of financial markets to oil risk factors in BRICS countries," Energy Economics, Elsevier, vol. 76(C), pages 228-256.
    13. Hapfelmeier, A. & Ulm, K., 2014. "Variable selection by Random Forests using data with missing values," Computational Statistics & Data Analysis, Elsevier, vol. 80(C), pages 129-139.
    14. Barbara Baranowska & Anna Kajdy & Paulina Pawlicka & Ernest Pokropek & Michał Rabijewski & Dorota Sys & Artur Pokropek, 2020. "What are the Critical Elements of Satisfaction and Experience in Labor and Childbirth—A Cross-Sectional Study," IJERPH, MDPI, vol. 17(24), pages 1-13, December.
    15. Massimiliano Fessina & Giambattista Albora & Andrea Tacchella & Andrea Zaccaria, 2022. "Which products activate a product? An explainable machine learning approach," Papers 2212.03094, arXiv.org.
    16. Mohan Bi & Huiying Li & Peter Meidl & Yanjie Zhu & Masahiro Ryo & Matthias C. Rillig, 2024. "Number and dissimilarity of global change factors influences soil properties and functions," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    17. Liangyuan Hu & Lihua Li, 2022. "Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series," IJERPH, MDPI, vol. 19(23), pages 1-13, December.
    18. Abellán, Joaquín & Baker, Rebecca M. & Coolen, Frank P.A., 2011. "Maximising entropy on the nonparametric predictive inference model for multinomial data," European Journal of Operational Research, Elsevier, vol. 212(1), pages 112-122, July.
    19. Chikalov, Igor & Hussain, Shahid & Moshkov, Mikhail, 2018. "Bi-criteria optimization of decision trees with applications to data analysis," European Journal of Operational Research, Elsevier, vol. 266(2), pages 689-701.
    20. Silke Janitza & Ender Celik & Anne-Laure Boulesteix, 2018. "A computationally fast variable importance test for random forests for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(4), pages 885-915, December.
