Bayesian Regression Trees for High-Dimensional Prediction and Variable Selection

Author

Listed:
  • Antonio R. Linero

Abstract

Decision tree ensembles are an extremely popular tool for obtaining high-quality predictions in nonparametric regression problems. Unmodified, however, many commonly used decision tree ensemble methods do not adapt to sparsity in the regime in which the number of predictors is larger than the number of observations. A recent stream of research concerns the construction of decision tree ensembles that are motivated by a generative probabilistic model, the most influential method being the Bayesian additive regression trees (BART) framework. In this article, we take a Bayesian point of view on this problem and show how to construct priors on decision tree ensembles that are capable of adapting to sparsity in the predictors by placing a sparsity-inducing Dirichlet hyperprior on the splitting proportions of the regression tree prior. We characterize the asymptotic distribution of the number of predictors included in the model and show how this prior can be easily incorporated into existing Markov chain Monte Carlo schemes. We demonstrate that our approach yields useful posterior inclusion probabilities for each predictor and illustrate the usefulness of our approach relative to other decision tree ensemble approaches on both simulated and real datasets. Supplementary materials for this article are available online.
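The role of the Dirichlet hyperprior on the splitting proportions can be illustrated with a small sketch. The abstract does not give the parameterization, so the code below assumes a symmetric prior s ~ Dirichlet(alpha/P, ..., alpha/P) over P predictors, and assumes that, given the number of splits m_j currently made on each predictor, the proportions can be redrawn from Dirichlet(alpha/P + m_1, ..., alpha/P + m_P) by standard Dirichlet-multinomial conjugacy. The function names, the threshold of 0.01, and the example counts are hypothetical and chosen only for illustration; this is not the article's implementation.

```python
import numpy as np


def sample_split_proportions(num_predictors, alpha=1.0, split_counts=None, rng=None):
    """Draw splitting proportions s from a Dirichlet distribution.

    Assumed forms (not stated in the abstract):
      prior draw:       s ~ Dirichlet(alpha/P, ..., alpha/P)
      conjugate update: s ~ Dirichlet(alpha/P + m_1, ..., alpha/P + m_P)
    where m_j is the number of splits currently made on predictor j.
    """
    rng = np.random.default_rng() if rng is None else rng
    if split_counts is None:
        counts = np.zeros(num_predictors)
    else:
        counts = np.asarray(split_counts, dtype=float)
    if counts.shape != (num_predictors,):
        raise ValueError("split_counts must have one entry per predictor")
    return rng.dirichlet(alpha / num_predictors + counts)


def count_active(proportions, threshold=0.01):
    """Count predictors whose splitting proportion exceeds the threshold."""
    return int(np.sum(np.asarray(proportions) > threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = 50

    # A prior draw: most of the mass typically lands on a few predictors.
    prior_draw = sample_split_proportions(P, alpha=1.0, rng=rng)
    print("active predictors in one prior draw:", count_active(prior_draw))

    # Hypothetical split counts from an ensemble that uses predictors 0 and 3.
    counts = np.zeros(P)
    counts[[0, 3]] = [40, 25]
    post_draw = sample_split_proportions(P, alpha=1.0, split_counts=counts, rng=rng)
    print("largest proportions after the update:", np.argsort(post_draw)[-2:])
```

Under these assumptions, the small concentration alpha/P is what pushes most proportions toward zero, and the update step needs only the per-predictor split counts, which is why such a prior slots into an existing sampler with little extra work.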

Suggested Citation

  • Antonio R. Linero, 2018. "Bayesian Regression Trees for High-Dimensional Prediction and Variable Selection," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(522), pages 626-636, April.
  • Handle: RePEc:taf:jnlasa:v:113:y:2018:i:522:p:626-636
    DOI: 10.1080/01621459.2016.1264957

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2016.1264957
    Download Restriction: Access to full text is restricted to subscribers.


    Cited by:

    1. Oyebayo Ridwan Olaniran & Ali Rashash R. Alzahrani, 2023. "On the Oracle Properties of Bayesian Random Forest for Sparse High-Dimensional Gaussian Regression," Mathematics, MDPI, vol. 11(24), pages 1-29, December.
    2. Yaojun Zhang & Lanpeng Ji & Georgios Aivaliotis & Charles Taylor, 2023. "Bayesian CART models for insurance claims frequency," Papers 2303.01923, arXiv.org, revised Dec 2023.
    3. Huber, Florian & Koop, Gary & Onorante, Luca & Pfarrhofer, Michael & Schreiner, Josef, 2023. "Nowcasting in a pandemic using non-parametric mixed frequency VARs," Journal of Econometrics, Elsevier, vol. 232(1), pages 52-69.
    4. Billio, Monica & Casarin, Roberto & Costola, Michele & Veggente, Veronica, 2024. "Learning from experts: Energy efficiency in residential buildings," Energy Economics, Elsevier, vol. 136(C).
    5. Deshpande Sameer K. & Evans Katherine, 2020. "Expected hypothetical completion probability," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 16(2), pages 85-94, June.
    6. Lamprinakou, Stamatina & Barahona, Mauricio & Flaxman, Seth & Filippi, Sarah & Gandy, Axel & McCoy, Emma J., 2023. "BART-based inference for Poisson processes," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    7. Zhang, Yaojun & Ji, Lanpeng & Aivaliotis, Georgios & Taylor, Charles, 2024. "Bayesian CART models for insurance claims frequency," Insurance: Mathematics and Economics, Elsevier, vol. 114(C), pages 108-131.
    8. Falco J. Bargagli-Stoffi & Fabio Incerti & Massimo Riccaboni & Armando Rungi, 2023. "Machine Learning for Zombie Hunting: Predicting Distress from Firms' Accounts and Missing Values," Papers 2306.08165, arXiv.org.
    9. Florian Huber & Luca Rossini, 2020. "Inference in Bayesian Additive Vector Autoregressive Tree Models," Papers 2006.16333, arXiv.org, revised Mar 2021.
    10. Yi Liu & Veronika Ročková & Yuexi Wang, 2021. "Variable selection with ABC Bayesian forests," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 453-481, July.
    11. Rodney A. Sparapani & Brent R. Logan & Martin J. Maiers & Purushottam W. Laud & Robert E. McCulloch, 2023. "Nonparametric failure time: Time‐to‐event machine learning with heteroskedastic Bayesian additive regression trees and low information omnibus Dirichlet process mixtures," Biometrics, The International Biometric Society, vol. 79(4), pages 3023-3037, December.
    12. Falco J. Bargagli Stoffi & Kenneth De Beckker & Joana E. Maldonado & Kristof De Witte, 2021. "Assessing Sensitivity of Machine Learning Predictions. A Novel Toolbox with an Application to Financial Literacy," Papers 2102.04382, arXiv.org.
    13. Niu, Zibo & Wang, Chenlu & Zhang, Hongwei, 2023. "Forecasting stock market volatility with various geopolitical risks categories: New evidence from machine learning models," International Review of Financial Analysis, Elsevier, vol. 89(C).
    14. Horiguchi, Akira & Pratola, Matthew T. & Santner, Thomas J., 2021. "Assessing variable activity for Bayesian regression trees," Reliability Engineering and System Safety, Elsevier, vol. 207(C).
    15. Alberto Caron & Gianluca Baio & Ioanna Manolopoulou, 2022. "Estimating individual treatment effects using non‐parametric regression models: A review," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(3), pages 1115-1149, July.
