Printed from https://ideas.repec.org/a/taf/japsta/v39y2012i1p151-160.html

Bias-corrected random forests in regression

Author

Listed:
  • Guoyi Zhang
  • Yan Lu

Abstract

It is well known that random forests reduce the variance of regression predictors compared with a single tree while leaving the bias unchanged. In many situations, the dominant component of the risk is the squared bias, which makes bias correction necessary. In this paper, random forests are used to estimate the regression function. Five methods for estimating the bias are proposed and discussed, and their performance is studied on simulated and real data. The proposed methods significantly reduce bias in the regression context.
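
The abstract's opening claim (averaging many trees shrinks variance but leaves bias untouched, so squared bias can dominate the risk) can be illustrated with a small toy simulation. This is only a sketch under loud assumptions: each "tree" is modelled as the true value plus a fixed bias plus independent Gaussian noise, and the function name `simulate` and all parameter values are hypothetical. It is not one of the five bias-estimation methods proposed in the paper, and real trees in a forest are correlated, so the variance reduction in practice is smaller than shown here.

```python
import random
import statistics


def simulate(n_trees, bias=0.5, noise_sd=1.0, n_reps=5000, seed=0):
    """Monte Carlo estimate of (bias, variance, MSE) for an average of toy trees.

    Assumption: each toy "tree" predicts truth + bias + independent N(0, noise_sd^2)
    noise. Independence is a simplification; real forest trees are correlated.
    """
    rng = random.Random(seed)
    truth = 2.0  # arbitrary value of the regression function at one point
    preds = []
    for _ in range(n_reps):
        trees = [truth + bias + rng.gauss(0.0, noise_sd) for _ in range(n_trees)]
        preds.append(sum(trees) / n_trees)  # the "forest" prediction
    est_bias = statistics.fmean(preds) - truth
    est_var = statistics.pvariance(preds)
    mse = statistics.fmean((p - truth) ** 2 for p in preds)
    return est_bias, est_var, mse


if __name__ == "__main__":
    for n in (1, 50):
        b, v, m = simulate(n)
        # MSE decomposes as variance + squared bias
        print(f"trees={n:3d}  bias={b:.3f}  variance={v:.3f}  mse={m:.3f}")
```

In this toy setting the estimated bias stays near 0.5 for both one tree and fifty, while the variance falls roughly in proportion to the number of trees, so with many trees the squared bias accounts for almost all of the remaining error. That is the situation in which the abstract argues a bias correction is needed.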

Suggested Citation

  • Guoyi Zhang & Yan Lu, 2012. "Bias-corrected random forests in regression," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(1), pages 151-160, March.
  • Handle: RePEc:taf:japsta:v:39:y:2012:i:1:p:151-160
    DOI: 10.1080/02664763.2011.578621

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/02664763.2011.578621
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Biau, Gérard & Devroye, Luc, 2010. "On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification," Journal of Multivariate Analysis, Elsevier, vol. 101(10), pages 2499-2518, November.
    2. Lin, Yi & Jeon, Yongho, 2006. "Random Forests and Adaptive Nearest Neighbors," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 578-590, June.

    Citations


    Cited by:

    1. Yizhou Wu & Zichun Huang & Dan Han & Xiaoli Qiu & Yaxin Pan, 2023. "Evolution of Urban Ecosystem Service Value and a Scenario Analysis Based on Land Utilization Changes: A Case Study of Hangzhou, China," Sustainability, MDPI, vol. 15(10), pages 1-17, May.
    2. Backer, David & Billing, Trey, 2024. "Forecasting the prevalence of child acute malnutrition using environmental and conflict conditions as leading indicators," World Development, Elsevier, vol. 176(C).
    3. Feng, Puyu & Wang, Bin & Liu, De Li & Yu, Qiang, 2019. "Machine learning-based integration of remotely-sensed drought factors can improve the estimation of agricultural drought in South-Eastern Australia," Agricultural Systems, Elsevier, vol. 173(C), pages 303-316.
    4. Gert Bijnens & Shyngys Karimov & Jozef Konings, 2023. "Does Automatic Wage Indexation Destroy Jobs? A Machine Learning Approach," De Economist, Springer, vol. 171(1), pages 85-117, March.
    5. Hyukjun Gweon & Shu Li & Yangxuan Xu, 2024. "Use of Prediction Bias in Active Learning and Its Application to Large Variable Annuity Portfolios," Risks, MDPI, vol. 12(6), pages 1-14, May.
    6. Maria Angela Echeverry-Galvis & Jennifer K Peterson & Rajmonda Sulo-Caceres, 2014. "The Social Nestwork: Tree Structure Determines Nest Placement in Kenyan Weaverbird Colonies," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-7, February.
    7. Ku, Arthur Lin & Qiu, Yueming (Lucy) & Lou, Jiehong & Nock, Destenie & Xing, Bo, 2022. "Changes in hourly electricity consumption under COVID mandates: A glance to future hourly residential power consumption pattern with remote work in Arizona," Applied Energy, Elsevier, vol. 310(C).
    8. Wang, Jiacheng & Zhao, Zhihong & Liu, Guihong & Xu, Haoran, 2022. "A robust optimization approach of well placement for doublet in heterogeneous geothermal reservoirs using random forest technique and genetic algorithm," Energy, Elsevier, vol. 254(PC).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mendez, Guillermo & Lohr, Sharon, 2011. "Estimating residual variance in random forest regression," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2937-2950, November.
    2. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    3. Jincheng Shen & Lu Wang & Jeremy M. G. Taylor, 2017. "Estimation of the optimal regime in treatment of prostate cancer recurrence from observational data using flexible weighting models," Biometrics, The International Biometric Society, vol. 73(2), pages 635-645, June.
    4. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    5. Uguccioni, James, 2022. "The long-run effects of parental unemployment in childhood," CLEF Working Paper Series 45, Canadian Labour Economics Forum (CLEF), University of Waterloo.
    6. Ramosaj, Burim & Pauly, Markus, 2019. "Consistent estimation of residual variance with random forest Out-Of-Bag errors," Statistics & Probability Letters, Elsevier, vol. 151(C), pages 49-57.
    7. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
    8. Jerinsh Jeyapaulraj & Dhruv Desai & Peter Chu & Dhagash Mehta & Stefano Pasquali & Philip Sommer, 2022. "Supervised similarity learning for corporate bonds using Random Forest proximities," Papers 2207.04368, arXiv.org, revised Oct 2022.
    9. Luu, Tung Duy & Fadili, Jalal & Chesneau, Christophe, 2019. "PAC-Bayesian risk bounds for group-analysis sparse regression by exponential weighting," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 209-233.
    10. David M. Ritzwoller & Vasilis Syrgkanis, 2024. "Simultaneous Inference for Local Structural Parameters with Random Forests," Papers 2405.07860, arXiv.org, revised Sep 2024.
    11. Li, Yiliang & Bai, Xiwen & Wang, Qi & Ma, Zhongjun, 2022. "A big data approach to cargo type prediction and its implications for oil trade estimation," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 165(C).
    12. Yi Fu & Shuai Cao & Tao Pang, 2020. "A Sustainable Quantitative Stock Selection Strategy Based on Dynamic Factor Adjustment," Sustainability, MDPI, vol. 12(10), pages 1-12, May.
    13. José María Sarabia & Faustino Prieto & Vanesa Jordá & Stefan Sperlich, 2020. "A Note on Combining Machine Learning with Statistical Modeling for Financial Data Analysis," Risks, MDPI, vol. 8(2), pages 1-14, April.
    14. Biau, Gérard & Devroye, Luc, 2010. "On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification," Journal of Multivariate Analysis, Elsevier, vol. 101(10), pages 2499-2518, November.
    15. Olivier BIAU & Angela D'ELIA, 2010. "Euro Area GDP Forecast Using Large Survey Dataset - A Random Forest Approach," EcoMod2010 259600029, EcoMod.
    16. Cleridy E. Lennert‐Cody & Richard A. Berk, 2007. "Statistical learning procedures for monitoring regulatory compliance: an application to fisheries data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(3), pages 671-689, July.
    17. Philippe Goulet Coulombe, 2024. "The macroeconomy as a random forest," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(3), pages 401-421, April.
    18. Paola Zuccolotto & Marco Sandri & Marica Manisera, 2023. "Spatial performance analysis in basketball with CART, random forest and extremely randomized trees," Annals of Operations Research, Springer, vol. 325(1), pages 495-519, June.
    19. Dhruv Desai & Ashmita Dhiman & Tushar Sharma & Deepika Sharma & Dhagash Mehta & Stefano Pasquali, 2023. "Quantifying Outlierness of Funds from their Categories using Supervised Similarity," Papers 2308.06882, arXiv.org.
    20. Hoora Moradian & Denis Larocque & François Bellavance, 2017. "$L_1$ splitting rules in survival forests," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(4), pages 671-691, October.
