IDEAS home Printed from https://ideas.repec.org/a/oup/biomet/v109y2022i4p881-900..html
   My bibliography  Save this article

Mean decrease accuracy for random forests: inconsistency, and a practical solution via the Sobol-MDA
[Explaining individual predictions when features are dependent: more accurate approximations to Shapley values]

Author

Listed:
  • Clément Bénard
  • Sébastien Da Veiga
  • Erwan Scornet

Abstract

SummaryVariable importance measures are the main tools used to analyse the black-box mechanisms of random forests. Although the mean decrease accuracy is widely accepted as the most efficient variable importance measure for random forests, little is known about its statistical properties. In fact, the definition of mean decrease accuracy varies across the main random forest software. In this article, our objective is to rigorously analyse the behaviour of the main mean decrease accuracy implementations. Consequently, we mathematically formalize the various implemented mean decrease accuracy algorithms, and then establish their limits when the sample size increases. This asymptotic analysis reveals that these mean decrease accuracy versions differ as importance measures, since they converge towards different quantities. More importantly, we break down these limits into three components: the first two terms are related to Sobol indices, which are well-defined measures of a covariate contribution to the response variance, widely used in the sensitivity analysis field, as opposed to the third term, whose value increases with dependence within covariates. Thus, we theoretically demonstrate that the mean decrease accuracy does not target the right quantity to detect influential covariates in a dependent setting, a fact that has already been noticed experimentally. To address this issue, we define a new importance measure for random forests, the Sobol-mean decrease accuracy, which fixes the flaws of the original mean decrease accuracy, and consistently estimates the accuracy decrease of the forest retrained without a given covariate, but with an efficient computational cost. The Sobol-mean decrease accuracy empirically outperforms its competitors on both simulated and real data for variable selection.

Suggested Citation

  • Clément Bénard & Sébastien Da Veiga & Erwan Scornet, 2022. "Mean decrease accuracy for random forests: inconsistency, and a practical solution via the Sobol-MDA [Explaining individual predictions when features are dependent: more accurate approximations to ," Biometrika, Biometrika Trust, vol. 109(4), pages 881-900.
  • Handle: RePEc:oup:biomet:v:109:y:2022:i:4:p:881-900.
    as

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1093/biomet/asac017
    Download Restriction: Access to full text is restricted to subscribers.
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:oup:biomet:v:109:y:2022:i:4:p:881-900.. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Oxford University Press (email available below). General contact details of provider: https://academic.oup.com/biomet .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.