Authors
Yilun Sun and Lu Wang
Abstract
A dynamic treatment regime (DTR) is a sequence of decision rules that adapt to the time-varying states of an individual. Black-box learning methods have shown great potential in predicting the optimal treatments; however, the resulting DTRs lack interpretability, which is of paramount importance for medical experts to understand and implement. We present a stochastic tree-based reinforcement learning (ST-RL) method for estimating optimal DTRs in a multistage multitreatment setting with data from either randomized trials or observational studies. At each stage, ST-RL constructs a decision tree by first modeling the mean of counterfactual outcomes via nonparametric regression models, and then stochastically searching for the optimal tree-structured decision rule using a Markov chain Monte Carlo algorithm. We implement the proposed method in a backward inductive fashion through multiple decision stages. The proposed ST-RL delivers optimal DTRs with better interpretability and contributes to the existing literature in its non-greedy policy search. Additionally, ST-RL demonstrates stable and outstanding performances even with a large number of covariates, which is especially appealing when data are from large observational studies. We illustrate the performance of ST-RL through simulation studies, and also a real data application using esophageal cancer data collected from 1170 patients at MD Anderson Cancer Center from 1998 to 2012. Supplementary materials for this article are available online.
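The stochastic search step described in the abstract can be illustrated with a toy sketch. This is not the authors' ST-RL implementation. It assumes a single stage, a tree with only one split, and estimated counterfactual means that are already available as a table `mu[i][a]` (the estimated outcome for individual `i` under treatment `a`), which in practice would come from a fitted regression model. The function names, the proposal moves, and the `temperature` parameter are all hypothetical choices made for illustration, and the acceptance step is a Metropolis-style heuristic rather than the sampler used in the paper.

```python
import math
import random


def rule_value(rule, X, mu):
    """Average estimated outcome if everyone followed a one-split rule.

    rule = (feature index, threshold, treatment if x <= threshold,
            treatment otherwise). mu[i][a] is an assumed, precomputed
    estimate of individual i's mean outcome under treatment a.
    """
    feat, thr, left_trt, right_trt = rule
    total = 0.0
    for x, m in zip(X, mu):
        trt = left_trt if x[feat] <= thr else right_trt
        total += m[trt]
    return total / len(X)


def propose(rule, X, n_trt, rng):
    """Randomly change either the split or one leaf's treatment (hypothetical moves)."""
    feat, thr, left_trt, right_trt = rule
    move = rng.choice(["split", "left", "right"])
    if move == "split":
        feat = rng.randrange(len(X[0]))
        thr = rng.choice(X)[feat]  # candidate thresholds are observed values
    elif move == "left":
        left_trt = rng.randrange(n_trt)
    else:
        right_trt = rng.randrange(n_trt)
    return (feat, thr, left_trt, right_trt)


def stochastic_search(X, mu, n_iter=2000, temperature=0.05, seed=0):
    """Metropolis-style search over one-split rules; returns the best rule visited."""
    rng = random.Random(seed)
    n_trt = len(mu[0])
    current = (0, X[0][0], 0, 0)
    current_val = rule_value(current, X, mu)
    best, best_val = current, current_val
    for _ in range(n_iter):
        cand = propose(current, X, n_trt, rng)
        cand_val = rule_value(cand, X, mu)
        # Improvements are always accepted; worse rules are accepted with a
        # probability that shrinks as the loss in value grows, so the search
        # is not purely greedy.
        accept_prob = math.exp(min(0.0, (cand_val - current_val) / temperature))
        if rng.random() < accept_prob:
            current, current_val = cand, cand_val
            if current_val > best_val:
                best, best_val = current, current_val
    return best, best_val
```

Calling `stochastic_search(X, mu)` with `X` as a list of covariate rows and `mu` as the matching list of per-treatment estimates returns the highest-valued rule visited together with its estimated value. A multistage version would repeat this from the last stage backward, which the sketch does not attempt.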
Suggested Citation
Yilun Sun & Lu Wang, 2020.
"Stochastic Tree Search for Estimating Optimal Dynamic Treatment Regimes,"
Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(533), pages 421-432, October.
Handle:
RePEc:taf:jnlasa:v:116:y:2020:i:533:p:421-432
DOI: 10.1080/01621459.2020.1819294