The Cross-Validated Adaptive Epsilon-Net Estimator

Author

Listed:
  • Mark van der Laan

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Sandrine Dudoit

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Aad van der Vaart

    (Dept. of Mathematics, Vrije Universiteit, Amsterdam)

Abstract

Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space typically results in ill-defined or too variable estimators of the parameter of interest (i.e., the risk minimizer for the true data generating distribution). In this article, we propose a cross-validated epsilon-net estimation methodology that covers a broad class of estimation problems, including multivariate outcome prediction and multivariate density estimation. An epsilon-net sieve of a subspace of the parameter space is defined as a collection of finite sets of points, the epsilon-nets indexed by epsilon, which approximate the subspace up to a resolution of epsilon. Given a collection of subspaces of the parameter space, one constructs an epsilon-net sieve for each of the subspaces. For each choice of subspace and each value of the resolution epsilon, one defines a candidate estimator as the minimizer of the empirical risk over the corresponding epsilon-net. The cross-validated epsilon-net estimator is then defined as the candidate estimator corresponding to the choice of subspace and epsilon-value minimizing the cross-validated empirical risk. We derive a finite sample inequality which proves that the proposed estimator achieves the adaptive optimal minimax rate of convergence, where the adaptivity is achieved by considering epsilon-net sieves for various subspaces.
We also address the implementation of the cross-validated epsilon-net estimation procedure. In the context of a linear regression model, we present results of a preliminary simulation study comparing the cross-validated epsilon-net estimator to the cross-validated L^1-penalized least squares estimator (LASSO) and the least angle regression estimator (LARS). Finally, we discuss generalizations of the proposed estimation methodology to censored data structures.
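
The procedure described in the abstract can be illustrated with a minimal sketch for linear regression under squared error loss. This is not the authors' implementation, which the abstract does not specify. The following choices are assumptions made only for illustration: the subspaces are indexed by `dim` and use the first `dim` covariates, each epsilon-net is a regular grid with spacing `eps` on the cube [-bound, bound]^dim (an epsilon-net in the sup-norm), the empirical risk is minimized by brute-force enumeration, and V-fold cross-validation selects the pair (`dim`, `eps`). All function and parameter names are hypothetical.

```python
import itertools
import numpy as np


def epsilon_net(dim, eps, bound=2.0):
    """Regular grid with spacing eps on [-bound, bound]^dim (assumed net)."""
    axis = np.arange(-bound, bound + eps / 2, eps)
    return np.array(list(itertools.product(axis, repeat=dim)))


def empirical_risk(beta, X, y):
    """Empirical mean of the squared error loss for coefficients beta."""
    return float(np.mean((y - X @ beta) ** 2))


def net_minimizer(X, y, dim, eps):
    """Candidate estimator: empirical risk minimizer over the epsilon-net.

    The subspace indexed by `dim` uses only the first `dim` covariates;
    the remaining coefficients are fixed at zero.
    """
    net = epsilon_net(dim, eps)
    # Risk of every net point at once: one column of residuals per point.
    risks = np.mean((y[:, None] - X[:, :dim] @ net.T) ** 2, axis=0)
    beta = np.zeros(X.shape[1])
    beta[:dim] = net[np.argmin(risks)]
    return beta


def cv_epsilon_net(X, y, dims, eps_values, n_folds=5, seed=0):
    """Select (dim, eps) by cross-validated risk, then refit on all data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best = None
    for dim, eps in itertools.product(dims, eps_values):
        cv_risk = 0.0
        for val in folds:
            train = np.setdiff1d(np.arange(len(y)), val)
            beta = net_minimizer(X[train], y[train], dim, eps)
            cv_risk += empirical_risk(beta, X[val], y[val]) / n_folds
        if best is None or cv_risk < best[0]:
            best = (cv_risk, dim, eps)
    _, dim, eps = best
    return net_minimizer(X, y, dim, eps), dim, eps


# Usage on simulated data (hypothetical data generating distribution).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
beta_hat, dim_hat, eps_hat = cv_epsilon_net(
    X, y, dims=[1, 2, 3], eps_values=[1.0, 0.5]
)
```

Enumerating a grid is feasible only in low dimensions, since the number of net points grows as (2 * bound / eps + 1)^dim; the sketch is meant to show the selection logic, not a practical search strategy.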

Suggested Citation

  • Mark van der Laan & Sandrine Dudoit & Aad van der Vaart, 2004. "The Cross-Validated Adaptive Epsilon-Net Estimator," U.C. Berkeley Division of Biostatistics Working Paper Series 1141, Berkeley Electronic Press.
  • Handle: RePEc:bep:ucbbio:1141
    Note: oai:bepress.com:ucbbiostat-1141

    Download full text from publisher

    File URL: http://www.bepress.com/cgi/viewcontent.cgi?article=1141&context=ucbbiostat
    Download Restriction: no

    References listed on IDEAS

    1. Molinaro, Annette M. & Dudoit, Sandrine & van der Laan, Mark J., 2004. "Tree-based multivariate regression and density estimation with right-censored data," Journal of Multivariate Analysis, Elsevier, vol. 90(1), pages 154-177, July.

    Citations

    Cited by:

    1. Luedtke, Alexander R. & van der Laan, Mark J., 2016. "Super-Learning of an Optimal Dynamic Treatment Rule," The International Journal of Biostatistics, De Gruyter, vol. 12(1), pages 305-332, May.
    2. Petersen, Maya L. & Molinaro, Annette M. & Sinisi, Sandra E. & van der Laan, Mark J., 2007. "Cross-validated bagged learning," Journal of Multivariate Analysis, Elsevier, vol. 98(9), pages 1693-1704, October.
    3. Porter, Kristin E. & Gruber, Susan & van der Laan, Mark J. & Sekhon, Jasjeet S., 2011. "The Relative Performance of Targeted Maximum Likelihood Estimators," The International Journal of Biostatistics, De Gruyter, vol. 7(1), pages 1-34, August.
    4. Ertefaie, Ashkan & Asgharian, Masoud & Stephens, David A., 2018. "Variable Selection in Causal Inference using a Simultaneous Penalization Method," Journal of Causal Inference, De Gruyter, vol. 6(1), pages 1-16, March.
    5. Haight, Thaddeus J. & Wang, Yue & van der Laan, Mark J. & Tager, Ira B., 2010. "A cross-validation deletion-substitution-addition model selection algorithm: Application to marginal structural models," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3080-3094, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2015. "Tree-based censored regression with applications to insurance," Working Papers hal-01141228, HAL.
    2. van der Laan, Mark J. & Dudoit, Sandrine & van der Vaart, Aad W., 2006. "The cross-validated adaptive epsilon-net estimator," Statistics & Risk Modeling, De Gruyter, vol. 24(3), pages 373-395, December.
    3. Yan Zhou & John McArdle, 2015. "Rationale and Applications of Survival Tree and Survival Ensemble Methods," Psychometrika, Springer;The Psychometric Society, vol. 80(3), pages 811-833, September.
    4. Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2016. "Tree-based censored regression with applications in insurance," Post-Print hal-01141228, HAL.
    5. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    6. Sinisi, Sandra E. & Neugebauer, Romain & van der Laan, Mark J., 2006. "Cross-Validated Bagged Prediction of Survival," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 5(1), pages 1-26, May.
    7. Wei-Yin Loh, 2014. "Fifty Years of Classification and Regression Trees," International Statistical Review, International Statistical Institute, vol. 82(3), pages 329-348, December.
    8. Yifei Sun & Sy Han Chiou & Mei‐Cheng Wang, 2020. "ROC‐guided survival trees and ensembles," Biometrics, The International Biometric Society, vol. 76(4), pages 1177-1189, December.
    9. Pablo Gonzalez Ginestet & Ales Kotalik & David M. Vock & Julian Wolfson & Erin E. Gabriel, 2021. "Stacked inverse probability of censoring weighted bagging: A case study in the InfCareHIV Register," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(1), pages 51-65, January.
    10. Karen Lostritto & Robert L. Strawderman & Annette M. Molinaro, 2012. "A Partitioning Deletion/Substitution/Addition Algorithm for Creating Survival Risk Groups," Biometrics, The International Biometric Society, vol. 68(4), pages 1146-1156, December.
    11. Alexander Hanbo Li & Jelena Bradic, 2019. "Censored Quantile Regression Forests," Papers 1902.03327, arXiv.org.
