
On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation

Authors
  • Xiaohong Chen
  • Zhengling Qi

Abstract

We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast $Q$-function estimation as a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that, under one mild condition, the NPIV formulation of $Q$-function estimation is well-posed in the sense of the $L^2$-measure of ill-posedness with respect to the data-generating distribution, bypassing a strong assumption on the discount factor $\gamma$ imposed in the recent literature to obtain $L^2$ convergence rates of various $Q$-function estimators. Thanks to this new well-posedness property, we derive the first minimax lower bounds for the convergence rates of nonparametric estimation of the $Q$-function and its derivatives in both sup-norm and $L^2$-norm, which are shown to be the same as those for classical nonparametric regression (Stone, 1982). We then propose a sieve two-stage least squares estimator and establish its rate-optimality in both norms under some mild conditions. Our general results on well-posedness and the minimax lower bounds are of independent interest for studying not only other nonparametric estimators of the $Q$-function but also efficient estimation of the value of any target policy in off-policy settings.
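
To make the NPIV recasting concrete, the following is a minimal sketch of a sieve two-stage least squares fit of a $Q$-function, and not the authors' implementation. It relies on the standard Bellman moment restriction $E[R + \gamma Q(S', \pi(S')) - Q(S, A) \mid S, A] = 0$, in which $(S, A)$ plays the role of the instrument. Everything specific here is an assumption made for illustration: the scalar state and action on $[0, 1]$, the deterministic target policy `policy`, the tensor-product monomial bases, the synthetic transitions, and the helper names `tensor_poly` and `sieve_2sls_q`. The reward is constructed so that the Bellman equation holds exactly for a known $Q$, which lets the fit be checked against the truth.

```python
import numpy as np

def tensor_poly(s, a, degree):
    """Tensor-product monomial basis s^i * a^j for 0 <= i, j <= degree."""
    return np.column_stack([s**i * a**j
                            for i in range(degree + 1)
                            for j in range(degree + 1)])

def sieve_2sls_q(s, a, r, s_next, policy, gamma, deg_q=1, deg_iv=2):
    """Illustrative sieve 2SLS estimate of Q-function basis coefficients.

    Q is approximated by psi(s, a) @ coef. The regressor is
    psi(S, A) - gamma * psi(S', policy(S')), instrumented by a richer
    basis in (S, A). Hypothetical helper, not taken from the paper.
    """
    psi = tensor_poly(s, a, deg_q)                         # basis at (S, A)
    psi_next = tensor_poly(s_next, policy(s_next), deg_q)  # basis at (S', pi(S'))
    G = psi - gamma * psi_next                             # Bellman-difference regressor
    B = tensor_poly(s, a, deg_iv)                          # instrument basis in (S, A)
    # Stage 1: project each column of G onto the span of the instruments.
    G_hat = B @ np.linalg.lstsq(B, G, rcond=None)[0]
    # Stage 2: least squares of the rewards on the projected regressor.
    coef = np.linalg.lstsq(G_hat, r, rcond=None)[0]
    return coef

# Synthetic data (assumed purely for illustration).
rng = np.random.default_rng(0)
n, gamma = 2000, 0.9

def policy(x):
    return 0.5 * x                  # assumed deterministic target policy

def q_true(s, a):
    return 1.0 + s + s * a          # assumed true Q, lies in the degree-1 sieve

s = rng.uniform(size=n)
a = rng.uniform(size=n)             # behaviour policy: uniform actions
s_next = np.clip(0.5 * s + 0.3 * a + 0.1 * rng.normal(size=n), 0.0, 1.0)
# Reward chosen so the Bellman equation holds exactly for q_true.
r = q_true(s, a) - gamma * q_true(s_next, policy(s_next))

coef = sieve_2sls_q(s, a, r, s_next, policy, gamma)
print(coef)  # coefficients on the basis functions [1, a, s, s*a]
```

Choosing the instrument basis equal to the $Q$-function basis would reduce this sketch to an ordinary least squares temporal-difference fit; the richer instrument basis is what makes it a two-stage procedure. With noisy rewards and a $Q$-function outside the sieve, the basis dimensions would have to grow with the sample size, which is the regime the paper's rate results address.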

Suggested Citation

  • Xiaohong Chen & Zhengling Qi, 2022. "On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation," Papers 2201.06169, arXiv.org, revised Jun 2022.
  • Handle: RePEc:arx:papers:2201.06169

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2201.06169
    File Function: Latest version
    Download Restriction: no


    Citations



    Cited by:

    1. Xiaohong Chen & Yuan Liao & Weichen Wang, 2022. "Inference on Time Series Nonparametric Conditional Moment Restrictions Using General Sieves," Papers 2301.00092, arXiv.org, revised Jan 2023.
    2. Shi, Chengchun & Luo, Shikai & Le, Yuan & Zhu, Hongtu & Song, Rui, 2022. "Statistically efficient advantage learning for offline reinforcement learning in infinite horizons," LSE Research Online Documents on Economics 115598, London School of Economics and Political Science, LSE Library.
    3. Zhang, Yingying & Shi, Chengchun & Luo, Shikai, 2023. "Conformal off-policy prediction," LSE Research Online Documents on Economics 118250, London School of Economics and Political Science, LSE Library.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Breunig, Christoph & Mammen, Enno & Simoni, Anna, 2018. "Nonparametric estimation in case of endogenous selection," Journal of Econometrics, Elsevier, vol. 202(2), pages 268-285.
    2. Xiaohong Chen & Demian Pouzo, 2012. "Estimation of Nonparametric Conditional Moment Models With Possibly Nonsmooth Generalized Residuals," Econometrica, Econometric Society, vol. 80(1), pages 277-321, January.
    3. Xiaohong Chen & Timothy M. Christensen, 2015. "Optimal sup-norm rates, adaptivity and inference in nonparametric instrumental variables estimation," CeMMAP working papers 32/15, Institute for Fiscal Studies.
    4. Michael Jansson & Demian Pouzo, 2017. "Towards a General Large Sample Theory for Regularized Estimators," Papers 1712.07248, arXiv.org, revised Jul 2020.
    5. Xiaohong Chen & Timothy Christensen, 2013. "Optimal Sup-norm Rates, Adaptivity and Inference in Nonparametric Instrumental Variables Estimation," Cowles Foundation Discussion Papers 1923R, Cowles Foundation for Research in Economics, Yale University, revised Apr 2015.
    6. Xiaohong Chen & Demian Pouzo, 2015. "Sieve Wald and QLR Inferences on Semi/Nonparametric Conditional Moment Models," Econometrica, Econometric Society, vol. 83(3), pages 1013-1079, May.
    7. Chen, Xiaohong & Pouzo, Demian, 2008. "Estimation of Nonparametric Conditional Moment Models with Possibly Nonsmooth Moments," Working Papers 47, Yale University, Department of Economics.
    8. Breunig, Christoph, 2021. "Varying random coefficient models," Journal of Econometrics, Elsevier, vol. 221(2), pages 381-408.
    9. Hoshino, Tadao, 2022. "Sieve IV estimation of cross-sectional interaction models with nonparametric endogenous effect," Journal of Econometrics, Elsevier, vol. 229(2), pages 263-275.
    10. Liao, Yuan & Jiang, Wenxin, 2011. "Posterior consistency of nonparametric conditional moment restricted models," MPRA Paper 38700, University Library of Munich, Germany.
    11. Andrew Bennett & Nathan Kallus & Xiaojie Mao & Whitney Newey & Vasilis Syrgkanis & Masatoshi Uehara, 2023. "Minimax Instrumental Variable Regression and $L_2$ Convergence Guarantees without Identification or Closedness," Papers 2302.05404, arXiv.org.
    12. Zheng Fang & Juwon Seo, 2019. "A Projection Framework for Testing Shape Restrictions That Form Convex Cones," Papers 1910.07689, arXiv.org, revised Sep 2021.
    13. Xiaohong Chen & Victor Chernozhukov & Sokbae Lee & Whitney K. Newey, 2014. "Local Identification of Nonparametric and Semiparametric Models," Econometrica, Econometric Society, vol. 82(2), pages 785-809, March.
    14. Jean‐Pierre Florens & Jan Johannes & Sébastien Van Bellegem, 2012. "Instrumental regression in partially linear models," Econometrics Journal, Royal Economic Society, vol. 15(2), pages 304-324, June.
    15. Chen, Xiaohong & Pouzo, Demian, 2009. "Efficient estimation of semiparametric conditional moment models with possibly nonsmooth residuals," Journal of Econometrics, Elsevier, vol. 152(1), pages 46-60, September.
    16. Florens, Jean-Pierre & Simoni, Anna, 2016. "Regularizing Priors For Linear Inverse Problems," Econometric Theory, Cambridge University Press, vol. 32(1), pages 71-121, February.
    17. Victor Chernozhukov & Whitney Newey & Rahul Singh & Vasilis Syrgkanis, 2020. "Adversarial Estimation of Riesz Representers," Papers 2101.00009, arXiv.org, revised Apr 2024.
    18. Frédérique Fève & Jean-Pierre Florens, 2010. "The practice of non-parametric estimation by solving inverse problems: the example of transformation models," Econometrics Journal, Royal Economic Society, vol. 13(3), pages 1-27, October.
    19. Jad Beyhum & Elia Lapenta & Pascal Lavergne, 2023. "One-step smoothing splines instrumental regression," Papers 2307.14867, arXiv.org, revised Apr 2024.
    20. Florens, Jean-Pierre & Simoni, Anna, 2012. "Nonparametric estimation of an instrumental regression: A quasi-Bayesian approach based on regularized posterior," Journal of Econometrics, Elsevier, vol. 170(2), pages 458-475.
