
Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees

Author

Listed:
  • Siliang Zeng
  • Mingyi Hong
  • Alfredo Garcia

Abstract

We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states. This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified, while in the outer problem, a measure of fit is maximized. Several approaches have been proposed to alleviate the computational burden of this nested-loop structure, but these methods still suffer from high complexity when the state space is either discrete with large cardinality or continuous in high dimensions. Other approaches in the inverse reinforcement learning (IRL) literature emphasize policy estimation at the expense of reward estimation accuracy. In this paper we propose a single-loop estimation algorithm with finite-time guarantees that is equipped to deal with high-dimensional state spaces without compromising reward estimation accuracy. In the proposed algorithm, each policy improvement step is followed by a stochastic gradient step for likelihood maximization. We show that the proposed algorithm converges to a stationary solution with a finite-time guarantee. Further, if the reward is parameterized linearly, we show that the algorithm approximates the maximum likelihood estimator sublinearly. Finally, using robotics control problems in MuJoCo and their transfer settings, we show that the proposed algorithm achieves superior performance compared with other IRL and imitation learning benchmarks.
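The single-loop idea in the abstract (one policy improvement step, then one stochastic gradient step on the likelihood) can be illustrated with a minimal tabular sketch. This is not the authors' implementation. It assumes a toy two-state MDP, a linear reward r(s, a) = phi(s, a) . theta, an entropy-regularized (softmax) policy, and the common IRL gradient estimate "expert discounted features minus agent discounted features". All names (soft_backup, rollout, single_loop_irl, the toy P and phi) are hypothetical.

```python
import math
import random

def soft_policy(Q):
    """Softmax policy: pi(a|s) proportional to exp(Q[s][a])."""
    pi = []
    for q in Q:
        m = max(q)
        w = [math.exp(x - m) for x in q]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi

def soft_backup(Q, theta, phi, P, gamma):
    """One soft Bellman backup of Q under the current reward phi . theta."""
    # Soft value V(s) = log sum_a exp(Q[s][a]), computed stably.
    V = [max(q) + math.log(sum(math.exp(x - max(q)) for x in q)) for q in Q]
    new_Q = []
    for s in range(len(Q)):
        row = []
        for a in range(len(Q[s])):
            r = sum(t * f for t, f in zip(theta, phi[s][a]))
            row.append(r + gamma * sum(p * v for p, v in zip(P[s][a], V)))
        new_Q.append(row)
    return new_Q

def rollout(pi, P, gamma, phi, horizon, rng, s0=0):
    """Sample one trajectory and return its discounted feature sum."""
    feat = [0.0] * len(phi[0][0])
    s = s0
    for t in range(horizon):
        a = rng.choices(range(len(pi[s])), weights=pi[s])[0]
        for k in range(len(feat)):
            feat[k] += (gamma ** t) * phi[s][a][k]
        s = rng.choices(range(len(P[s][a])), weights=P[s][a])[0]
    return feat

def single_loop_irl(expert_feats, phi, P, gamma=0.9, iters=200,
                    lr=0.05, horizon=20, seed=0):
    """Alternate one policy improvement step with one stochastic gradient step."""
    rng = random.Random(seed)
    n_states, n_actions, dim = len(phi), len(phi[0]), len(phi[0][0])
    theta = [0.0] * dim
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        # Policy improvement: a single soft backup, not a full inner solve.
        Q = soft_backup(Q, theta, phi, P, gamma)
        pi = soft_policy(Q)
        # Stochastic gradient: one expert sample minus one agent sample.
        expert = rng.choice(expert_feats)
        agent = rollout(pi, P, gamma, phi, horizon, rng)
        theta = [th + lr * (e - g) for th, e, g in zip(theta, expert, agent)]
    return theta, soft_policy(Q)

# Toy problem (assumed for illustration): features indicate the chosen action.
phi = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
P = [[[0.8, 0.2], [0.3, 0.7]], [[0.5, 0.5], [0.1, 0.9]]]
expert_policy = [[0.1, 0.9], [0.1, 0.9]]  # the expert prefers action 1
data_rng = random.Random(1)
expert_feats = [rollout(expert_policy, P, 0.9, phi, 20, data_rng)
                for _ in range(50)]
theta, pi = single_loop_irl(expert_feats, phi, P)
```

In this toy setting the estimated reward weight for action 1 ends up above the weight for action 0, and the recovered policy favors action 1 in both states. The point of the sketch is the loop structure: the inner problem is never solved to convergence, which is what removes the nested-loop cost described above.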

Suggested Citation

  • Siliang Zeng & Mingyi Hong & Alfredo Garcia, 2022. "Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees," Papers 2210.01282, arXiv.org, revised Mar 2024.
  • Handle: RePEc:arx:papers:2210.01282

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2210.01282
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Daniel Ackerberg, 2009. "A new use of importance sampling to reduce computational burden in simulation estimation," Quantitative Marketing and Economics (QME), Springer, vol. 7(4), pages 343-376, December.
    2. Victor Aguirregabiria & Aviv Nevo, 2010. "Recent Developments in Empirical IO: Dynamic Demand and Dynamic Games," Working Papers tecipa-419, University of Toronto, Department of Economics.
    3. Victor Aguirregabiria & Arvind Magesan, 2013. "Euler Equations for the Estimation of Dynamic Discrete Choice Structural Models," Advances in Econometrics, in: Structural Econometric Models, volume 31, pages 3-44, Emerald Group Publishing Limited.
    4. Peter Arcidiacono & Patrick Bayer & Jason R. Blevins & Paul B. Ellickson, 2016. "Estimation of Dynamic Discrete Choice Models in Continuous Time with an Application to Retail Competition," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 83(3), pages 889-931.
    5. Adam Dearing & Jason R. Blevins, 2019. "Efficient and Convergent Sequential Pseudo-Likelihood Estimation of Dynamic Discrete Games," Papers 1912.10488, arXiv.org, revised Apr 2024.
    6. Sebastian Galiani & Juan Pantano, 2021. "Structural Models: Inception and Frontier," NBER Working Papers 28698, National Bureau of Economic Research, Inc.
    7. Joao Macieira, 2010. "Oblivious Equilibrium in Dynamic Discrete Games," 2010 Meeting Papers 680, Society for Economic Dynamics.
    8. Karun Adusumilli & Dita Eckardt, 2019. "Temporal-Difference estimation of dynamic discrete choice models," Papers 1912.09509, arXiv.org, revised Dec 2022.
    9. Hiroyuki Kasahara & Katsumi Shimotsu, 2012. "Sequential Estimation of Structural Models With a Fixed Point Constraint," Econometrica, Econometric Society, vol. 80(5), pages 2303-2319, September.
    10. Koray Cosguner & Tat Y. Chan & P. B. (Seethu) Seetharaman, 2018. "Dynamic Pricing in a Distribution Channel in the Presence of Switching Costs," Management Science, INFORMS, vol. 64(3), pages 1212-1229, March.
    11. Aguirregabiria, Victor & Mira, Pedro, 2010. "Dynamic discrete choice structural models: A survey," Journal of Econometrics, Elsevier, vol. 156(1), pages 38-67, May.
    12. Federico A. Bugni & Jackson Bunting & Takuya Ura, 2020. "Testing homogeneity in dynamic discrete games in finite samples," Papers 2010.02297, arXiv.org, revised Aug 2024.
    13. Haizhen Lin, 2015. "Quality Choice And Market Structure: A Dynamic Analysis Of Nursing Home Oligopolies," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 56(4), pages 1261-1290, November.
    14. Joseph Cullen & Nicolas Schutz & Oleksandr Shcherbakov, 2020. "The Welfare Effects of Early Termination Fees in the US Wireless Industry," CRC TR 224 Discussion Paper Series crctr224_2020_247, University of Bonn and University of Mannheim, Germany.
    15. Myrto Kalouptsidi & Paul T. Scott & Eduardo Souza-Rodrigues, 2018. "Linear IV Regression Estimators for Structural Dynamic Discrete Choice Models," NBER Working Papers 25134, National Bureau of Economic Research, Inc.
    16. Srisuma, Sorawoot & Linton, Oliver, 2012. "Semiparametric estimation of Markov decision processes with continuous state space," Journal of Econometrics, Elsevier, vol. 166(2), pages 320-341.
    17. Kalouptsidi, Myrto & Scott, Paul T. & Souza-Rodrigues, Eduardo, 2021. "Linear IV regression estimators for structural dynamic discrete choice models," Journal of Econometrics, Elsevier, vol. 222(1), pages 778-804.
    18. Hu, Yingyao & Shum, Matthew, 2012. "Nonparametric identification of dynamic models with unobserved state variables," Journal of Econometrics, Elsevier, vol. 171(1), pages 32-44.
    19. Bruneel-Zupanc, Christophe Alain, 2021. "Discrete-Continuous Dynamic Choice Models: Identification and Conditional Choice Probability Estimation," TSE Working Papers 21-1185, Toulouse School of Economics (TSE).
    20. Jason R. Blevins, 2024. "Leveraging Uniformization and Sparsity for Computation of Continuous Time Dynamic Discrete Choice Games," Papers 2407.14914, arXiv.org.
