Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees

Author

Siliang Zeng
Mingyi Hong
Alfredo Garcia

Abstract

We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states. This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified, while in the outer problem, a measure of fit is maximized. Several approaches have been proposed to alleviate the computational burden of this nested-loop structure, but these methods still suffer from high complexity when the state space is either discrete with large cardinality or continuous in high dimensions. Other approaches in the inverse reinforcement learning (IRL) literature emphasize policy estimation at the expense of reduced reward estimation accuracy. In this paper we propose a single-loop estimation algorithm with finite-time guarantees that is equipped to deal with high-dimensional state spaces without compromising reward estimation accuracy. In the proposed algorithm, each policy improvement step is followed by a stochastic gradient step for likelihood maximization. We show that the proposed algorithm converges to a stationary solution with a finite-time guarantee. Further, if the reward is parameterized linearly, we show that the algorithm approximates the maximum likelihood estimator sublinearly. Finally, by using robotics control problems in MuJoCo and their transfer settings, we show that the proposed algorithm achieves superior performance compared with other IRL and imitation learning benchmarks.
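
The single-loop idea in the abstract, one policy improvement step followed by one stochastic gradient step on the likelihood, can be illustrated with a small tabular sketch. This is not the authors' implementation. It assumes a finite MDP with known transitions, a linear reward r(s, a) = phi(s, a) · theta, a single soft Bellman backup standing in for the policy improvement step, and the usual maximum-entropy IRL gradient estimate for linear rewards (expert minus simulated discounted feature sums). All names (soft_backup, discounted_features, single_loop_estimate) and all sizes and step sizes are hypothetical.

```python
import numpy as np

def soft_backup(Q, reward, P, gamma):
    """One soft Bellman backup: Q(s,a) <- r(s,a) + gamma * E[V(s')], V(s) = logsumexp_a Q(s,a)."""
    m = Q.max(axis=1)
    V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))
    return reward + gamma * (P @ V)  # P has shape (S, A, S), result has shape (S, A)

def softmax_policy(Q):
    """Soft policy pi(a|s) proportional to exp(Q(s,a))."""
    Z = np.exp(Q - Q.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

def discounted_features(policy, P, phi, s0, gamma, horizon, rng):
    """Discounted sum of features along one simulated trajectory (truncated at `horizon`)."""
    S, A, _ = P.shape
    s, total = s0, np.zeros(phi.shape[-1])
    for t in range(horizon):
        a = rng.choice(A, p=policy[s])
        total += gamma ** t * phi[s, a]
        s = rng.choice(S, p=P[s, a])
    return total

def single_loop_estimate(expert_feats, P, phi, gamma=0.9, iters=300,
                         lr=0.05, horizon=30, seed=0):
    """Alternate one policy update with one stochastic gradient step on theta."""
    rng = np.random.default_rng(seed)
    S, A, d = phi.shape
    theta, Q = np.zeros(d), np.zeros((S, A))
    for _ in range(iters):
        # Policy step (assumption: a single soft backup, not a full inner solve).
        Q = soft_backup(Q, phi @ theta, P, gamma)
        pi = softmax_policy(Q)
        # Gradient step: expert features minus features of one sampled trajectory.
        agent_feats = discounted_features(pi, P, phi, 0, gamma, horizon, rng)
        theta = theta + lr * (expert_feats - agent_feats)
    return theta, softmax_policy(Q)

# Synthetic demonstration data (all values are made up for illustration).
rng = np.random.default_rng(1)
S, A, d = 5, 3, 4
P = rng.dirichlet(np.ones(S), size=(S, A))   # random transition kernel
phi = rng.normal(size=(S, A, d))             # random reward features
theta_true = np.array([1.0, -1.0, 0.5, 0.0])
Q_true = np.zeros((S, A))
for _ in range(200):                         # solve the "expert" problem once
    Q_true = soft_backup(Q_true, phi @ theta_true, P, 0.9)
expert_pi = softmax_policy(Q_true)
expert_feats = np.mean(
    [discounted_features(expert_pi, P, phi, 0, 0.9, 30, rng) for _ in range(200)],
    axis=0,
)
theta_hat, pi_hat = single_loop_estimate(expert_feats, P, phi)
```

The point of the sketch is structural: the inner problem is never solved to convergence inside the loop, which is what distinguishes a single-loop scheme from the nested-loop approach. In a high-dimensional setting the tabular Q array would presumably be replaced by a function approximator, which this sketch does not attempt.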

Suggested Citation

Siliang Zeng & Mingyi Hong & Alfredo Garcia, 2022. "Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees," Papers 2210.01282, arXiv.org, revised Mar 2024.

Handle: RePEc:arx:papers:2210.01282

References listed on IDEAS

Rust, John, 1987. "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher," Econometrica, Econometric Society, vol. 55(5), pages 999-1033, September.
V. Joseph Hotz & Robert A. Miller & Seth Sanders & Jeffrey Smith, 1994. "A Simulation Estimator for Dynamic Models of Discrete Choice," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 61(2), pages 265-289.
- Hotz, J.V. & Miller, R.A. & Sanders, S. & Smith, J., 1992. "A Simulation Estimator for Dynamic Models of Discrete Choice," GSIA Working Papers 1992-13, Carnegie Mellon University, Tepper School of Business.
- V. Joseph Hotz & Robert A. Miller & Seth Sanders & Jeffrey Smith, 1992. "A Simulation Estimator for Dynamic Models of Discrete Choice," Working Papers 9205, Harris School of Public Policy Studies, University of Chicago.
Patrick Bajari & C. Lanier Benkard & Jonathan Levin, 2007. "Estimating Dynamic Models of Imperfect Competition," Econometrica, Econometric Society, vol. 75(5), pages 1331-1370, September.
- Patrick Bajari & C. Lanier Benkard & Jonathan Levin, 2004. "Estimating Dynamic Models of Imperfect Competition," NBER Working Papers 10450, National Bureau of Economic Research, Inc.
- Bajari, Patrick & Benkard, C. Lanier & Levin, Jonathan, 2007. "Estimating Dynamic Models of Imperfect Competition," Research Papers 1852r1, Stanford University, Graduate School of Business.
- Jonathan Levin (Stanford University) & Pat Bajari & Lanier Benkard, 2004. "Estimating Dynamic Models of Imperfect Competition," Econometric Society 2004 North American Winter Meetings 627, Econometric Society.
- J. Levin & P. Bajari, 2004. "Estimating Dynamic Models of Imperfect Competition," 2004 Meeting Papers 579, Society for Economic Dynamics.
Victor Aguirregabiria & Pedro Mira, 2002. "Swapping the Nested Fixed Point Algorithm: A Class of Estimators for Discrete Markov Decision Models," Econometrica, Econometric Society, vol. 70(4), pages 1519-1543, July.
- Victor Aguirregabiria & Pedro Mira, 1999. "Swapping the Nested Fixed-Point Algorithm: a Class of Estimators for Discrete Markov Decision Models," Computing in Economics and Finance 1999 332, Society for Computational Economics.
- Víctor Aguirregabiria & Pedro Mira, 1999. "Swapping the Nested Fixed Point Algorithm: A Class of Estimators for Discrete Markov Decision Models," Working Papers wp1999_9904, CEMFI.
Che‐Lin Su & Kenneth L. Judd, 2012. "Constrained Optimization Approaches to Estimation of Structural Models," Econometrica, Econometric Society, vol. 80(5), pages 2213-2230, September.
- Che-Lin Su & Kenneth L. Judd, 2008. "Constrainted Optimization Approaches to Estimation of Structural Models," Discussion Papers 1460, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
V. Joseph Hotz & Robert A. Miller, 1993. "Conditional Choice Probabilities and the Estimation of Dynamic Models," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 60(3), pages 497-529.
- Hotz, V.J. & Miller, R.A., 1991. "Conditional Choice Probabilities and the Estimation of Dynamic Models," GSIA Working Papers 1992-12, Carnegie Mellon University, Tepper School of Business.
- V. Joseph Hotz & Robert A. Miller, 1992. "Conditional Choice Probabilities and the Estimation of Dynamic Models," Working Papers 9202, Harris School of Public Policy Studies, University of Chicago.
repec:nas:journl:v:115:y:2018:p:9163-9168 is not listed on IDEAS
Tien Mai & Patrick Jaillet, 2020. "A Relation Analysis of Markov Decision Process Frameworks," Papers 2008.07820, arXiv.org.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Daniel Ackerberg, 2009. "A new use of importance sampling to reduce computational burden in simulation estimation," Quantitative Marketing and Economics (QME), Springer, vol. 7(4), pages 343-376, December.
- Daniel A. Ackerberg, 2001. "A New Use of Importance Sampling to Reduce Computational Burden in Simulation Estimation," NBER Technical Working Papers 0273, National Bureau of Economic Research, Inc.
Victor Aguirregabiria & Aviv Nevo, 2010. "Recent Developments in Empirical IO: Dynamic Demand and Dynamic Games," Working Papers tecipa-419, University of Toronto, Department of Economics.
- Aguirregabiria, Victor & Nevo, Aviv, 2010. "Recent developments in empirical IO: dynamic demand and dynamic games," MPRA Paper 27814, University Library of Munich, Germany.
Victor Aguirregabiria & Arvind Magesan, 2013. "Euler Equations for the Estimation of Dynamic Discrete Choice Structural Models," Advances in Econometrics, in: Structural Econometric Models, volume 31, pages 3-44, Emerald Group Publishing Limited.
- Victor Aguirregabiria & Arvind Magesan, 2013. "Euler Equations for the Estimation of Dynamic Discrete Choice Structural Models," Working Papers tecipa-489, University of Toronto, Department of Economics.
Peter Arcidiacono & Patrick Bayer & Jason R. Blevins & Paul B. Ellickson, 2016. "Estimation of Dynamic Discrete Choice Models in Continuous Time with an Application to Retail Competition," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 83(3), pages 889-931.
- Peter Arcidiacono & Patrick Bayer & Jason R. Blevins & Paul B. Ellickson, 2012. "Estimation of Dynamic Discrete Choice Models in Continuous Time with an Application to Retail Competition," NBER Working Papers 18449, National Bureau of Economic Research, Inc.
Adam Dearing & Jason R. Blevins, 2019. "Efficient and Convergent Sequential Pseudo-Likelihood Estimation of Dynamic Discrete Games," Papers 1912.10488, arXiv.org, revised Apr 2024.
Sebastian Galiani & Juan Pantano, 2021. "Structural Models: Inception and Frontier," NBER Working Papers 28698, National Bureau of Economic Research, Inc.
Joao Macieira, 2010. "Oblivious Equilibrium in Dynamic Discrete Games," 2010 Meeting Papers 680, Society for Economic Dynamics.
Karun Adusumilli & Dita Eckardt, 2019. "Temporal-Difference estimation of dynamic discrete choice models," Papers 1912.09509, arXiv.org, revised Dec 2022.
Hiroyuki Kasahara & Katsumi Shimotsu, 2012. "Sequential Estimation of Structural Models With a Fixed Point Constraint," Econometrica, Econometric Society, vol. 80(5), pages 2303-2319, September.
- Hiroyuki Kasahara & Katsumi Shimotsu, 2008. "Sequential Estimation Of Structural Models With A Fixed Point Constraint," Working Paper 1192, Economics Department, Queen's University.
- Hiroyuki Kasahara & Katsumi Shimotsu, 2008. "Sequential Estimation of Structural Models with a Fixed Point Constraint," CESifo Working Paper Series 2507, CESifo.
- Kasahara, Hiroyuki & 笠原, 博幸 & Shimotsu, Katsumi & 下津, 克己, 2009. "Sequential Estimation of Structural Models with a Fixed Point Constraint," Discussion Papers 2009-18, Graduate School of Economics, Hitotsubashi University.
Koray Cosguner & Tat Y. Chan & P. B. (Seethu) Seetharaman, 2018. "Dynamic Pricing in a Distribution Channel in the Presence of Switching Costs," Management Science, INFORMS, vol. 64(3), pages 1212-1229, March.
Aguirregabiria, Victor & Mira, Pedro, 2010. "Dynamic discrete choice structural models: A survey," Journal of Econometrics, Elsevier, vol. 156(1), pages 38-67, May.
- Víctor Aguirregabiria & Pedro Mira, 2007. "Dynamic Discrete Choice Structural Models: A Survey," Working Papers wp2007_0711, CEMFI.
- Victor Aguirregabiria & Pedro Mira, 2007. "Dynamic Discrete Choice Structural Models: A Survey," Working Papers tecipa-297, University of Toronto, Department of Economics.
Federico A. Bugni & Jackson Bunting & Takuya Ura, 2020. "Testing homogeneity in dynamic discrete games in finite samples," Papers 2010.02297, arXiv.org, revised Aug 2024.
Haizhen Lin, 2015. "Quality Choice And Market Structure: A Dynamic Analysis Of Nursing Home Oligopolies," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 56(4), pages 1261-1290, November.
Joseph Cullen & Nicolas Schutz & Oleksandr Shcherbakov, 2020. "The Welfare Effects of Early Termination Fees in the US Wireless Industry," CRC TR 224 Discussion Paper Series crctr224_2020_247, University of Bonn and University of Mannheim, Germany.
- Schutz, Nicolas & Cullen, Joseph & Shcherbakov, Oleksandr, 2020. "The Welfare Effects of Early Termination Fees in the US Wireless Industry," CEPR Discussion Papers 15506, C.E.P.R. Discussion Papers.
Myrto Kalouptsidi & Paul T. Scott & Eduardo Souza-Rodrigues, 2018. "Linear IV Regression Estimators for Structural Dynamic Discrete Choice Models," NBER Working Papers 25134, National Bureau of Economic Research, Inc.
Srisuma, Sorawoot & Linton, Oliver, 2012. "Semiparametric estimation of Markov decision processes with continuous state space," Journal of Econometrics, Elsevier, vol. 166(2), pages 320-341.
- Linton, Oliver & Srisuma, Sorawoot, 2010. "Semiparametric estimation of Markov decision processes with continuous state space," LSE Research Online Documents on Economics 58187, London School of Economics and Political Science, LSE Library.
- Oliver Linton & Sorawoot Srisuma, 2010. "Semiparametric Estimation of Markov Decision Processes with Continuous State Space," STICERD - Econometrics Paper Series 550, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
Kalouptsidi, Myrto & Scott, Paul T. & Souza-Rodrigues, Eduardo, 2021. "Linear IV regression estimators for structural dynamic discrete choice models," Journal of Econometrics, Elsevier, vol. 222(1), pages 778-804.
Hu, Yingyao & Shum, Matthew, 2012. "Nonparametric identification of dynamic models with unobserved state variables," Journal of Econometrics, Elsevier, vol. 171(1), pages 32-44.
- Yingyao Hu & Matthew Shum, 2008. "Nonparametric Identification of Dynamic Models with Unobserved State Variables," Economics Working Paper Archive 543, The Johns Hopkins University,Department of Economics.
- Yingyao Hu & Matthew Shum, 2008. "Nonparametric identification of dynamic models with unobserved state variables," CeMMAP working papers CWP13/08, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Bruneel-Zupanc, Christophe Alain, 2021. "Discrete-Continuous Dynamic Choice Models: Identification and Conditional Choice Probability Estimation," TSE Working Papers 21-1185, Toulouse School of Economics (TSE).
Jason R. Blevins, 2024. "Leveraging Uniformization and Sparsity for Computation of Continuous Time Dynamic Discrete Choice Games," Papers 2407.14914, arXiv.org.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-ECM-2022-10-31 (Econometrics)
