Q-learning and policy iteration algorithms for stochastic shortest path problems
Suggested Citation
DOI: 10.1007/s10479-012-1128-z
As access to this document is restricted, you may want to search for a different version of it.
References listed on IDEAS
- Dimitri P. Bertsekas & John N. Tsitsiklis, 1991. "An Analysis of Stochastic Shortest Path Problems," Mathematics of Operations Research, INFORMS, vol. 16(3), pages 580-595, August.
- Dimitri P. Bertsekas & Huizhen Yu, 2012. "Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 37(1), pages 66-94, February.
- Eugene A. Feinberg, 1992. "On Stationary Strategies in Borel Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 17(2), pages 392-397, May.
Citations
Citations are extracted by the CitEc Project.
Cited by:
- Dimitri P. Bertsekas, 2018. "Proximal algorithms and temporal difference methods for solving fixed point problems," Computational Optimization and Applications, Springer, vol. 70(3), pages 709-736, July.
- Jorge Visca & Javier Baliosian, 2022. "rl4dtn: Q-Learning for Opportunistic Networks," Future Internet, MDPI, vol. 14(12), pages 1-17, November.
- Huizhen Yu & Dimitri P. Bertsekas, 2015. "A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies," Mathematics of Operations Research, INFORMS, vol. 40(4), pages 926-968, October.
- Dimitri P. Bertsekas, 2019. "Robust shortest path planning and semicontractive dynamic programming," Naval Research Logistics (NRL), John Wiley & Sons, vol. 66(1), pages 15-37, February.
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
- Dimitri P. Bertsekas, 2019. "Robust shortest path planning and semicontractive dynamic programming," Naval Research Logistics (NRL), John Wiley & Sons, vol. 66(1), pages 15-37, February.
- E. Nikolova & N. E. Stier-Moses, 2014. "A Mean-Risk Model for the Traffic Assignment Problem with Stochastic Travel Times," Operations Research, INFORMS, vol. 62(2), pages 366-382, April.
- Carey E. Priebe & Donniell E. Fishkind & Lowell Abrams & Christine D. Piatko, 2005. "Random disambiguation paths for traversing a mapped hazard field," Naval Research Logistics (NRL), John Wiley & Sons, vol. 52(3), pages 285-292, April.
- Daniele Pretolani, 2000. "A directed hypergraph model for random time dependent shortest paths," European Journal of Operational Research, Elsevier, vol. 123(2), pages 315-324, June.
- Farshid Azadian & Alper E. Murat & Ratna Babu Chinnam, 2012. "Dynamic routing of time-sensitive air cargo using real-time information," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 48(1), pages 355-372.
- Emin Karagözoğlu & Çağrı Sağlam & Agah R. Turan, 2020. "Tullock Brings Perseverance and Suspense to Tug-of-War," CESifo Working Paper Series 8103, CESifo.
- Arthur Flajolet & Sébastien Blandin & Patrick Jaillet, 2018. "Robust Adaptive Routing Under Uncertainty," Operations Research, INFORMS, vol. 66(1), pages 210-229, January.
- Jean-Michel Benkert & Igor Letina & Georg Nöldeke, 2018. "Optimal search from multiple distributions with infinite horizon," Economics Letters, Elsevier, vol. 164(C), pages 15-18.
- Jean-Michel Benkert & Igor Letina & Georg Nöldeke, 2017. "Optimal search from multiple distributions with infinite horizon," ECON - Working Papers 262, Department of Economics - University of Zurich, revised Dec 2017.
- Blai Bonet, 2007. "On the Speed of Convergence of Value Iteration on Stochastic Shortest-Path Problems," Mathematics of Operations Research, INFORMS, vol. 32(2), pages 365-373, May.
- James L. Bander & Chelsea C. White, 2002. "A Heuristic Search Approach for a Nonstationary Stochastic Shortest Path Problem with Terminal Cost," Transportation Science, INFORMS, vol. 36(2), pages 218-230, May.
- Nobuo Matsubayashi & Hisakazu Nishino, 1999. "An application of Lemke's method to a class of Markov decision problems," European Journal of Operational Research, Elsevier, vol. 116(3), pages 584-590, August.
- Özlem Çavuş & Andrzej Ruszczyński, 2014. "Computational Methods for Risk-Averse Undiscounted Transient Markov Models," Operations Research, INFORMS, vol. 62(2), pages 401-417, April.
- Huizhen Yu & Dimitri P. Bertsekas, 2013. "On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems," Mathematics of Operations Research, INFORMS, vol. 38(2), pages 209-227, May.
- Arie Leizarowitz, 2003. "An Algorithm to Identify and Compute Average Optimal Policies in Multichain Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 28(3), pages 553-586, August.
- Fengying Li & Yuqiang Li & Xianyi Wu, 2024. "Minimax weight learning for absorbing MDPs," Statistical Papers, Springer, vol. 65(6), pages 3545-3582, August.
- Matthieu Guillot & Gautier Stauffer, 2020. "The Stochastic Shortest Path Problem: A polyhedral combinatorics perspective," European Journal of Operational Research, Elsevier, vol. 285(1), pages 148-158.
- Emin Karagözoğlu & Çağrı Sağlam & Agah R. Turan, 2021. "Perseverance and suspense in tug-of-war," Journal of Mathematical Economics, Elsevier, vol. 95(C).
- Jorge Lorca & Emerson Melo, 2020. "Choice Aversion in Directed Networks," Working Papers Central Bank of Chile 879, Central Bank of Chile.
- Raymond K. Cheung & B. Muralidharan, 2000. "Dynamic Routing for Priority Shipments in LTL Service Networks," Transportation Science, INFORMS, vol. 34(1), pages 86-98, February.
- Eric A. Hansen, 2017. "Error bounds for stochastic shortest path problems," Mathematical Methods of Operations Research, Springer; Gesellschaft für Operations Research (GOR); Nederlands Genootschap voor Besliskunde (NGB), vol. 86(1), pages 1-27, August.
More about this item
Keywords
Markov decision processes; Q-learning; Approximate dynamic programming; Value iteration; Policy iteration; Stochastic shortest paths; Stochastic approximation
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:annopr:v:208:y:2013:i:1:p:95-132:10.1007/s10479-012-1128-z. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.
If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com.
Please note that corrections may take a couple of weeks to filter through the various RePEc services.