Q-learning and policy iteration algorithms for stochastic shortest path problems
DOI: 10.1007/s10479-012-1128-z
References listed on IDEAS
- Dimitri P. Bertsekas & John N. Tsitsiklis, 1991. "An Analysis of Stochastic Shortest Path Problems," Mathematics of Operations Research, INFORMS, vol. 16(3), pages 580-595, August.
- Dimitri P. Bertsekas & Huizhen Yu, 2012. "Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 37(1), pages 66-94, February.
- Eugene A. Feinberg, 1992. "On Stationary Strategies in Borel Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 17(2), pages 392-397, May.
Citations
Citations are extracted by the CitEc Project.
Cited by:
- Dimitri P. Bertsekas, 2018. "Proximal algorithms and temporal difference methods for solving fixed point problems," Computational Optimization and Applications, Springer, vol. 70(3), pages 709-736, July.
- Jorge Visca & Javier Baliosian, 2022. "rl4dtn: Q-Learning for Opportunistic Networks," Future Internet, MDPI, vol. 14(12), pages 1-17, November.
- Huizhen Yu & Dimitri P. Bertsekas, 2015. "A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies," Mathematics of Operations Research, INFORMS, vol. 40(4), pages 926-968, October.
- Dimitri P. Bertsekas, 2019. "Robust shortest path planning and semicontractive dynamic programming," Naval Research Logistics (NRL), John Wiley & Sons, vol. 66(1), pages 15-37, February.
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
- Dimitri P. Bertsekas, 2019. "Robust shortest path planning and semicontractive dynamic programming," Naval Research Logistics (NRL), John Wiley & Sons, vol. 66(1), pages 15-37, February.
- Raymond K. Cheung & B. Muralidharan, 2000. "Dynamic Routing for Priority Shipments in LTL Service Networks," Transportation Science, INFORMS, vol. 34(1), pages 86-98, February.
- E. Nikolova & N. E. Stier-Moses, 2014. "A Mean-Risk Model for the Traffic Assignment Problem with Stochastic Travel Times," Operations Research, INFORMS, vol. 62(2), pages 366-382, April.
- Eric A. Hansen, 2017. "Error bounds for stochastic shortest path problems," Mathematical Methods of Operations Research, Springer; Gesellschaft für Operations Research (GOR); Nederlands Genootschap voor Besliskunde (NGB), vol. 86(1), pages 1-27, August.
- Fernando Ordóñez & Nicolás E. Stier-Moses, 2010. "Wardrop Equilibria with Risk-Averse Users," Transportation Science, INFORMS, vol. 44(1), pages 63-86, February.
- Matthew H. Henry & Yacov Y. Haimes, 2009. "A Comprehensive Network Security Risk Model for Process Control Networks," Risk Analysis, John Wiley & Sons, vol. 29(2), pages 223-248, February.
- Carey E. Priebe & Donniell E. Fishkind & Lowell Abrams & Christine D. Piatko, 2005. "Random disambiguation paths for traversing a mapped hazard field," Naval Research Logistics (NRL), John Wiley & Sons, vol. 52(3), pages 285-292, April.
- A. Y. Golubin, 2003. "A Note on the Convergence of Policy Iteration in Markov Decision Processes with Compact Action Spaces," Mathematics of Operations Research, INFORMS, vol. 28(1), pages 194-200, February.
- Pretolani, Daniele, 2000. "A directed hypergraph model for random time dependent shortest paths," European Journal of Operational Research, Elsevier, vol. 123(2), pages 315-324, June.
- Azadian, Farshid & Murat, Alper E. & Chinnam, Ratna Babu, 2012. "Dynamic routing of time-sensitive air cargo using real-time information," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 48(1), pages 355-372.
- Emin Karagözoglu & Cagri Saglam & Agah R. Turan, 2020. "Tullock Brings Perseverance and Suspense to Tug-of-War," CESifo Working Paper Series 8103, CESifo.
- Dolinskaya, Irina & Shi, Zhenyu (Edwin) & Smilowitz, Karen, 2018. "Adaptive orienteering problem with stochastic travel times," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 109(C), pages 1-19.
- Arthur Flajolet & Sébastien Blandin & Patrick Jaillet, 2018. "Robust Adaptive Routing Under Uncertainty," Operations Research, INFORMS, vol. 66(1), pages 210-229, January.
- Benkert, Jean-Michel & Letina, Igor & Nöldeke, Georg, 2018. "Optimal search from multiple distributions with infinite horizon," Economics Letters, Elsevier, vol. 164(C), pages 15-18.
- Jean-Michel Benkert & Igor Letina & Georg Nöldeke, 2017. "Optimal search from multiple distributions with infinite horizon," ECON - Working Papers 262, Department of Economics - University of Zurich, revised Dec 2017.
- B. Curtis Eaves & Arthur F. Veinott, 2014. "Maximum-Stopping-Value Policies in Finite Markov Population Decision Chains," Mathematics of Operations Research, INFORMS, vol. 39(3), pages 597-606, August.
- Daniel Lücking & Wolfgang Stadje, 2013. "The stochastic shortest-path problem for Markov chains with infinite state space with applications to nearest-neighbor lattice chains," Mathematical Methods of Operations Research, Springer; Gesellschaft für Operations Research (GOR); Nederlands Genootschap voor Besliskunde (NGB), vol. 77(2), pages 239-264, April.
- Blai Bonet, 2007. "On the Speed of Convergence of Value Iteration on Stochastic Shortest-Path Problems," Mathematics of Operations Research, INFORMS, vol. 32(2), pages 365-373, May.
- Cervellera, Cristiano & Caviglione, Luca, 2009. "Optimization of a peer-to-peer system for efficient content replication," European Journal of Operational Research, Elsevier, vol. 196(2), pages 423-433, July.
- Chris P. Lee & Glenn M. Chertow & Stefanos A. Zenios, 2008. "Optimal Initiation and Management of Dialysis Therapy," Operations Research, INFORMS, vol. 56(6), pages 1428-1449, December.
- Dimitri P. Bertsekas, 2018. "Proximal algorithms and temporal difference methods for solving fixed point problems," Computational Optimization and Applications, Springer, vol. 70(3), pages 709-736, July.
More about this item
Keywords
Markov decision processes; Q-learning; Approximate dynamic programming; Value iteration; Policy iteration; Stochastic shortest paths; Stochastic approximation
Handle: RePEc:spr:annopr:v:208:y:2013:i:1:p:95-132:10.1007/s10479-012-1128-z