Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces

Author

Listed:

Jérôme Renault
(TSE-R - Toulouse School of Economics - UT Capitole - Université Toulouse Capitole - UT - Université de Toulouse - INRA - Institut National de la Recherche Agronomique - EHESS - École des hautes études en sciences sociales - CNRS - Centre National de la Recherche Scientifique)
Xavier Venel
(PSE - Paris School of Economics - UP1 - Université Paris 1 Panthéon-Sorbonne - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris Sciences et Lettres - EHESS - École des hautes études en sciences sociales - ENPC - École nationale des ponts et chaussées - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique)

Registered:

Jerome Renault

Abstract

We study long-term Markov Decision Processes and Gambling Houses, with applications to any partial observation MDPs with finitely many states and zero-sum repeated games with an informed controller. We consider a decision-maker who maximizes the weighted sum $\sum_{t \geq 1} \theta_t r_t$, where $r_t$ is the expected reward of the $t$-th stage. We prove the existence of a very strong notion of long-term value called general uniform value, representing the fact that the decision-maker can play well independently of the evaluations $(\theta_t)_{t \geq 1}$ over stages, provided the total variation (or impatience) $\sum_{t \geq 1} |\theta_{t+1} - \theta_t|$ is small enough. This result generalizes previous results of Rosenberg, Solan and Vieille [35] and Renault [31] that focus on arithmetic means and discounted evaluations. Moreover, we give a variational characterization of the general uniform value via the introduction of appropriate invariant measures for the decision problems, generalizing the fundamental theorem of gambling or the Aumann-Maschler cav u formula for repeated games with incomplete information. Apart from the introduction of appropriate invariant measures, the main innovation in our proofs is the introduction of a new metric $d_*$ such that partial observation MDPs and repeated games with an informed controller may be associated to auxiliary problems that are non-expansive with respect to $d_*$. Given two Borel probabilities $u$ and $v$ over a compact subset $X$ of a normed vector space, we define $d_*(u, v) = \sup_{f \in D_1} |u(f) - v(f)|$, where $D_1$ is the set of functions satisfying: $\forall x, y \in X$, $\forall a, b \geq 0$, $a f(x) - b f(y) \leq \|ax - by\|$. The case where $X$ is a simplex endowed with the $L^1$-norm is particularly interesting: $d_*$ is the largest distance over the probabilities with finite support over $X$ which makes every disintegration non-expansive. Moreover, we obtain a Kantorovich-Rubinstein type duality formula for $d_*(u, v)$ involving couples of measures $(\alpha, \beta)$ over $X \times X$ such that the first marginal of $\alpha$ is $u$ and the second marginal of $\beta$ is $v$.

MSC Classification: Primary: 90C40; Secondary: 60J20, 91A15.
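
The two families of evaluations named in the abstract, arithmetic means and discounted evaluations, can be made concrete with a short sketch. The Python below is an illustration only and is not taken from the paper: the function names (`impatience`, `weighted_payoff`, `cesaro_weights`, `discounted_weights`) are hypothetical, evaluations are represented as finite lists of weights, and the discounted weights are truncated at a finite horizon, so they sum to slightly less than 1. It computes the weighted sum of stage rewards and the total variation (impatience) of an evaluation.

```python
from typing import List, Sequence


def impatience(theta: Sequence[float]) -> float:
    """Total variation sum_t |theta_{t+1} - theta_t| of a finitely supported evaluation.

    Assumption: weights after the last listed stage are zero, so the list is
    padded with one trailing 0.0 and the final drop to zero is counted.
    """
    padded = list(theta) + [0.0]
    return sum(abs(padded[t + 1] - padded[t]) for t in range(len(theta)))


def weighted_payoff(theta: Sequence[float], rewards: Sequence[float]) -> float:
    """Weighted sum of stage rewards, sum_t theta_t * r_t, over the listed stages."""
    return sum(w * r for w, r in zip(theta, rewards))


def cesaro_weights(n: int) -> List[float]:
    """Arithmetic-mean evaluation: weight 1/n on each of the first n stages."""
    return [1.0 / n] * n


def discounted_weights(lam: float, horizon: int) -> List[float]:
    """Discounted evaluation lam * (1 - lam)**(t - 1), truncated at `horizon` stages."""
    return [lam * (1.0 - lam) ** t for t in range(horizon)]


if __name__ == "__main__":
    # Impatience shrinks as the horizon grows or the discount rate goes to zero.
    for n in (10, 100, 1000):
        print("mean over", n, "stages:", impatience(cesaro_weights(n)))
    for lam in (0.5, 0.05, 0.005):
        print("discount rate", lam, ":", impatience(discounted_weights(lam, 5000)))
```

For a nonincreasing evaluation the differences telescope, so the impatience equals the first weight: 1/n for the n-stage mean and the discount rate for the discounted evaluation. Both families therefore fall under the small-impatience condition of the abstract when n is large or the discount rate is small.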

Suggested Citation

Jérôme Renault & Xavier Venel, 2017. "Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces," Post-Print hal-01396680, HAL.

Handle: RePEc:hal:journl:hal-01396680
DOI: 10.1287/moor.2016.0814

Other versions of this item:

Jérôme Renault & Xavier Venel, 2017. "Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces," Mathematics of Operations Research, INFORMS, vol. 42(2), pages 349-376, May.

Jérôme Renault & Xavier Venel, 2017. "Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-01396680, HAL.
Jérôme Renault & Xavier Venel, 2017. "Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces," PSE-Ecole d'économie de Paris (Postprint) hal-01396680, HAL.

References listed on IDEAS

MERTENS, Jean-François, 1987. "Repeated games. Proceedings of the International Congress of Mathematicians," LIDAM Reprints CORE 788, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
Jérôme Renault, 2006. "The Value of Markov Chain Games with Lack of Information on One Side," Mathematics of Operations Research, INFORMS, vol. 31(3), pages 490-512, August.
VIEILLE, Nicolas & ROSENBERG, Dinah & SOLAN, Eilon, 2002. "Stochastic games with a single controller and incomplete information," HEC Research Papers Series 754, HEC Paris.
- Nicolas Vieille & Eilon Solan & Dinah Rosenberg, 2004. "Stochastic Games with a Single Controller and Incomplete Information," Post-Print hal-00464938, HAL.
- Dinah Rosenberg & Eilon Solan & Nicolas Vieille, 2002. "Stochastic Games with a Single Controller and Incomplete Information," Working Papers hal-00593394, HAL.
- Dinah Rosenberg & Eilon Solan & Nicolas Vieille, 2002. "Stochastic Games with a Single Controller and Incomplete Information," Discussion Papers 1346, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
Dinah Rosenberg & Eilon Solan & Nicolas Vieille, 2000. "Blackwell Optimality in Markov Decision Processes with Partial Observation," Discussion Papers 1292, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
- Dinah Rosenberg & Nicolas Vieille & Eilon Solan, 2002. "Blackwell optimality in Markov decision processes with partial observation," Post-Print hal-00464998, HAL.
Ehud Lehrer & Sylvain Sorin, 1992. "A Uniform Tauberian Theorem in Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 17(2), pages 303-307, May.
John C. Harsanyi, 1967. "Games with Incomplete Information Played by "Bayesian" Players, I-III Part I. The Basic Model," Management Science, INFORMS, vol. 14(3), pages 159-182, November.
Abraham Neyman, 2008. "Existence of optimal strategies in Markov games with incomplete information," International Journal of Game Theory, Springer;Game Theory Society, vol. 37(4), pages 581-596, December.
- Abraham Neyman, 2005. "Existence of Optimal Strategies in Markov Games with Incomplete Information," Discussion Paper Series dp413, The Federmann Center for the Study of Rationality, the Hebrew University, Jerusalem.
Robert J. Aumann, 1995. "Repeated Games with Incomplete Information," MIT Press Books, The MIT Press, edition 1, volume 1, number 0262011476, December.
A. Hordijk & L. C. M. Kallenberg, 1979. "Linear Programming and Markov Decision Chains," Management Science, INFORMS, vol. 25(4), pages 352-362, April.
MERTENS, Jean-François & ZAMIR, Shmuel, 1985. "Formulation of Bayesian analysis for games with incomplete information," LIDAM Reprints CORE 608, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
Truman Bewley & Elon Kohlberg, 1976. "The Asymptotic Theory of Stochastic Games," Mathematics of Operations Research, INFORMS, vol. 1(3), pages 197-208, August.
Jérôme Renault, 2012. "The Value of Repeated Games with an Informed Controller," Mathematics of Operations Research, INFORMS, vol. 37(1), pages 154-179, February.

Cited by:

Koessler, Frederic & Laclau, Marie & Renault, Jérôme & Tomala, Tristan, 2022. "Long information design," Theoretical Economics, Econometric Society, vol. 17(2), May.
- Frédéric Koessler & Marie Laclau & Jérôme Renault & Tristan Tomala, 2022. "Long Information Design," PSE-Ecole d'économie de Paris (Postprint) halshs-02400053, HAL.
- Frédéric Koessler & Marie Laclau & Jérôme Renault & Tristan Tomala, 2022. "Long Information Design," Post-Print halshs-02400053, HAL.
- Koessler, Frédéric & Laclau, Marie & Renault, Jérôme & Tomala, Tristan, 2022. "Long information design," TSE Working Papers 22-1341, Toulouse School of Economics (TSE).
Fabien Gensbittel & Marcin Peski & Jérôme Renault, 2019. "The Large Space Of Information Structures," Working Papers hal-02075905, HAL.
- Gensbittel, Fabien & Renault, Jérôme & Peski, Marcin, 2019. "The large space of information structures," TSE Working Papers 19-1006, Toulouse School of Economics (TSE).
Li, Jin & Quincampoix, Marc & Renault, Jérôme & Buckdahn, Rainer, 2019. "Representation formulas for limit values of long run stochastic optimal controls," TSE Working Papers 19-1007, Toulouse School of Economics (TSE).
- R. Buckdahn & Jin Li & Marc Quincampoix & Jérôme Renault, 2020. "Representation formulas for limit values of long run stochastic optimal controls," Post-Print hal-02929156, HAL.
Frédéric Koessler & Marie Laclau & Jerôme Renault & Tristan Tomala, 2022. "Long information design," Post-Print hal-03700394, HAL.
Frédéric Koessler & Marie Laclau & Jerôme Renault & Tristan Tomala, 2022. "Long information design," PSE-Ecole d'économie de Paris (Postprint) hal-03700394, HAL.
Rida Laraki & Jérôme Renault, 2020. "Acyclic Gambling Games," Mathematics of Operations Research, INFORMS, vol. 45(4), pages 1237-1257, November.
- Laraki, Rida & Renault, Jérôme, 2017. "Acyclic Gambling Games," TSE Working Papers 17-768, Toulouse School of Economics (TSE).

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Laraki, Rida & Sorin, Sylvain, 2015. "Advances in Zero-Sum Dynamic Games," Handbook of Game Theory with Economic Applications, Elsevier.
Sylvain Sorin, 2011. "Zero-Sum Repeated Games: Recent Advances and New Links with Differential Games," Dynamic Games and Applications, Springer, vol. 1(1), pages 172-207, March.
Bruno Ziliotto, 2016. "A Tauberian Theorem for Nonexpansive Operators and Applications to Zero-Sum Stochastic Games," Mathematics of Operations Research, INFORMS, vol. 41(4), pages 1522-1534, November.
Xiaoxi Li & Xavier Venel, 2016. "Recursive games: Uniform value, Tauberian theorem and the Mertens conjecture 'Maxmin $= \lim v_n = \lim v_\lambda$'," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-01302553, HAL.
Xiaoxi Li & Xavier Venel, 2016. "Recursive games: Uniform value, Tauberian theorem and the Mertens conjecture 'Maxmin $= \lim v_n = \lim v_\lambda$'," PSE-Ecole d'économie de Paris (Postprint) hal-01302553, HAL.
Ashkenazi-Golan, Galit & Rainer, Catherine & Solan, Eilon, 2020. "Solving two-state Markov games with incomplete information on one side," Games and Economic Behavior, Elsevier, vol. 122(C), pages 83-104.
Xiaoxi Li & Xavier Venel, 2016. "Recursive games: Uniform value, Tauberian theorem and the Mertens conjecture 'Maxmin $= \lim v_n = \lim v_\lambda$'," Post-Print hal-01302553, HAL.
Abraham Neyman & Sylvain Sorin, 2010. "Repeated games with public uncertain duration process," International Journal of Game Theory, Springer;Game Theory Society, vol. 39(1), pages 29-52, March.
Hugo Gimbert & Jérôme Renault & Sylvain Sorin & Xavier Venel & Wieslaw Zielonka, 2016. "On the values of repeated games with signals," PSE-Ecole d'économie de Paris (Postprint) hal-01006951, HAL.
- Hugo Gimbert & Jérôme Renault & Sylvain Sorin & Xavier Venel & Wieslaw Zielonka, 2016. "On the values of repeated games with signals," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-01006951, HAL.
- Hugo Gimbert & Jérôme Renault & Sylvain Sorin & Xavier Venel & Wieslaw Zielonka, 2016. "On the values of repeated games with signals," Post-Print hal-01006951, HAL.
Dhruva Kartik & Ashutosh Nayyar, 2021. "Upper and Lower Values in Zero-Sum Stochastic Games with Asymmetric Information," Dynamic Games and Applications, Springer, vol. 11(2), pages 363-388, June.
Xavier Venel, 2015. "Commutative Stochastic Games," Mathematics of Operations Research, INFORMS, vol. 40(2), pages 403-428, February.
Fabien Gensbittel & Jérôme Renault, 2015. "The Value of Markov Chain Games with Incomplete Information on Both Sides," Mathematics of Operations Research, INFORMS, vol. 40(4), pages 820-841, October.
Guillaume Vigeral, 2013. "A Zero-Sum Stochastic Game with Compact Action Sets and no Asymptotic Value," Dynamic Games and Applications, Springer, vol. 3(2), pages 172-186, June.
Jérôme Renault, 2012. "The Value of Repeated Games with an Informed Controller," Mathematics of Operations Research, INFORMS, vol. 37(1), pages 154-179, February.
Pierre Cardaliaguet & Catherine Rainer & Dinah Rosenberg & Nicolas Vieille, 2016. "Markov Games with Frequent Actions and Incomplete Information—The Limit Case," Mathematics of Operations Research, INFORMS, vol. 41(1), pages 49-71, February.
Jérôme Bolte & Stéphane Gaubert & Guillaume Vigeral, 2015. "Definable Zero-Sum Stochastic Games," Mathematics of Operations Research, INFORMS, vol. 40(1), pages 171-191, February.
Laraki, Rida & Renault, Jérôme, 2017. "Acyclic Gambling Games," TSE Working Papers 17-768, Toulouse School of Economics (TSE).
Mandel, Antoine & Venel, Xavier, 2020. "Dynamic competition over social networks," European Journal of Operational Research, Elsevier, vol. 280(2), pages 597-608.
- Antoine Mandel & Xavier Venel, 2017. "Dynamic competition over social networks," Documents de travail du Centre d'Economie de la Sorbonne 17021, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
- Antoine Mandel & Xavier Venel, 2020. "Dynamic competition over social networks," Post-Print halshs-02334595, HAL.
- Antoine Mandel & Xavier Venel, 2020. "Dynamic competition over social networks," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-02334595, HAL.
- Antoine Mandel & Xavier Venel, 2020. "Dynamic competition over social networks," PSE-Ecole d'économie de Paris (Postprint) halshs-02334595, HAL.
Guilhem Lecouteux, 2018. "Bayesian game theorists and non-Bayesian players," The European Journal of the History of Economic Thought, Taylor & Francis Journals, vol. 25(6), pages 1420-1454, November.
- Guilhem Lecouteux, 2017. "Bayesian Game Theorists and Non-Bayesian Players," GREDEG Working Papers 2017-30, Groupe de REcherche en Droit, Economie, Gestion (GREDEG CNRS), Université Côte d'Azur, France, revised Jul 2018.
- Guilhem Lecouteux, 2018. "Bayesian Game Theorists and non-Bayesian Players," Working Papers halshs-01633126, HAL.
- Guilhem Lecouteux, 2018. "Bayesian Game Theorists and Non-Bayesian Players," Post-Print halshs-01941773, HAL.
Johannes Hörner & Satoru Takahashi & Nicolas Vieille, 2015. "Truthful Equilibria in Dynamic Bayesian Games," Econometrica, Econometric Society, vol. 83(5), pages 1795-1848, September.
- Johannes Horner & Satoru Takahashi & Nicolas Vieille, 2013. "Truthful Equilibria in Dynamic Bayesian Games," Cowles Foundation Discussion Papers 1933R, Cowles Foundation for Research in Economics, Yale University, revised Jan 2015.
- Johannes Horner & Satoru Takahashi & Nicolas Vieille, 2014. "Truthful Equilibria in Dynamic Bayesian Games," Levine's Working Paper Archive 786969000000000881, David K. Levine.
- Johannes Horner & Satoru Takahashi & Nicolas Vieille, 2013. "Truthful Equilibria in Dynamic Bayesian Games," Cowles Foundation Discussion Papers 1933, Cowles Foundation for Research in Economics, Yale University.

More about this item

Keywords

Characterization of the value; Partial observation Markov decision processes; Uniform value; Repeated games; Wasserstein metric; Gambling Houses;
