IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2212.09192.html
   My bibliography  Save this paper

Multiarmed Bandits Problem Under the Mean-Variance Setting

Author

Listed:
  • Hongda Hu
  • Arthur Charpentier
  • Mario Ghossoub
  • Alexander Schied

Abstract

The classical multi-armed bandit (MAB) problem involves a learner and a collection of K independent arms, each with its own ex ante unknown independent reward distribution. At each one of a finite number of rounds, the learner selects one arm and receives new information. The learner often faces an exploration-exploitation dilemma: exploiting the current information by playing the arm with the highest estimated reward versus exploring all arms to gather more reward information. The design objective aims to maximize the expected cumulative reward over all rounds. However, such an objective does not account for a risk-reward tradeoff, which is often a fundamental precept in many areas of applications, most notably in finance and economics. In this paper, we build upon Sani et al. (2012) and extend the classical MAB problem to a mean-variance setting. Specifically, we relax the assumptions of independent arms and bounded rewards made in Sani et al. (2012) by considering sub-Gaussian arms. We introduce the Risk Aware Lower Confidence Bound (RALCB) algorithm to solve the problem, and study some of its properties. Finally, we perform a number of numerical simulations to demonstrate that, in both independent and dependent scenarios, our suggested approach performs better than the algorithm suggested by Sani et al. (2012).

Suggested Citation

  • Hongda Hu & Arthur Charpentier & Mario Ghossoub & Alexander Schied, 2022. "Multiarmed Bandits Problem Under the Mean-Variance Setting," Papers 2212.09192, arXiv.org, revised May 2024.
  • Handle: RePEc:arx:papers:2212.09192
    as

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2212.09192
    File Function: Latest version
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Mengying Zhu & Xiaolin Zheng & Yan Wang & Yuyuan Li & Qianqiao Liang, 2019. "Adaptive Portfolio by Solving Multi-armed Bandit via Thompson Sampling," Papers 1911.05309, arXiv.org, revised Nov 2019.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.

      More about this item

      Statistics

      Access and download statistics

      Corrections

      All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2212.09192. See general information about how to correct material in RePEc.

      If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

      If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

      If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

      For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

      Please note that corrections may take a couple of weeks to filter through the various RePEc services.

      IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.