Author
Listed:
- Zhichao Feng
(Department of Logistics and Maritime Studies, Faculty of Business, The Hong Kong Polytechnic University, Kowloon, Hong Kong)
- Milind Dawande
(Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, Texas 75080)
- Ganesh Janakiraman
(Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, Texas 75080)
- Anyan Qi
(Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, Texas 75080)
Abstract
In many practical settings, learning algorithms can take a substantial amount of time to converge, thereby raising the need to understand the role of discounting in learning. We illustrate the impact of discounting on the performance of learning algorithms by examining two classic and representative dynamic-pricing and learning problems studied in Broder and Rusmevichientong (BR) [Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980] and Keskin and Zeevi (KZ) [Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167]. In both settings, a seller sells a product with unlimited inventory over T periods. The seller initially does not know the parameters of the general choice model in BR (respectively, the linear demand curve in KZ). Given a discount factor ρ, the seller's objective is to determine a pricing policy that maximizes the expected discounted revenue over the T periods. In both settings, we establish lower bounds on the regret under any policy and show limiting bounds of Ω(√(1/(1−ρ))) and Ω(√T) when T → ∞ and ρ → 1, respectively. In the model of BR with discounting, we propose an asymptotically tight learning policy and show that the regret under our policy, as well as that under the MLE-CYCLE policy in BR, is O(√(1/(1−ρ))) (respectively, O(√T)) when T → ∞ (respectively, ρ → 1). In the model of KZ with discounting, we present sufficient conditions for a learning policy to guarantee asymptotic optimality and show that the regret under any policy satisfying these conditions is O(log(1/(1−ρ)) · √(1/(1−ρ))) (respectively, O(log T · √T)) when T → ∞ (respectively, ρ → 1). We show that three different policies, namely the two variants of the greedy iterated least squares policy in KZ and a different policy that we propose, achieve this upper bound on the regret.
We numerically examine the behavior of the regret under our policies as well as those in BR and KZ in the presence of discounting. We also analyze a setting in which the discount factor per period is a function of the number of decision periods in the planning horizon.
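To make the KZ-style setting concrete, the following is a minimal, hypothetical sketch (not one of the paper's actual policies): a seller faces an unknown linear demand curve d(p) = α − βp + noise, repeatedly re-estimates (α, β) by least squares from past observations, prices greedily against the estimate, and accumulates ρ-discounted revenue. The demand parameters, noise level, and initial exploratory prices are illustrative assumptions; in particular, this plain greedy policy omits the exploration safeguards that the paper's asymptotically optimal variants require.

```python
import numpy as np

def simulate_greedy_ils(alpha=10.0, beta=2.0, rho=0.99, T=500, sigma=0.5, seed=0):
    """Sketch of greedy iterated-least-squares pricing under discounting.

    The true demand curve d(p) = alpha - beta*p + noise is unknown to the
    seller. All parameter values here are illustrative, not from the paper.
    """
    rng = np.random.default_rng(seed)
    # Two exploratory prices so the initial least-squares fit is well posed.
    prices = [1.0, 3.0]
    demands = [alpha - beta * p + sigma * rng.standard_normal() for p in prices]
    discounted_revenue = 0.0
    for t in range(T):
        # Least-squares estimate of (alpha, beta): design row is (1, -p),
        # so the fitted coefficients are (alpha_hat, beta_hat) directly.
        X = np.column_stack([np.ones(len(prices)), -np.array(prices)])
        a_hat, b_hat = np.linalg.lstsq(X, np.array(demands), rcond=None)[0]
        # Greedy (myopic) price maximizes p * (a_hat - b_hat * p).
        p = a_hat / (2.0 * b_hat) if b_hat > 0 else 1.0
        d = alpha - beta * p + sigma * rng.standard_normal()
        discounted_revenue += (rho ** t) * p * d
        prices.append(p)
        demands.append(d)
    return discounted_revenue

rev = simulate_greedy_ils()
```

A known caveat, which the abstract's sufficient conditions address, is that a purely greedy policy like this one can suffer from incomplete learning: the price sequence may settle before the estimates converge, which is why KZ's variants constrain or perturb the greedy price.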
Suggested Citation
Zhichao Feng & Milind Dawande & Ganesh Janakiraman & Anyan Qi, 2024.
"Technical Note—Dynamic Pricing and Learning with Discounting,"
Operations Research, INFORMS, vol. 72(2), pages 481-492, March.
Handle:
RePEc:inm:oropre:v:72:y:2024:i:2:p:481-492
DOI: 10.1287/opre.2023.2477