Mean–Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study

Authors

Yilie Huang
Yanwei Jia
Xun Yu Zhou

Abstract

We study continuous-time mean–variance portfolio selection in markets where stock prices are diffusion processes driven by observable factors that are themselves diffusion processes, and where the coefficients of all these processes are unknown. Building on the recently developed reinforcement learning (RL) theory for diffusion processes, we present a general data-driven RL algorithm that learns the pre-committed investment strategy directly, without attempting to learn or estimate the market coefficients. For multi-stock Black–Scholes markets without factors, we further devise a baseline algorithm and prove a performance guarantee for it by deriving a sublinear regret bound in terms of the Sharpe ratio. For performance enhancement and practical implementation, we modify the baseline algorithm into four variants and carry out an extensive empirical study comparing them, across a host of common metrics, with a large number of widely used portfolio allocation strategies on S&P 500 constituents. The results show that the continuous-time RL strategies are consistently among the best performers, especially in a volatile bear market, and outperform their model-based continuous-time counterparts by significant margins.
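
As a rough illustration of the Sharpe-ratio yardstick mentioned in the abstract, the following Python sketch computes the largest attainable instantaneous Sharpe ratio in a multi-stock Black–Scholes market and the shortfall of an equal-weight portfolio relative to it. This is not the paper's algorithm or its regret definition: it assumes the drift vector, covariance matrix, and risk-free rate are known, which is exactly what the RL approach avoids. All numbers and function names here are hypothetical.

```python
import numpy as np


def instantaneous_sharpe(weights, mu, cov, r):
    """Sharpe ratio of a constant-weight portfolio of risky assets.

    weights: holdings in the risky assets (any positive scaling gives the same ratio)
    mu, cov: assumed drift vector and covariance matrix (illustrative only)
    r: assumed risk-free rate
    """
    excess = mu - r
    return float(weights @ excess / np.sqrt(weights @ cov @ weights))


def max_sharpe(mu, cov, r):
    """Upper bound sqrt((mu - r)' cov^{-1} (mu - r)), from Cauchy-Schwarz."""
    excess = mu - r
    return float(np.sqrt(excess @ np.linalg.solve(cov, excess)))


# Hypothetical three-stock market (numbers chosen for illustration).
mu = np.array([0.08, 0.12, 0.10])
vol = np.array([0.15, 0.25, 0.20])
corr = np.array([[1.0, 0.3, 0.2],
                 [0.3, 1.0, 0.4],
                 [0.2, 0.4, 1.0]])
cov = np.outer(vol, vol) * corr
r = 0.02

best = max_sharpe(mu, cov, r)
tangency = np.linalg.solve(cov, mu - r)  # weights that attain the bound
equal = np.ones(3) / 3

# Shortfall of the equal-weight portfolio relative to the best attainable ratio.
gap = best - instantaneous_sharpe(equal, mu, cov, r)
print(f"max Sharpe: {best:.4f}, equal-weight gap: {gap:.4f}")
```

The gap is non-negative for any weight vector, so a learning strategy could in principle be scored by how quickly such a gap shrinks; how the paper actually defines and bounds its regret is not stated in the abstract.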

Suggested Citation

Yilie Huang & Yanwei Jia & Xun Yu Zhou, 2024. "Mean–Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study," Papers 2412.16175, arXiv.org.

Handle: RePEc:arx:papers:2412.16175

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-FMK-2025-01-27 (Financial Markets)

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.
