Printed from https://ideas.repec.org/p/arx/papers/2501.03993.html

Synthetic Data for Portfolios: A Throw of the Dice Will Never Abolish Chance

Author

Listed:
  • Adil Rengim Cetingoz
  • Charles-Albert Lehalle

Abstract

Simulation methods have always been instrumental in finance, and data-driven methods with minimal model specification, commonly referred to as generative models, have attracted increasing attention, especially since the success of deep learning in a broad range of fields. However, the adoption of these models in financial applications has not kept pace with the growing interest, probably because of the unique complexities and challenges of financial markets. This paper aims to contribute to a deeper understanding of the limitations of generative models, particularly in portfolio and risk management. To this end, we begin by presenting theoretical results on the importance of the initial sample size, and point out the potential pitfalls of generating far more data than was originally available. We then highlight that model development cannot be separated from the intended use case, by way of a paradox: generic generative models inherently care less about what matters for constructing portfolios (in particular long-short ones). Based on these findings, we propose a pipeline for generating multivariate returns that meets conventional evaluation standards on a large universe of US equities, complies with the stylized facts observed in asset returns, and avoids the pitfalls identified earlier. Moreover, we stress the need for more refined evaluation methods and suggest, through an example of mean-reversion strategies, a method designed to identify poor models for a given application. The method is based on regurgitative training, i.e. retraining the model on the data it has itself generated, a notion commonly referred to in statistics as identifiability.
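
The abstract's point about initial sample size can be illustrated with a toy experiment. The sketch below is not the authors' pipeline or their theoretical result; it assumes i.i.d. Gaussian returns and a "generative model" that is simply a Gaussian fitted to the real sample (the function name and all parameters are hypothetical). It compares the error of the mean estimated from the real sample with the error of the mean estimated from a much larger synthetic sample drawn from the fitted model.

```python
import numpy as np


def mean_estimation_rmse(n_real=50, n_synth=2000, n_trials=1000,
                         mu=0.0, sigma=1.0, seed=0):
    """Toy comparison (illustrative assumption: i.i.d. Gaussian returns).

    For each trial: draw n_real "real" returns, fit a Gaussian to them,
    generate n_synth synthetic returns from the fitted Gaussian, and
    estimate the mean from each sample. Returns the RMSE of both
    estimators with respect to the true mean mu.
    """
    rng = np.random.default_rng(seed)

    # Real samples: one row per trial.
    real = rng.normal(mu, sigma, size=(n_trials, n_real))
    mu_hat = real.mean(axis=1)
    sigma_hat = real.std(axis=1, ddof=1)

    # Synthetic samples drawn from the model fitted on each real sample.
    synth = rng.normal(mu_hat[:, None], sigma_hat[:, None],
                       size=(n_trials, n_synth))
    mu_synth = synth.mean(axis=1)

    rmse_real = np.sqrt(np.mean((mu_hat - mu) ** 2))
    rmse_synth = np.sqrt(np.mean((mu_synth - mu) ** 2))
    return rmse_real, rmse_synth


if __name__ == "__main__":
    rmse_real, rmse_synth = mean_estimation_rmse()
    naive = 1.0 / np.sqrt(2000)  # error one might expect from 2000 i.i.d. draws
    print(f"RMSE from real sample:      {rmse_real:.4f}")
    print(f"RMSE from synthetic sample: {rmse_synth:.4f}")
    print(f"Naive sigma/sqrt(n_synth):  {naive:.4f}")
```

In this toy setting the synthetic estimate inherits the estimation error of the fitted model, so its accuracy is governed by the size of the real sample rather than by the number of synthetic draws.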

Suggested Citation

  • Adil Rengim Cetingoz & Charles-Albert Lehalle, 2025. "Synthetic Data for Portfolios: A Throw of the Dice Will Never Abolish Chance," Papers 2501.03993, arXiv.org, revised Jan 2025.
  • Handle: RePEc:arx:papers:2501.03993

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2501.03993
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Solveig Flaig & Gero Junike, 2021. "Scenario generation for market risk models using generative neural networks," Papers 2109.10072, arXiv.org, revised Aug 2023.
    2. Solveig Flaig & Gero Junike, 2022. "Scenario Generation for Market Risk Models Using Generative Neural Networks," Risks, MDPI, vol. 10(11), pages 1-28, October.
    3. Francesca Biagini & Lukas Gonon & Niklas Walter, 2024. "Universal randomised signatures for generative time series modelling," Papers 2406.10214, arXiv.org, revised Sep 2024.
    4. Song Wei & Andrea Coletta & Svitlana Vyetrenko & Tucker Balch, 2023. "INTAGS: Interactive Agent-Guided Simulation," Papers 2309.01784, arXiv.org, revised Nov 2023.
    5. Emiel Lemahieu & Kris Boudt & Maarten Wyns, 2023. "Generating drawdown-realistic financial price paths using path signatures," Papers 2309.04507, arXiv.org.
    6. Szymon Kubiak & Tillman Weyde & Oleksandr Galkin & Dan Philps & Ram Gopal, 2023. "Improved Data Generation for Enhanced Asset Allocation: A Synthetic Dataset Approach for the Fixed Income Universe," Papers 2311.16004, arXiv.org.
    7. Stentoft, Lars, 2005. "Pricing American options when the underlying asset follows GARCH processes," Journal of Empirical Finance, Elsevier, vol. 12(4), pages 576-611, September.
    8. Tan, Zhengxun & Liu, Juan & Chen, Juanjuan, 2021. "Detecting stock market turning points using wavelet leaders method," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 565(C).
    9. Mahata, Ajit & Rai, Anish & Nurujjaman, Md. & Prakash, Om, 2021. "Modeling and analysis of the effect of COVID-19 on the stock price: V and L-shape recovery," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 574(C).
    10. Guglielmo Maria Caporale & Luis A. Gil-Alana & Alex Plastun, 2017. "Long Memory and Data Frequency in Financial Markets," Discussion Papers of DIW Berlin 1647, DIW Berlin, German Institute for Economic Research.
    11. Michael Karpe, 2020. "An overall view of key problems in algorithmic trading and recent progress," Papers 2006.05515, arXiv.org.
    12. Galai, Dan & Raviv, Alon & Wiener, Zvi, 2007. "Liquidation triggers and the valuation of equity and debt," Journal of Banking & Finance, Elsevier, vol. 31(12), pages 3604-3620, December.
    13. Blanka Horvath & Josef Teichmann & Žan Žurič, 2021. "Deep Hedging under Rough Volatility," Risks, MDPI, vol. 9(7), pages 1-20, July.
    14. Alexandre Miot, 2020. "Adversarial trading," Papers 2101.03128, arXiv.org.
    15. Loredana Ureche-Rangau & Quiterie de Rorthays, 2009. "More on the volatility-trading volume relationship in emerging markets: The Chinese stock market," Journal of Applied Statistics, Taylor & Francis Journals, vol. 36(7), pages 779-799.
    16. Hans Buhler & Blanka Horvath & Terry Lyons & Imanol Perez Arribas & Ben Wood, 2020. "A Data-driven Market Simulator for Small Data Environments," Papers 2006.14498, arXiv.org.
    17. Lim, Terence & Lo, Andrew W. & Merton, Robert C. & Scholes, Myron S., 2006. "The Derivatives Sourcebook," Foundations and Trends(R) in Finance, now publishers, vol. 1(5–6), pages 365-572, April.
    18. Edmond Lezmi & Jules Roche & Thierry Roncalli & Jiali Xu, 2020. "Improving the Robustness of Trading Strategy Backtesting with Boltzmann Machines and Generative Adversarial Networks," Papers 2007.04838, arXiv.org.
    19. Ilu, Ahmad Ibraheem, 2020. "Exchange Rate Pass through to Stock Prices: A Multi GARCH Approach," MPRA Paper 98442, University Library of Munich, Germany.
    20. Rim Ammar Lamouchi, 2020. "Long Memory and Stock Market Efficiency: Case of Saudi Arabia," International Journal of Economics and Financial Issues, Econjournals, vol. 10(3), pages 29-34.
