
Robust Synthetic Data Generation for Sequential Financial Models Using Hybrid Variational Autoencoder–Markov Chain Monte Carlo Architectures

Authors

Listed:
  • Francesco Bruni Prenestino

    (Department of Mathematics and Physics, Catholic University of the Sacred Heart, 25121 Brescia, Italy)

  • Enrico Barbierato

    (Department of Mathematics and Physics, Catholic University of the Sacred Heart, 25121 Brescia, Italy)

  • Alice Gatti

    (Department of Mathematics and Physics, Catholic University of the Sacred Heart, 25121 Brescia, Italy)

Abstract

Generating high-quality synthetic data is essential for advancing machine learning applications in financial time series, where data scarcity and privacy concerns often pose significant challenges. This study proposes a novel hybrid architecture that combines variational autoencoders (VAEs) with Markov Chain Monte Carlo (MCMC) sampling to enhance the generation of robust synthetic sequential data. The model leverages Gated Recurrent Unit (GRU) layers for capturing long-term temporal dependencies and MCMC sampling for effective latent space exploration, ensuring high variability and accuracy. Experimental evaluations on datasets of Google, Tesla, and Nestlé stock prices demonstrate the model’s superior performance in preserving statistical and temporal patterns, as validated by quantitative metrics (discriminative and predictive scores), statistical tests (Kolmogorov–Smirnov), and t-Distributed Stochastic Neighbour Embedding (t-SNE) visualisations. The experiments reveal the model’s scalability, maintaining high fidelity even under augmented dataset sizes and missing data scenarios. These findings position the proposed framework as a computationally efficient and structurally simple alternative to Generative Adversarial Network (GAN)-based methods, suitable for real-world applications in data-driven financial modelling.
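The abstract names two ingredients that can be illustrated in isolation: MCMC exploration of a latent space and a Kolmogorov–Smirnov comparison of distributions. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a standard normal latent prior as the sampling target, uses a plain random-walk Metropolis step, and omits the GRU encoder and decoder entirely. The names `log_std_normal`, `metropolis_latents`, and `ks_statistic`, and all parameter values, are hypothetical.

```python
import math
import random


def log_std_normal(z):
    """Unnormalised log-density of an isotropic standard normal (assumed prior)."""
    return -0.5 * sum(x * x for x in z)


def metropolis_latents(n_samples, dim, step=0.8, burn_in=200, seed=0):
    """Draw latent vectors with random-walk Metropolis (illustrative only)."""
    rng = random.Random(seed)
    z = [0.0] * dim
    log_p = log_std_normal(z)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = [x + rng.gauss(0.0, step) for x in z]
        log_p_new = log_std_normal(proposal)
        # Accept with probability min(1, p(proposal) / p(z));
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            z, log_p = proposal, log_p_new
        if i >= burn_in:
            samples.append(list(z))
    return samples


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so ties are handled consistently.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d
```

In a full pipeline of the kind the abstract describes, the sampled latent vectors would be passed through a trained decoder, and a statistic such as `ks_statistic` would compare generated values against real ones; a value near 0 indicates closely matching empirical distributions.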

Suggested Citation

  • Francesco Bruni Prenestino & Enrico Barbierato & Alice Gatti, 2025. "Robust Synthetic Data Generation for Sequential Financial Models Using Hybrid Variational Autoencoder–Markov Chain Monte Carlo Architectures," Future Internet, MDPI, vol. 17(2), pages 1-31, February.
  • Handle: RePEc:gam:jftint:v:17:y:2025:i:2:p:95-:d:1594853

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/17/2/95/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/17/2/95/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chatterjee, Sheshadri & Chaudhuri, Ranjan & Gupta, Shivam & Sivarajah, Uthayasankar & Bag, Surajit, 2023. "Assessing the impact of big data analytics on decision-making processes, forecasting, and performance of a firm," Technological Forecasting and Social Change, Elsevier, vol. 196(C).
    2. Sun, Pengfei & Yuan, Chunhui & Li, Xiaolong & Di, Jia, 2024. "Big data analytics, firm risk and corporate policies: Evidence from China," Research in International Business and Finance, Elsevier, vol. 70(PB).
