
A comprehensive investigation of variational auto-encoders for population synthesis

Author

Listed:
  • Abdoul Razac Sané

    (University Gustave Eiffel)

  • Pierre-Olivier Vandanjon

    (University Gustave Eiffel)

  • Rachid Belaroussi

    (University Gustave Eiffel)

  • Pierre Hankach

    (University Gustave Eiffel)

Abstract

The use of synthetic populations has grown considerably in recent years, transforming studies in fields such as social science research, urban planning, public health and transportation modeling. Synthetic populations are valuable substitutes for real data that are often missing or sensitive, and they can preserve both privacy and representativeness. They are typically constructed from aggregate and/or sample data. Recently, new methods for generating synthetic populations based on deep learning, notably Variational Autoencoders (VAEs), have been developed. Such methods overcome a limitation of traditional methods such as Iterative Proportional Fitting (IPF), which cannot generate agents whose combinations of attribute values (cross-modalities) are absent from the sample data. As a result, IPF requires large samples to generate a synthetic population closely resembling the actual one. Conversely, the advantage of VAEs lies in their ability to generate agents not found in the sample data, albeit at the risk of creating agents that do not exist in the actual population. However, practical documentation and detailed analyses of the architectures and implementation results of these deep learning approaches, VAEs in particular, remain limited, which makes these methods difficult for practitioners to adopt. This paper focuses on generating synthetic populations using VAEs. First, an in-depth and accessible theoretical explanation of how VAEs function is provided. Next, a detailed study of these methods is carried out by testing the architectures, parameters, sample sizes and evaluation indicators necessary to ensure high-quality results. The study highlights the ability of VAEs to generate large datasets from a small training sample, as well as their performance in generating new, realistic individuals not present in the training data. Certain limitations are identified, including the difficulty VAEs have in handling numerical attributes and the need for post-processing to eliminate unrealistic individuals. In conclusion, despite these limitations, VAEs constitute a very promising methodology for generating synthetic populations and offer practitioners numerous advantages. This paper is accompanied by a Python notebook to help interested readers implement this methodology.
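
The IPF limitation mentioned in the abstract can be illustrated with a minimal sketch. The code below is not taken from the paper or its notebook; the function `ipf`, the two-attribute table, and all numbers are hypothetical and chosen only to show that a combination absent from the sample (a zero cell in the seed table) stays at zero after fitting, because IPF only rescales existing cells.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, iterations=200, tol=1e-9):
    """Fit a 2-D seed table to row and column marginals by alternate scaling."""
    table = seed.astype(float)
    for _ in range(iterations):
        # Scale each row so that row sums match the row targets.
        row_sums = table.sum(axis=1, keepdims=True)
        table *= np.divide(row_targets[:, None], row_sums,
                           out=np.ones_like(row_sums), where=row_sums > 0)
        # Scale each column so that column sums match the column targets.
        col_sums = table.sum(axis=0, keepdims=True)
        table *= np.divide(col_targets[None, :], col_sums,
                           out=np.ones_like(col_sums), where=col_sums > 0)
        # Stop once the row sums also agree with their targets.
        if np.allclose(table.sum(axis=1), row_targets, rtol=0.0, atol=tol):
            break
    return table

# Hypothetical sample counts: rows = age group (young, old),
# columns = car ownership (no, yes). No young car owner was sampled.
seed = np.array([[30, 0],
                 [10, 20]])
row_targets = np.array([500.0, 500.0])   # assumed population totals by age group
col_targets = np.array([600.0, 400.0])   # assumed population totals by car ownership

fitted = ipf(seed, row_targets, col_targets)
print(fitted)
```

In this sketch the fitted table matches both sets of marginals, yet the "young car owner" cell remains empty however large the targets are. A generative model such as a VAE samples from a learned distribution instead, which is why it can produce unseen combinations, including unrealistic ones that need post-processing.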

Suggested Citation

  • Abdoul Razac Sané & Pierre-Olivier Vandanjon & Rachid Belaroussi & Pierre Hankach, 2025. "A comprehensive investigation of variational auto-encoders for population synthesis," Journal of Computational Social Science, Springer, vol. 8(1), pages 1-34, February.
  • Handle: RePEc:spr:jcsosc:v:8:y:2025:i:1:d:10.1007_s42001-024-00332-0
    DOI: 10.1007/s42001-024-00332-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s42001-024-00332-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



