
Comparing hundreds of machine learning and discrete choice models for travel demand modeling: An empirical benchmark

Authors

  • Wang, Shenhao
  • Mo, Baichuan
  • Zheng, Yunhan
  • Hess, Stephane
  • Zhao, Jinhua

Abstract

Numerous studies have compared machine learning (ML) and discrete choice models (DCMs) in predicting travel demand. However, these studies often lack generalizability because they compare models deterministically without considering contextual variations. To address this limitation, our study develops an empirical benchmark by designing a tournament model to learn the intrinsic predictive values of ML and DCMs. This novel approach enables us to efficiently summarize a large number of experiments, quantify the randomness in model comparisons, and use formal statistical tests to differentiate between model and contextual effects. The benchmark compares two large-scale data sources: a database compiled from a literature review summarizing 136 experiments from 35 studies, and our own experiment data, encompassing a total of 6970 experiments from 105 models and 12 model families, tested repeatedly on three datasets, sample sizes, and choice categories. The benchmark yields two key findings. First, many ML models, particularly ensemble methods and deep learning, statistically outperform the DCM family and its individual variants (i.e., multinomial, nested, and mixed logit), corroborating previous research. Second, the study highlights the crucial role of contextual factors (i.e., data sources, inputs, and choice categories), which explain models' predictive performance more effectively than differences in model types alone. Model performance varies significantly with data sources, improving with larger sample sizes and lower-dimensional alternative sets. Even after controlling for all model and contextual factors, significant randomness remains, implying inherent uncertainty in such model comparisons. Overall, we suggest that future researchers shift more focus from context-specific and deterministic model comparisons towards examining model transferability across contexts and characterizing the inherent uncertainty in ML, thus creating more robust and generalizable next-generation travel demand models.
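
The abstract does not specify the functional form of the tournament model. As a rough illustration only, the sketch below assumes a Bradley-Terry-style formulation, in which each model has a latent "strength" and the probability that one model beats another in an experiment depends on the ratio of their strengths. The function name `fit_bradley_terry`, the model labels, and the match outcomes are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def fit_bradley_terry(matches, n_iter=200):
    """Estimate latent strengths from (winner, loser) pairs.

    Assumption: a Bradley-Terry model fitted with the standard
    minorization-maximization update. This is an illustrative stand-in,
    not the tournament model used by the authors.
    """
    wins = defaultdict(float)         # total wins per model
    pair_counts = defaultdict(float)  # number of comparisons per unordered pair
    models = set()
    for winner, loser in matches:
        if winner == loser:
            continue  # a model cannot be compared with itself
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))

    # Start from equal strengths and iterate the MM update.
    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(n_iter):
        updated = {}
        for m in models:
            denom = 0.0
            for pair, n in pair_counts.items():
                if m in pair:
                    (other,) = pair - {m}
                    denom += n / (strength[m] + strength[other])
            updated[m] = wins[m] / denom
        total = sum(updated.values())
        strength = {m: s / total for m, s in updated.items()}  # normalize to sum to 1
    return strength


# Hypothetical outcomes: each tuple is (better model, worse model) in one experiment.
matches = (
    [("ensemble", "logit")] * 3 + [("logit", "ensemble")] * 1
    + [("deep_net", "logit")] * 2 + [("logit", "deep_net")] * 1
    + [("ensemble", "deep_net")] * 1 + [("deep_net", "ensemble")] * 1
)
strengths = fit_bradley_terry(matches)
for name, value in sorted(strengths.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Under this assumed formulation, the fitted strengths summarize many pairwise experiment outcomes in a single ranking. Separating model effects from contextual effects, as the abstract describes, would require additional terms for data source, sample size, and choice category, which this sketch omits.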

Suggested Citation

  • Wang, Shenhao & Mo, Baichuan & Zheng, Yunhan & Hess, Stephane & Zhao, Jinhua, 2024. "Comparing hundreds of machine learning and discrete choice models for travel demand modeling: An empirical benchmark," Transportation Research Part B: Methodological, Elsevier, vol. 190(C).
  • Handle: RePEc:eee:transb:v:190:y:2024:i:c:s0191261524001851
    DOI: 10.1016/j.trb.2024.103061


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shenhao Wang & Baichuan Mo & Yunhan Zheng & Stephane Hess & Jinhua Zhao, 2021. "Comparing hundreds of machine learning classifiers and discrete choice models in predicting travel behavior: an empirical benchmark," Papers 2102.01130, arXiv.org, revised Mar 2025.
    2. Wang, Shenhao & Wang, Qingyi & Bailey, Nate & Zhao, Jinhua, 2021. "Deep neural networks for choice analysis: A statistical learning theory perspective," Transportation Research Part B: Methodological, Elsevier, vol. 148(C), pages 60-81.
    3. Smeele, Nicholas V.R. & Chorus, Caspar G. & Schermer, Maartje H.N. & de Bekker-Grob, Esther W., 2023. "Towards machine learning for moral choice analysis in health economics: A literature review and research agenda," Social Science & Medicine, Elsevier, vol. 326(C).
    4. Wang, Qingyi & Wang, Shenhao & Zheng, Yunhan & Lin, Hongzhou & Zhang, Xiaohu & Zhao, Jinhua & Walker, Joan, 2024. "Deep hybrid model with satellite imagery: How to combine demand modeling and computer vision for travel behavior analysis?," Transportation Research Part B: Methodological, Elsevier, vol. 179(C).
    5. Wang, Shenhao & Mo, Baichuan & Zhao, Jinhua, 2021. "Theory-based residual neural networks: A synergy of discrete choice models and deep neural networks," Transportation Research Part B: Methodological, Elsevier, vol. 146(C), pages 333-358.
    6. Kim, Eui-Jin & Bansal, Prateek, 2024. "A new flexible and partially monotonic discrete choice model," Transportation Research Part B: Methodological, Elsevier, vol. 183(C).
    7. Ali, Azam & Kalatian, Arash & Choudhury, Charisma F., 2023. "Comparing and contrasting choice model and machine learning techniques in the context of vehicle ownership decisions," Transportation Research Part A: Policy and Practice, Elsevier, vol. 173(C).
    8. Georges Sfeir & Filipe Rodrigues & Maya Abou-Zeid, 2021. "Gaussian Process Latent Class Choice Models," Papers 2101.12252, arXiv.org.
    9. Niousha Bagheri & Milad Ghasri & Michael Barlow, 2025. "RUM-NN: A Neural Network Model Compatible with Random Utility Maximisation for Discrete Choice Setups," Papers 2501.05221, arXiv.org.
    10. Sifringer, Brian & Lurkin, Virginie & Alahi, Alexandre, 2020. "Enhancing discrete choice models with representation learning," Transportation Research Part B: Methodological, Elsevier, vol. 140(C), pages 236-261.
    11. Han, Yafei & Pereira, Francisco Camara & Ben-Akiva, Moshe & Zegras, Christopher, 2022. "A neural-embedded discrete choice model: Learning taste representation with strengthened interpretability," Transportation Research Part B: Methodological, Elsevier, vol. 163(C), pages 166-186.
    12. Qingyi Wang & Shenhao Wang & Yunhan Zheng & Hongzhou Lin & Xiaohu Zhang & Jinhua Zhao & Joan Walker, 2023. "Deep hybrid model with satellite imagery: how to combine demand modeling and computer vision for behavior analysis?," Papers 2303.04204, arXiv.org, revised Feb 2024.
    13. Jiajia Zhang & Tao Feng & Harry Timmermans & Zhengkui Lin, 2023. "Improved imputation of rule sets in class association rule modeling: application to transportation mode choice," Transportation, Springer, vol. 50(1), pages 63-106, February.
    14. Shenhao Wang & Baichuan Mo & Jinhua Zhao, 2020. "Theory-based residual neural networks: A synergy of discrete choice models and deep neural networks," Papers 2010.11644, arXiv.org.
    15. Liu, Yicong & Loa, Patrick & Wang, Kaili & Habib, Khandker Nurul, 2023. "Theory-driven or data-driven? Modelling ride-sourcing mode choices using integrated choice and latent variable model and multi-task learning deep neural networks," Journal of choice modelling, Elsevier, vol. 48(C).
    16. Ha, Tran Vinh & Asada, Takumi & Arimura, Mikiharu, 2019. "Determination of the influence factors on household vehicle ownership patterns in Phnom Penh using statistical and machine learning methods," Journal of Transport Geography, Elsevier, vol. 78(C), pages 70-86.
    17. Dubey, Subodh & Cats, Oded & Hoogendoorn, Serge & Bansal, Prateek, 2022. "A multinomial probit model with Choquet integral and attribute cut-offs," Transportation Research Part B: Methodological, Elsevier, vol. 158(C), pages 140-163.
    18. Holmgren, Johan, 2020. "The effect of public transport quality on car ownership – A source of wider benefits?," Research in Transportation Economics, Elsevier, vol. 83(C).
    19. Ibrahim A. Nafisah & Irsa Sajjad & Mohammed A. Alshahrani & Osama Abdulaziz Alamri & Mohammed M. A. Almazah & Javid Gani Dar, 2024. "Statistical Predictive Hybrid Choice Modeling: Exploring Embedded Neural Architecture," Mathematics, MDPI, vol. 12(19), pages 1-20, October.
    20. Zheng Zhu & Xiqun Chen & Chenfeng Xiong & Lei Zhang, 2018. "A mixed Bayesian network for two-dimensional decision modeling of departure time and mode choice," Transportation, Springer, vol. 45(5), pages 1499-1522, September.
