Comparing hundreds of machine learning classifiers and discrete choice models in predicting travel behavior: an empirical benchmark

Authors

  • Shenhao Wang
  • Baichuan Mo
  • Stephane Hess
  • Jinhua Zhao

Abstract

Researchers have compared machine learning (ML) classifiers and discrete choice models (DCMs) in predicting travel behavior, but the generalizability of the findings is limited by the specifics of data, contexts, and authors' expertise. This study seeks to provide a generalizable empirical benchmark by comparing hundreds of ML and DCM classifiers in a highly structured manner. The experiments evaluate both prediction accuracy and computational cost across four hyper-dimensions: 105 ML and DCM classifiers from 12 model families, 3 datasets, 3 sample sizes, and 3 outputs. This experimental design yields 6,970 experiments, which are corroborated with a meta dataset of 136 experiment points from 35 previous studies. To date, this study is the most comprehensive and almost exhaustive comparison of classifiers for travel behavior prediction. We find that ensemble methods and deep neural networks achieve the highest predictive performance, but at a relatively high computational cost. Random forests are the most computationally efficient, balancing prediction and computation. While discrete choice models offer accuracy only 3-4 percentage points lower than the top ML classifiers, they have much longer computational time and become computationally impossible with large sample sizes, high input dimensions, or simulation-based estimation. The relative ranking of the ML and DCM classifiers is highly stable, while the absolute values of prediction accuracy and computational time vary widely. Overall, this paper suggests using deep neural networks, model ensembles, and random forests as baseline models for future travel behavior prediction. For choice modeling, the DCM community should shift more attention from fitting models to improving computational efficiency, so that DCMs can be widely adopted in the big data context.
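
The four hyper-dimensions named in the abstract can be pictured as a full-factorial grid. The following is only a minimal sketch of such a grid, not the authors' code: the dataset, sample-size, and output labels are placeholders, and the `Experiment` record and `build_grid` helper are hypothetical names introduced for illustration. The abstract does not say how its count of 6,970 experiments relates to this grid, so the sketch makes no attempt to reproduce that figure.

```python
from itertools import product
from typing import List, NamedTuple


class Experiment(NamedTuple):
    """One cell of the design grid (hypothetical record, for illustration)."""
    classifier_id: int  # placeholder index standing in for one of the 105 classifiers
    dataset: str
    sample_size: str
    output: str


# Dimension sizes are taken from the abstract; the labels are placeholders.
N_CLASSIFIERS = 105
DATASETS = ("dataset_1", "dataset_2", "dataset_3")
SAMPLE_SIZES = ("small", "medium", "large")
OUTPUTS = ("output_1", "output_2", "output_3")


def build_grid() -> List[Experiment]:
    """Enumerate every combination of classifier, dataset, sample size, and output."""
    return [
        Experiment(c, d, s, o)
        for c, d, s, o in product(range(N_CLASSIFIERS), DATASETS, SAMPLE_SIZES, OUTPUTS)
    ]


grid = build_grid()
print(len(grid))  # 105 * 3 * 3 * 3 = 2835 grid cells
```

In a benchmark organized this way, each grid cell would be paired with a measured prediction accuracy and a measured computational time, which is what allows rankings to be compared across datasets, sample sizes, and outputs.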

Suggested Citation

  • Shenhao Wang & Baichuan Mo & Stephane Hess & Jinhua Zhao, 2021. "Comparing hundreds of machine learning classifiers and discrete choice models in predicting travel behavior: an empirical benchmark," Papers 2102.01130, arXiv.org.
  • Handle: RePEc:arx:papers:2102.01130

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2102.01130
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Reid Ewing & Robert Cervero, 2010. "Travel and the Built Environment," Journal of the American Planning Association, Taylor & Francis Journals, vol. 76(3), pages 265-294.
    2. Train, Kenneth E., 2009. "Discrete Choice Methods with Simulation," Cambridge Books, Cambridge University Press, number 9780521766555, October.
    3. Mozolin, M. & Thill, J. -C. & Lynn Usery, E., 2000. "Trip distribution forecasting with multilayer perceptron neural networks: A critical evaluation," Transportation Research Part B: Methodological, Elsevier, vol. 34(1), pages 53-73, January.
    4. Zhou, Xiaolu & Wang, Mingshu & Li, Dongying, 2019. "Bike-sharing or taxi? Modeling the choices of travel mode in Chicago using machine learning," Journal of Transport Geography, Elsevier, vol. 79(C), pages 1-1.
    5. Wang, Shenhao & Zhao, Jinhua, 2019. "Risk preference and adoption of autonomous vehicles," Transportation Research Part A: Policy and Practice, Elsevier, vol. 126(C), pages 215-229.
    6. Louviere, Jordan J. & Hensher, David A. & Swait, Joffre D. (with contributions by Adamowicz, Wiktor), 2000. "Stated Choice Methods," Cambridge Books, Cambridge University Press, number 9780521788304.
    7. Allahviranloo, Mahdieh & Recker, Will, 2013. "Daily activity pattern recognition by using support vector machines with multiple classes," Transportation Research Part B: Methodological, Elsevier, vol. 58(C), pages 16-43.
    8. Liang Tang & Chenfeng Xiong & Lei Zhang, 2015. "Decision tree method for modeling travel mode switching in a dynamic behavioral process," Transportation Planning and Technology, Taylor & Francis Journals, vol. 38(8), pages 833-850, December.
    9. Wang, Shenhao & Wang, Qingyi & Zhao, Jinhua, 2020. "Multitask learning deep neural networks to combine revealed and stated preference data," Journal of choice modelling, Elsevier, vol. 37(C).

    Cited by:

    1. Krueger, Rico & Bierlaire, Michel & Daziano, Ricardo A. & Rashidi, Taha H. & Bansal, Prateek, 2021. "Evaluating the predictive abilities of mixed logit models with unobserved inter- and intra-individual heterogeneity," Journal of choice modelling, Elsevier, vol. 41(C).
    2. Han, Yafei & Pereira, Francisco Camara & Ben-Akiva, Moshe & Zegras, Christopher, 2022. "A neural-embedded discrete choice model: Learning taste representation with strengthened interpretability," Transportation Research Part B: Methodological, Elsevier, vol. 163(C), pages 166-186.
    3. Hamed Naseri & Edward Owen Douglas Waygood & Bobin Wang & Zachary Patterson, 2022. "Application of Machine Learning to Child Mode Choice with a Novel Technique to Optimize Hyperparameters," IJERPH, MDPI, vol. 19(24), pages 1-19, December.
    4. Connor R. Forsythe & Cristian Arteaga & John P. Helveston, 2024. "The Heterogeneous Aggregate Valence Analysis (HAVAN) Model: A Flexible Approach to Modeling Unobserved Heterogeneity in Discrete Choice Analysis," Papers 2402.00184, arXiv.org.
    5. Hu, Songhua & Xiong, Chenfeng & Chen, Peng & Schonfeld, Paul, 2023. "Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models," Transportation Research Part A: Policy and Practice, Elsevier, vol. 174(C).
    6. Li Tang & Chuanli Tang & Qi Fu, 2023. "Enhanced multilayer perceptron with feature selection and grid search for travel mode choice prediction," Papers 2304.12698, arXiv.org, revised Oct 2023.
    7. Smeele, Nicholas V.R. & Chorus, Caspar G. & Schermer, Maartje H.N. & de Bekker-Grob, Esther W., 2023. "Towards machine learning for moral choice analysis in health economics: A literature review and research agenda," Social Science & Medicine, Elsevier, vol. 326(C).
    8. S. Van Cranenburgh & S. Wang & A. Vij & F. Pereira & J. Walker, 2021. "Choice modelling in the age of machine learning -- discussion paper," Papers 2101.11948, arXiv.org, revised Nov 2021.
    9. Stephan Hetzenecker & Maximilian Osterhaus, 2024. "Deep Learning for the Estimation of Heterogeneous Parameters in Discrete Choice Models," Papers 2408.09560, arXiv.org.
    10. Ioannis Politis & Georgios Georgiadis & Aristomenis Kopsacheilis & Anastasia Nikolaidou & Chrysanthi Sfyri & Socrates Basbas, 2023. "A Route Choice Model for the Investigation of Drivers’ Willingness to Choose a Flyover Motorway in Greece," Sustainability, MDPI, vol. 15(5), pages 1-23, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Shenhao & Wang, Qingyi & Bailey, Nate & Zhao, Jinhua, 2021. "Deep neural networks for choice analysis: A statistical learning theory perspective," Transportation Research Part B: Methodological, Elsevier, vol. 148(C), pages 60-81.
    2. Wang, Shenhao & Mo, Baichuan & Zhao, Jinhua, 2021. "Theory-based residual neural networks: A synergy of discrete choice models and deep neural networks," Transportation Research Part B: Methodological, Elsevier, vol. 146(C), pages 333-358.
    3. Andani, I Gusti Ayu & La Paix Puello, Lissy & Geurs, Karst, 2021. "Modelling effects of changes in travel time and costs of toll road usage on choices for residential location, route and travel mode across population segments in the Jakarta-Bandung region, Indonesia," Transportation Research Part A: Policy and Practice, Elsevier, vol. 145(C), pages 81-102.
    4. Shenhao Wang & Baichuan Mo & Jinhua Zhao, 2020. "Theory-based residual neural networks: A synergy of discrete choice models and deep neural networks," Papers 2010.11644, arXiv.org.
    5. Haghani, Milad & Bliemer, Michiel C.J. & Hensher, David A., 2021. "The landscape of econometric discrete choice modelling research," Journal of choice modelling, Elsevier, vol. 40(C).
    6. Sfeir, Georges & Abou-Zeid, Maya & Rodrigues, Filipe & Pereira, Francisco Camara & Kaysi, Isam, 2021. "Latent class choice model with a flexible class membership component: A mixture model approach," Journal of choice modelling, Elsevier, vol. 41(C).
    7. Zhifeng Gao & Ted C. Schroeder, 2009. "Consumer responses to new food quality information: are some consumers more sensitive than others?," Agricultural Economics, International Association of Agricultural Economists, vol. 40(3), pages 339-346, May.
    8. Ortega, David L. & Wang, H. Holly & Wu, Laping & Hong, Soo Jeong, 2015. "Retail channel and consumer demand for food quality in China," China Economic Review, Elsevier, vol. 36(C), pages 359-366.
    9. Yamada, Katsunori & Sato, Masayuki, 2013. "Another avenue for anatomy of income comparisons: Evidence from hypothetical choice experiments," Journal of Economic Behavior & Organization, Elsevier, vol. 89(C), pages 35-57.
    10. Potoglou, Dimitris & Palacios, Juan & Feijoo, Claudio & Gómez Barroso, Jose-Luis, 2015. "The supply of personal information: A study on the determinants of information provision in e-commerce scenarios," 26th European Regional ITS Conference, Madrid 2015 127174, International Telecommunications Society (ITS).
    11. Sant'Anna, Ana Claudia & Bergtold, Jason & Shanoyan, Aleksan & Caldas, Marcellus & Granco, Gabriel, 2021. "Deal or No Deal? Analysis of Bioenergy Feedstock Contract Choice with Multiple Opt-out Options and Contract Attribute Substitutability," 2021 Conference, August 17-31, 2021, Virtual 315289, International Association of Agricultural Economists.
    12. Choi, Andy S., 2013. "Nonmarket values of major resources in the Korean DMZ areas: A test of distance decay," Ecological Economics, Elsevier, vol. 88(C), pages 97-107.
    13. Doherty, Edel & Campbell, Danny, 2011. "Demand for improved food safety and quality: a cross-regional comparison," 85th Annual Conference, April 18-20, 2011, Warwick University, Coventry, UK 108791, Agricultural Economics Society.
    14. Kesternich, Iris & Heiss, Florian & McFadden, Daniel & Winter, Joachim, 2013. "Suit the action to the word, the word to the action: Hypothetical choices and real decisions in Medicare Part D," Journal of Health Economics, Elsevier, vol. 32(6), pages 1313-1324.
    15. David Hensher & John Rose & Zheng Li, 2012. "Does the choice model method and/or the data matter?," Transportation, Springer, vol. 39(2), pages 351-385, March.
    16. Toşa, Cristian & Sato, Hitomi & Morikawa, Takayuki & Miwa, Tomio, 2018. "Commuting behavior in emerging urban areas: Findings of a revealed-preferences and stated-intentions survey in Cluj-Napoca, Romania," Journal of Transport Geography, Elsevier, vol. 68(C), pages 78-93.
    17. Qin, Pin & Carlsson, Fredrik & Xu, Jintao, 2009. "Forestland Reform in China: What do the Farmers Want? A Choice Experiment on Farmers’ Property Rights Preferences," Working Papers in Economics 370, University of Gothenburg, Department of Economics.
    18. Clark, Andrew E. & Senik, Claudia & Yamada, Katsunori, 2017. "When experienced and decision utility concur: The case of income comparisons," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 70(C), pages 1-9.
    19. Ping Qin & Fredrik Carlsson & Jintao Xu, 2011. "Forest Tenure Reform in China: A Choice Experiment on Farmers’ Property Rights Preferences," Land Economics, University of Wisconsin Press, vol. 87(3), pages 473-487.
    20. Joachim Marti, 2012. "Assessing preferences for improved smoking cessation medications: a discrete choice experiment," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 13(5), pages 533-548, October.