Printed from https://ideas.repec.org/a/eee/eejocm/v28y2018icp167-182.html

Is your dataset big enough? Sample size requirements when using artificial neural networks for discrete choice analysis

Author

Listed:
  • Alwosheel, Ahmad
  • van Cranenburgh, Sander
  • Chorus, Caspar G.

Abstract

Artificial Neural Networks (ANNs) are increasingly used for discrete choice analysis, but it is currently unknown what sample size is required when using ANNs in this context. This paper fills this knowledge gap: we empirically establish a rule-of-thumb for ANN-based discrete choice analysis based on analyses of synthetic and real data. To investigate how the complexity of the data generating process affects the minimum required sample size, we conduct extensive Monte Carlo analyses using a series of model specifications with different levels of complexity, including random utility maximization (RUM) and random regret minimization (RRM) models, with and without random taste parameters. Based on our analyses, we advise using a minimum sample size of fifty times the number of weights in the ANN; note that the number of weights is generally much larger than the number of parameters in a discrete choice model. This rule-of-thumb is considerably more conservative than the one most often used in the ANN community, which advises using at least ten times the number of weights.
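As a rough illustration of the arithmetic behind the rule-of-thumb, the sketch below counts the weights of a fully connected feed-forward network and multiplies by fifty (and, for comparison, by ten). The layer sizes, the function names, and the choice to count bias terms as weights are assumptions made for this example; the abstract does not specify an architecture or whether biases are included.

```python
from typing import Sequence


def count_weights(layer_sizes: Sequence[int], include_biases: bool = True) -> int:
    """Count trainable weights in a fully connected feed-forward network.

    layer_sizes lists the number of nodes per layer, input layer first.
    Counting biases as weights is an assumption of this sketch.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out      # connection weights between two layers
        if include_biases:
            total += n_out         # one bias per node in the receiving layer
    return total


def min_sample_size(n_weights: int, factor: int = 50) -> int:
    """Minimum number of observations as a multiple of the number of weights."""
    return factor * n_weights


# Hypothetical network: 12 inputs, one hidden layer of 10 nodes, 3 alternatives.
layers = [12, 10, 3]
n_w = count_weights(layers)

print("weights:", n_w)
print("minimum sample size (50x):", min_sample_size(n_w))
print("minimum sample size (10x):", min_sample_size(n_w, factor=10))
```

For this hypothetical network the count is 163 weights, so the fifty-times rule asks for 8,150 observations, against 1,630 under the ten-times rule.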

Suggested Citation

  • Alwosheel, Ahmad & van Cranenburgh, Sander & Chorus, Caspar G., 2018. "Is your dataset big enough? Sample size requirements when using artificial neural networks for discrete choice analysis," Journal of choice modelling, Elsevier, vol. 28(C), pages 167-182.
  • Handle: RePEc:eee:eejocm:v:28:y:2018:i:c:p:167-182
    DOI: 10.1016/j.jocm.2018.07.002

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1755534518300058
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.jocm.2018.07.002?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Joel Huber & Kenneth Train, 2000. "On the Similarity of Classical and Bayesian Estimates of Individual Mean Partworths," Economics Working Papers E00-289, University of California at Berkeley.
    2. McFadden, Daniel L., 1984. "Econometric analysis of qualitative response models," Handbook of Econometrics, in: Z. Griliches† & M. D. Intriligator (ed.), Handbook of Econometrics, edition 1, volume 2, chapter 24, pages 1395-1457, Elsevier.
    3. Caspar Chorus & Michel Bierlaire, 2013. "An empirical comparison of travel choice models that capture preferences for compromise alternatives," Transportation, Springer, vol. 40(3), pages 549-562, May.
    4. Hensher, David A. & Ton, Tu T., 2000. "A comparison of the predictive potential of artificial neural networks and nested logit models for commuter mode choice," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 36(3), pages 155-172, September.
    5. Train, Kenneth E., 2009. "Discrete Choice Methods with Simulation," Cambridge Books, Cambridge University Press, number 9780521766555.
    6. John Rose & Michiel Bliemer, 2013. "Sample size requirements for stated choice experiments," Transportation, Springer, vol. 40(5), pages 1021-1041, September.
    7. van Cranenburgh, Sander & Guevara, Cristian Angelo & Chorus, Caspar G., 2015. "New insights on random regret minimization models," Transportation Research Part A: Policy and Practice, Elsevier, vol. 74(C), pages 91-109.
    8. Davide Castelvecchi, 2016. "Can we open the black box of AI?," Nature, Nature, vol. 538(7623), pages 20-23, October.

    Citations



    Cited by:

    1. Kigerl, Alex & Hamilton, Zachary & Kowalski, Melissa & Mei, Xiaohan, 2022. "The great methods bake-off: Comparing performance of machine learning algorithms," Journal of Criminal Justice, Elsevier, vol. 82(C).
    2. Koffi Dumor & Li Yao, 2019. "Estimating China’s Trade with Its Partner Countries within the Belt and Road Initiative Using Neural Network Analysis," Sustainability, MDPI, vol. 11(5), pages 1-22, March.
    3. Haoying Wang & Guohui Wu, 2022. "Modeling discrete choices with large fine-scale spatial data: opportunities and challenges," Journal of Geographical Systems, Springer, vol. 24(3), pages 325-351, July.
    4. Melvin Wong & Bilal Farooq, 2019. "Information processing constraints in travel behaviour modelling: A generative learning approach," Papers 1907.07036, arXiv.org, revised Jul 2019.
    5. Smeele, Nicholas V.R. & Chorus, Caspar G. & Schermer, Maartje H.N. & de Bekker-Grob, Esther W., 2023. "Towards machine learning for moral choice analysis in health economics: A literature review and research agenda," Social Science & Medicine, Elsevier, vol. 326(C).
    6. Horvath, Sabine & Soot, Matthias & Zaddach, Sebastian & Neuner, Hans & Weitkamp, Alexandra, 2021. "Deriving adequate sample sizes for ANN-based modelling of real estate valuation tasks by complexity analysis," Land Use Policy, Elsevier, vol. 107(C).
    7. Sophia Voulgaropoulou & Nikolaos Samaras & Nikolaos Ploskas, 2022. "Predicting the Execution Time of the Primal and Dual Simplex Algorithms Using Artificial Neural Networks," Mathematics, MDPI, vol. 10(7), pages 1-21, March.
    8. Coqueret, Guillaume & Deguest, Romain, 2024. "Unexpected opportunities in misspecified predictive regressions," European Journal of Operational Research, Elsevier, vol. 318(2), pages 686-700.
    9. Shu-Long Luo & Xing Shi & Feng Yang, 2024. "A Review of Data-Driven Methods in Building Retrofit and Performance Optimization: From the Perspective of Carbon Emission Reductions," Energies, MDPI, vol. 17(18), pages 1-33, September.
    10. Mihai Mutascu & Scott W. Hegerty, 2023. "Predicting the contribution of artificial intelligence to unemployment rates: an artificial neural network approach," Journal of Economics and Finance, Springer;Academy of Economics and Finance, vol. 47(2), pages 400-416, June.
    11. Ester Vasta & Tommaso Scimone & Giovanni Nobile & Otto Eberhardt & Daniele Dugo & Massimiliano Maurizio De Benedetti & Luigi Lanuzza & Giuseppe Scarcella & Luca Patanè & Paolo Arena & Mario Cacciato, 2023. "Models for Battery Health Assessment: A Comparative Evaluation," Energies, MDPI, vol. 16(2), pages 1-34, January.
    12. Alwosheel, Ahmad & van Cranenburgh, Sander & Chorus, Caspar G., 2019. "‘Computer says no’ is not enough: Using prototypical examples to diagnose artificial neural networks for discrete choice analysis," Journal of choice modelling, Elsevier, vol. 33(C).
    13. Ibrahim A. Nafisah & Irsa Sajjad & Mohammed A. Alshahrani & Osama Abdulaziz Alamri & Mohammed M. A. Almazah & Javid Gani Dar, 2024. "Statistical Predictive Hybrid Choice Modeling: Exploring Embedded Neural Architecture," Mathematics, MDPI, vol. 12(19), pages 1-20, October.
    14. Melvin Wong & Bilal Farooq, 2019. "ResLogit: A residual neural network logit model for data-driven choice modelling," Papers 1912.10058, arXiv.org, revised Feb 2021.
    15. Salah Alghamdi & Waiching Tang & Sittimont Kanjanabootra & Dariusz Alterman, 2024. "Optimising Building Energy and Comfort Predictions with Intelligent Computational Model," Sustainability, MDPI, vol. 16(8), pages 1-18, April.
    16. Ying Zhang & Mutahar Safdar & Jiarui Xie & Jinghao Li & Manuel Sage & Yaoyao Fiona Zhao, 2023. "A systematic review on data of additive manufacturing for machine learning applications: the data quality, type, preprocessing, and management," Journal of Intelligent Manufacturing, Springer, vol. 34(8), pages 3305-3340, December.
    17. Venkatraj, V. & Dixit, M.K., 2022. "Challenges in implementing data-driven approaches for building life cycle energy assessment: A review," Renewable and Sustainable Energy Reviews, Elsevier, vol. 160(C).
    18. Mahmoud Abdel-Sattar & Adel M. Al-Saif & Abdulwahed M. Aboukarima & Dalia H. Eshra & Lidia Sas-Paszt, 2022. "Quality Attributes Prediction of Flame Seedless Grape Clusters Based on Nutritional Status Employing Multiple Linear Regression Technique," Agriculture, MDPI, vol. 12(9), pages 1-19, August.
    19. Wang, Yuanping & Hou, Lingchun & Hu, Lang & Cai, Weiguang & Wang, Lin & Dai, Cuilian & Chen, Juntao, 2023. "How family structure type affects household energy consumption: A heterogeneous study based on Chinese household evidence," Energy, Elsevier, vol. 284(C).
    20. Guillaume Coqueret & Romain Deguest, 2024. "Unexpected opportunities in misspecified predictive regressions," Post-Print hal-04595355, HAL.
    21. Sadia Samar Ali & Rajbir Kaur & D. Jinil Persis & Raiswa Saha & Murugan Pattusamy & V. Raja Sreedharan, 2023. "Developing a hybrid evaluation approach for the low carbon performance on sustainable manufacturing environment," Annals of Operations Research, Springer, vol. 324(1), pages 249-281, May.
    22. Bonfiglio, A. & Camaioni, B. & Carta, V. & Cristiano, S., 2023. "Estimating the common agricultural policy milestones and targets by neural networks," Evaluation and Program Planning, Elsevier, vol. 99(C).
    23. S. Van Cranenburgh & S. Wang & A. Vij & F. Pereira & J. Walker, 2021. "Choice modelling in the age of machine learning -- discussion paper," Papers 2101.11948, arXiv.org, revised Nov 2021.
    24. Sander van Cranenburgh & Marco Kouwenhoven, 2021. "An artificial neural network based method to uncover the value-of-travel-time distribution," Transportation, Springer, vol. 48(5), pages 2545-2583, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. John Buckell & Vrinda Vasavada & Sarah Wordsworth & Dean A. Regier & Matthew Quaife, 2022. "Utility maximization versus regret minimization in health choice behavior: Evidence from four datasets," Health Economics, John Wiley & Sons, Ltd., vol. 31(2), pages 363-381, February.
    2. van Cranenburgh, Sander & Collins, Andrew T., 2019. "New software tools for creating stated choice experimental designs efficient for regret minimisation and utility maximisation decision rules," Journal of choice modelling, Elsevier, vol. 31(C), pages 104-123.
    3. Boeri, Marco & Scarpa, Riccardo & Chorus, Caspar G., 2014. "Stated choices and benefit estimates in the context of traffic calming schemes: Utility maximization, regret minimization, or both?," Transportation Research Part A: Policy and Practice, Elsevier, vol. 61(C), pages 121-135.
    4. Bibhuti Sharma & Mark Hickman & Neema Nassir, 2019. "Park-and-ride lot choice model using random utility maximization and random regret minimization," Transportation, Springer, vol. 46(1), pages 217-232, February.
    5. Abdurrahman B. Aydemir & Erkan Duman, 2021. "Migrant Networks and Destination Choice: Evidence from Moves across Turkish Provinces," Koç University-TUSIAD Economic Research Forum Working Papers 2109, Koc University-TUSIAD Economic Research Forum.
    6. Michel Beine & Marco Delogu & Lionel Ragot, 2020. "The role of fees in foreign education: evidence from Italy [Determinants of international student migration]," Journal of Economic Geography, Oxford University Press, vol. 20(2), pages 571-600.
    7. Chorus, Caspar & van Cranenburgh, Sander & Daniel, Aemiro Melkamu & Sandorf, Erlend Dancke & Sobhani, Anae & Szép, Teodóra, 2021. "Obfuscation maximization-based decision-making: Theory, methodology and first empirical evidence," Mathematical Social Sciences, Elsevier, vol. 109(C), pages 28-44.
    8. Mittelhammer, Ron C. & Judge, George, 2011. "A family of empirical likelihood functions and estimators for the binary response model," Journal of Econometrics, Elsevier, vol. 164(2), pages 207-217, October.
    9. Kanchanaroek, Yingluk & Termansen, Mette & Quinn, Claire, 2013. "Property rights regimes in complex fishery management systems: A choice experiment application," Ecological Economics, Elsevier, vol. 93(C), pages 363-373.
    10. Lan Anh Nguyen & Manh-Hung Nguyen & Viet-Ngu Hoang & Arnaud Reynaud & Michel Simioni & Clevo Wilson, 2024. "Tourists’ preferences and willingness to pay for protecting a World Heritage site from coastal erosion in Vietnam," Environment, Development and Sustainability: A Multidisciplinary Approach to the Theory and Practice of Sustainable Development, Springer, vol. 26(11), pages 27607-27628, November.
    11. Ron Mittelhammer & George Judge, 2009. "A Minimum Power Divergence Class of CDFs and Estimators for the Binary Choice Model," International Econometric Review (IER), Econometric Research Association, vol. 1(1), pages 33-49, April.
    12. Villas-Boas, Sofia B & Taylor, Rebecca & Krovetz, Hannah, 2016. "Willingness to Pay for Low Water Footprint Food Choices During Drought," Department of Agricultural & Resource Economics, UC Berkeley, Working Paper Series qt9vh3x180, Department of Agricultural & Resource Economics, UC Berkeley.
    13. Kim, Junghun & Seung, Hyunchan & Lee, Jongsu & Ahn, Joongha, 2020. "Asymmetric preference and loss aversion for electric vehicles: The reference-dependent choice model capturing different preference directions," Energy Economics, Elsevier, vol. 86(C).
    14. Szabó, Andrea & Pham, Vinh, 2022. "Net neutrality and consumer demand in the video on-demand market," Information Economics and Policy, Elsevier, vol. 61(C).
    15. Michel Beine & Marco Delogu & Lionel Ragot, 2017. "The Role of Fees in Foreign Education: Evidence From Italy and the United Kingdom," Working Papers 2017-04, CEPII research center.
    16. Schaefer, Thilo & Peichl, Andreas, 2006. "Documentation FiFoSiM: integrated tax benefit microsimulation and CGE model," FiFo Discussion Papers - Finanzwissenschaftliche Diskussionsbeiträge 06-10, University of Cologne, FiFo Institute for Public Economics.
    17. Kuroda, Toshifumi & Ida, Takanori & Koguchi, Teppei, 2015. "The impact of asymmetric regulation on product bundling: The case of fixed broadband and mobile communications in Japan," 2015 Regional ITS Conference, Los Angeles 2015 146318, International Telecommunications Society (ITS).
    18. Rinaldo Brau, 2008. "Demand-Driven Sustainable Tourism? A Choice Modelling Analysis," Tourism Economics, vol. 14(4), pages 691-708, December.
    19. Rolf Aaberge & Ugo Colombino, 2014. "Labour Supply Models," Contributions to Economic Analysis, in: Handbook of Microsimulation Modelling, volume 127, pages 167-221, Emerald Group Publishing Limited.
    20. Larranaga, Ana Margarita & Arellana, Julian & Senna, Luiz Afonso, 2017. "Encouraging intermodality: A stated preference analysis of freight mode choice in Rio Grande do Sul," Transportation Research Part A: Policy and Practice, Elsevier, vol. 102(C), pages 202-211.
