Printed from https://ideas.repec.org/a/eee/eejocm/v50y2024ics1755534524000046.html

Revealing and reducing bias when modelling choice behaviour on imbalanced panel datasets

Author

Listed:
  • Łukawska, Mirosława
  • Cazor, Laurent
  • Paulsen, Mads
  • Rasmussen, Thomas Kjær
  • Nielsen, Otto Anker

Abstract

The emergence of modern tools and technologies offers a unique opportunity to collect large amounts of data for understanding behaviour. However, the resulting datasets are often imbalanced, as individuals may contribute to them at different frequencies and over different periods. Models based on these datasets are challenging to estimate, and the results are not straightforward to interpret without considering the sample structure. This study investigates the issue of handling imbalanced panel datasets for modelling individual behaviour. It first conducts a simulation experiment to study the degree to which mixed logit models with and without a panel specification reproduce the population preferences when using imbalanced data. It then investigates how the application of bias reduction strategies, such as subsampling and likelihood weighting, influences model results and finds that combining these techniques helps to find an optimal trade-off between bias and variance of the estimates. Building on the conclusions from the simulation study, a large-scale case study estimates bicycle route choice models with different correction strategies. These strategies are compared in terms of efficiency, weighted fit measures, and computational burden to provide recommendations that fit the modelling purpose. We find that the weighted panel mixed multinomial logit model, estimated on the entire dataset, performs best in terms of the bias-efficiency trade-off in the estimates. Finally, we propose a strategy that ensures equal contribution of each individual to the estimation results, regardless of their representation in the sample, while reducing the computational burden related to estimating models on large datasets.
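
The abstract names subsampling and likelihood weighting but does not give their formulas, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that "equal contribution" can be approximated by weighting each observation by the inverse of its individual's observation count, and that subsampling means capping the number of observations kept per individual. The function names and the `max_obs` parameter are hypothetical. The weights are applied per observation for simplicity; a panel model would apply them to each individual's sequence of choices instead.

```python
from collections import Counter
import random


def equal_contribution_weights(person_ids):
    """Assumed scheme: weight = 1 / (observations of that individual),
    rescaled so that the weights sum to the number of observations."""
    counts = Counter(person_ids)
    scale = len(person_ids) / len(counts)  # observations per individual, on average
    return [scale / counts[p] for p in person_ids]


def subsample_per_person(person_ids, max_obs, seed=0):
    """Assumed scheme: keep at most max_obs randomly drawn observations
    per individual and return the kept row indices in ascending order."""
    rng = random.Random(seed)
    rows_by_person = {}
    for idx, p in enumerate(person_ids):
        rows_by_person.setdefault(p, []).append(idx)
    kept = []
    for rows in rows_by_person.values():
        if len(rows) > max_obs:
            rows = rng.sample(rows, max_obs)
        kept.extend(rows)
    return sorted(kept)


def weighted_log_likelihood(log_probs, weights):
    """Weighted sum of per-observation log choice probabilities."""
    return sum(w * lp for w, lp in zip(weights, log_probs))


# Toy imbalanced panel: individual "a" has 6 observations, "b" has 2, "c" has 1.
ids = ["a"] * 6 + ["b"] * 2 + ["c"]
weights = equal_contribution_weights(ids)
kept_rows = subsample_per_person(ids, max_obs=2)
```

In this toy panel the weights of each individual sum to the same total, so a heavily observed individual no longer dominates the weighted log-likelihood. Subsampling first and weighting the remaining rows afterwards is one way the two techniques could be combined to cut estimation time.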

Suggested Citation

  • Łukawska, Mirosława & Cazor, Laurent & Paulsen, Mads & Rasmussen, Thomas Kjær & Nielsen, Otto Anker, 2024. "Revealing and reducing bias when modelling choice behaviour on imbalanced panel datasets," Journal of Choice Modelling, Elsevier, vol. 50(C).
  • Handle: RePEc:eee:eejocm:v:50:y:2024:i:c:s1755534524000046
    DOI: 10.1016/j.jocm.2024.100471

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1755534524000046
    Download Restriction: Full text for ScienceDirect subscribers only
