Abstract
Accurate travel time estimation is paramount for providing transit users with reliable schedules and dependable real-time information. This work is the first to utilize roadside urban imagery to aid transit agencies and practitioners in improving travel time prediction. We propose and evaluate an end-to-end framework integrating traditional transit data sources with a roadside camera for automated image data acquisition, labeling, and model training to predict transit travel times across a segment of interest. First, we show how the General Transit Feed Specification real-time data can be utilized as an efficient activation mechanism for a roadside camera unit monitoring a segment of interest. Second, automated vehicle location data is utilized to generate ground truth labels for the acquired images based on the observed transit travel time percentiles across the camera-monitored segment during the time of image acquisition. Finally, the generated labeled image dataset is used to train and thoroughly evaluate a Vision Transformer (ViT) model to predict a discrete transit travel time range (band). The results of this exploratory study illustrate that the ViT model is able to learn the image features and contents that best help it deduce the expected travel time range, with an average validation accuracy ranging between 80% and 85%. We assess the interpretability of the ViT model’s predictions and showcase how this discrete travel time band prediction can subsequently improve continuous transit travel time estimation. The workflow and results presented in this study provide an end-to-end, scalable, automated, and highly efficient approach for integrating traditional transit data sources and roadside imagery to improve the estimation of transit travel duration. This work also demonstrates the added value of incorporating real-time information from computer-vision sources, which are becoming increasingly accessible and can have major implications for improving transit operations and passenger real-time information.
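The labeling step described in the abstract — assigning each acquired image a discrete travel time band based on observed percentiles of AVL-derived segment travel times — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the percentile cut points, band count, and function names are assumptions for the example.

```python
# Hypothetical sketch of percentile-based travel time band labeling.
# Historical AVL travel times for the camera-monitored segment define
# percentile thresholds; a new observation is mapped to the band
# (0 .. len(percentiles)) it falls into. Band edges are illustrative.
import numpy as np

def label_travel_time_band(observed_times, new_time, percentiles=(25, 50, 75)):
    """Return a discrete band index for `new_time`, based on percentile
    thresholds computed from historical segment travel times."""
    thresholds = np.percentile(observed_times, percentiles)
    # searchsorted returns how many thresholds new_time exceeds,
    # i.e. the index of the band it belongs to.
    return int(np.searchsorted(thresholds, new_time, side="right"))

# Example: historical travel times (seconds) across the segment
history = np.array([110, 120, 125, 130, 140, 150, 160, 175, 190, 210])
band = label_travel_time_band(history, 155)  # falls between the 50th and 75th percentiles
```

In a pipeline like the one described, each image captured by the camera unit would receive the band label of the travel time observed at acquisition, producing the labeled dataset used to train the ViT classifier.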
Suggested Citation
Awad Abdelhalim & Jinhua Zhao, 2025.
"Computer vision for transit travel time prediction: an end-to-end framework using roadside urban imagery,"
Public Transport, Springer, vol. 17(1), pages 221-246, March.
Handle: RePEc:spr:pubtra:v:17:y:2025:i:1:d:10.1007_s12469-023-00346-3
DOI: 10.1007/s12469-023-00346-3