Author
Listed:
- Minh Hieu Nguyen
(AME-DEST - Dynamiques Economiques et Sociales des Transports - Université Gustave Eiffel)
- Jimmy Armoogum
(AME-DEST - Dynamiques Economiques et Sociales des Transports - Université Gustave Eiffel)
- Cedric Garcia
(AME-DEST - Dynamiques Economiques et Sociales des Transports - Université Gustave Eiffel)
Abstract
Mode detection is at the heart of research based on GPS data collected in mobility surveys using wearable devices and, more recently, smartphones. The literature in this field has focused heavily on developed countries such as the US, Sweden, Switzerland, Canada and Australia, so the list of modes considered has largely been limited to basic ones: walk, bike, bus/tram, car and train. Here, we present an attempt to identify modes from data collected in a developing country where mobility depends heavily on the motorcycle.

Data: Between mid-April and mid-May 2019, the DEST laboratory of IFSTTAR (France) carried out a survey using the app TRavelVU, developed by Trivector (Sweden), to collect both high-frequency GPS data (one point every 1 to 3 seconds) and the corresponding ground truth from 63 participants in Hanoi, Vietnam. Of the 2791 segments, 758 (27.2%) are walking, 104 (3.7%) biking, 97 (3.5%) bus, 1245 (44.6%) motorcycle and 587 (21.0%) car.

Method: To distinguish the five modes, a deterministic (rule-based) method and a random forest method were developed, as described below.

RULE-BASED

Step     95th percentile speed   Median speed   Proximity to bus stops   Mode
Step 1   < 3.5                   < 2.0          -                        Walk
Step 2   < 6.0                   < 4.0          -                        Bike
Step 3   < 15.0                  > 3.5          Yes                      Bus
Step 4   > 12.0                  > 6.0          -                        Car
Step 5   The remainder of segments                                       Motor

- This is a hierarchical process: segments labelled in one step are not considered in subsequent steps.
- Proximity to bus stops means that both the origin and the destination of a segment lie within 75 m of the nearest bus stop.

RANDOM FOREST

Features: 95th percentile speed, median speed, proximity to bus stops (0 if no, 1 if yes), heading change rate, low speed rate, 95th percentile acceleration, average (absolute) acceleration.
Splitting data: at a ratio of 75% to 25%.

Results and discussions: The predictions of the two methods were compared with the ground truth and are shown as normalised confusion matrices in the accompanying figure.
Random forest achieved higher overall accuracy (79.08%) than the rule-based method (61.73%) because it detected walk and motorcycle, the two largest shares of the mode split, considerably more accurately; however, it identified bus, bike and car clearly worse. The reason is that the random forest over-fitted seriously to motorcycle and walk. This problem stems from the unbalanced mode usage and the limited sample size of the secondary modes (i.e. bike and bus). The rule-based approach, by comparison, showed considerably higher recall for bus and bike. The rules failed to address the overlap in speed between modes, but they demonstrated the advantage of a hierarchical process over random forest, in which all modes and features are examined simultaneously. To illustrate, bus was detected far better (53% vs. 11%) when only proximity to bus stops and speed profiles were considered than when proximity to bus stops was considered together with a series of other features such as heading change rate, acceleration characteristics, distance and so on. Among the five modes, motorcycle was the major source of misclassification, as it could show behaviour similar to car, bus and bike. Meanwhile, detecting bus from the origin and destination of each segment alone seems to be insufficient.

Conclusion: Inferring modes from GPS data in emerging countries is demanding because the motorcycle is the main means of transport. A hierarchical process would be a better choice when the sample size of some modes is limited. In addition to the first and last points of a segment, the association between the intermediate GPS points and GIS data should be examined to achieve higher precision in bus classification. This paper contributes to the geographical diversity of the mode detection field. It is also one of the first studies to include the motorcycle in the list of classified modes.
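
The hierarchical rule-based steps listed in the Method part of the abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the names `Segment` and `classify_segment` are hypothetical, the speed thresholds are copied from the abstract without units because none are stated there, and the sketch assumes that the conditions within one step must all hold together, which the abstract does not say explicitly.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical container for the three inputs used by the rules."""
    p95_speed: float       # 95th percentile speed (unit not stated in the abstract)
    median_speed: float    # median speed (same unit as above)
    near_bus_stops: bool   # True if origin AND destination are within 75 m of a stop


def classify_segment(seg: Segment) -> str:
    """Apply the five steps in order; the first step that matches wins.

    Assumption: all conditions of a step are combined with a logical AND.
    """
    # Step 1: walk
    if seg.p95_speed < 3.5 and seg.median_speed < 2.0:
        return "walk"
    # Step 2: bike (only reached by segments not labelled as walk)
    if seg.p95_speed < 6.0 and seg.median_speed < 4.0:
        return "bike"
    # Step 3: bus requires proximity to bus stops at both ends
    if seg.p95_speed < 15.0 and seg.median_speed > 3.5 and seg.near_bus_stops:
        return "bus"
    # Step 4: car
    if seg.p95_speed > 12.0 and seg.median_speed > 6.0:
        return "car"
    # Step 5: every remaining segment is labelled motorcycle
    return "motor"


if __name__ == "__main__":
    examples = [
        Segment(3.0, 1.5, False),
        Segment(10.0, 5.0, True),
        Segment(10.0, 5.0, False),
    ]
    for seg in examples:
        print(seg, "->", classify_segment(seg))
```

Because the steps are evaluated in order, a segment that satisfies both the bus and the car conditions (for example a 95th percentile speed of 14.0 and a median speed of 7.0 with both ends near bus stops) is labelled bus and never reaches the car rule. That ordering is what makes the process hierarchical.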
Suggested Citation
Minh Hieu Nguyen & Jimmy Armoogum & Cedric Garcia, 2022.
"Imputing transportation modes from GPS Data in a motorcycle dependent area,"
Post-Print
hal-03670773, HAL.
Handle:
RePEc:hal:journl:hal-03670773
Note: View the original document on HAL open archive server: https://hal.science/hal-03670773