Author
Listed:
- Francisco Martinez-Gil
(Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
Current address: Dto. Informática, ETSE-UV. Avda. de la Universidad s/n., C.P. 46100 Burjassot, Valencia, Spain.)
- Miguel Lozano
(Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
Current address: Dto. Informática, ETSE-UV. Avda. de la Universidad s/n., C.P. 46100 Burjassot, Valencia, Spain.)
- Ignacio García-Fernández
(Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
Current address: Dto. Informática, ETSE-UV. Avda. de la Universidad s/n., C.P. 46100 Burjassot, Valencia, Spain.)
- Pau Romero
(Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
Current address: Dto. Informática, ETSE-UV. Avda. de la Universidad s/n., C.P. 46100 Burjassot, Valencia, Spain.)
- Dolors Serra
(Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
Current address: Dto. Informática, ETSE-UV. Avda. de la Universidad s/n., C.P. 46100 Burjassot, Valencia, Spain.)
- Rafael Sebastián
(Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
Current address: Dto. Informática, ETSE-UV. Avda. de la Universidad s/n., C.P. 46100 Burjassot, Valencia, Spain.)
Abstract
Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors in embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function, expressed as a numeric table or a function approximator, and the learned behavior is then derived as a greedy policy with respect to this value function. Nevertheless, the learned policy sometimes fails to meet expectations, and authoring it is difficult and unsafe because modifying a single value or parameter of the learned value function has unpredictable consequences in the space of policies it represents. This rules out direct manipulation of the learned value function as a method for modifying the derived behaviors. In this paper, we propose using Inverse Reinforcement Learning to incorporate real behavior traces into the learning process in order to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, together with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from real pedestrian trajectories in the process of learning to navigate a virtual 3D space that represents the real environment. A comparison with behaviors learned using a classic Reinforcement Learning algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors fit the real trajectories significantly better.
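The Soft Q-learning mentioned in the abstract replaces the hard max backup of standard Q-learning with a soft (log-sum-exp) backup, which is what ties it to the maximum causal entropy principle. The following minimal tabular sketch illustrates that backup only; the function names, temperature, and learning-rate values are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def soft_value(q_row, alpha):
    """Soft state value V(s) = alpha * log sum_a exp(Q(s,a) / alpha).

    Computed with the max-shift trick for numerical stability.
    As alpha -> 0 this approaches max_a Q(s,a), recovering hard Q-learning.
    """
    scaled = q_row / alpha
    m = np.max(scaled)
    return alpha * (m + np.log(np.sum(np.exp(scaled - m))))

def soft_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, lr=0.1):
    """One temporal-difference step toward the soft Bellman target."""
    target = r + gamma * soft_value(Q[s_next], alpha)
    Q[s, a] += lr * (target - Q[s, a])
    return Q
```

In an IRL setting such as the one described, the reward `r` would come from the recovered reward function rather than from the environment; the backup itself is unchanged.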
Suggested Citation
Francisco Martinez-Gil & Miguel Lozano & Ignacio García-Fernández & Pau Romero & Dolors Serra & Rafael Sebastián, 2020.
"Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulations,"
Mathematics, MDPI, vol. 8(9), pages 1-15, September.
Handle:
RePEc:gam:jmathe:v:8:y:2020:i:9:p:1479-:d:407733