Author
Listed:
- Dingding Qi
(Air Defense and Anti-Missile School, Air Force Engineering University, Xi’an 710043, China)
- Yingjun Zhao
(Air Defense and Anti-Missile School, Air Force Engineering University, Xi’an 710043, China)
- Longyue Li
(Air Defense and Anti-Missile School, Air Force Engineering University, Xi’an 710043, China)
- Zhanxiao Jia
(Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an 710072, China)
Abstract
In this paper, we introduce an agent rescue scheduling approach grounded in proximal policy optimization, coupled with a singularity-free predefined-time control strategy, with the primary objective of improving the efficiency and precision of rescue missions. First, we design an evaluation function closely tied to the agents' average flying distance, which provides a quantitative benchmark for comparing scheduling schemes and helps optimize the allocation of rescue resources. Second, we develop a scheduling-strategy optimization method based on the Proximal Policy Optimization (PPO) algorithm, which automatically learns and adjusts scheduling strategies to suit complex rescue environments and varying task demands. The evaluation function supplies the feedback signal the PPO algorithm needs to refine the scheduling strategy toward optimal results. Third, to achieve stable and precise navigation of agents to their designated positions, we formulate a singularity-free predefined-time fuzzy adaptive tracking control strategy that dynamically adjusts control parameters in response to external disturbances and uncertainties, ensuring that agents reach their destinations within the predefined time. Finally, to validate the proposed approach, we build a simulation environment in Python 3.7 and compare PPO against another optimization method, the Deep Q-Network (DQN), using the variation in reward values as the evaluation benchmark.
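The paper's implementation is not reproduced on this page. As a minimal illustrative sketch, the snippet below shows how an evaluation function based on the agents' average flying distance could serve as the reward signal for PPO, together with the standard PPO clipped surrogate objective. All names and values here (average_flying_distance, ppo_clip_objective, the toy coordinates) are hypothetical illustrations, not taken from the paper.

import numpy as np

def average_flying_distance(agent_pos, target_pos, assignment):
    """Mean Euclidean flight distance under a given agent-to-target assignment.

    agent_pos:  (n, 2) array of agent coordinates
    target_pos: (m, 2) array of rescue-point coordinates
    assignment: length-n integer array; assignment[i] is agent i's target index
    """
    distances = np.linalg.norm(agent_pos - target_pos[assignment], axis=1)
    return distances.mean()

def reward(agent_pos, target_pos, assignment):
    # A shorter average flight distance yields a higher reward for the scheduler.
    return -average_flying_distance(agent_pos, target_pos, assignment)

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    ratio:     pi_new(a|s) / pi_old(a|s), one entry per sample
    advantage: estimated advantage, one entry per sample
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()

# Toy usage: three agents, three rescue points, identity assignment.
agents = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
targets = np.array([[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]])
print(reward(agents, targets, np.array([0, 1, 2])))

In this sketch the negative average distance plays the role of the feedback signal described in the abstract: a scheduling policy trained with the clipped objective is pushed toward assignments that shorten the agents' average flight path.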
Suggested Citation
Dingding Qi & Yingjun Zhao & Longyue Li & Zhanxiao Jia, 2024.
"Optimization of Predefined-Time Agent-Scheduling Strategy Based on PPO,"
Mathematics, MDPI, vol. 12(15), pages 1-17, July.
Handle:
RePEc:gam:jmathe:v:12:y:2024:i:15:p:2387-:d:1447030
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:12:y:2024:i:15:p:2387-:d:1447030. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help add them by using this form.
If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.