FORMATION CONTROL FOR MULTI-UNMANNED VEHICLES VIA DEEP REINFORCEMENT LEARNING
Abstract: Targeting the multi-agent formation control problem, this work develops a formation controller for a multi-unmanned-vehicle system based on the DDQN deep reinforcement learning algorithm, modeling and simplifying the problem by combining consensus control with an accompanying-configuration method. A state space based on relative distance and relative velocity is established, so that the control input does not depend on global information; an action space based on nine typical motion directions and a reward function based on relative distance and relative velocity are then designed. On this basis, the neural network architecture is designed, a training and motion-simulation environment is built, and an effective controller is successfully trained. The controller can be applied directly to formation tasks for underactuated unmanned vehicles with nonholonomic constraints, and because training requires only motion data rather than a precise model, it is a model-free control method. The controller's effectiveness is verified through extensive motion simulations across diverse scenarios, including multiple formations, initial positions, and reference trajectories, as well as special cases such as time-varying formations, time-varying communication topologies, and communication failures; the controller completes the control task effectively in all scenarios. Finally, the strategy for the initial formation stage is optimized by defining waiting and starting conditions, which effectively reduces control energy consumption; this optimization is validated through motion simulation and comparative analysis.
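To make the setup described in the abstract concrete, the following minimal sketch illustrates one plausible reading of the nine-direction action space and a reward built from relative distance and relative velocity. The specific direction vectors, function names, and weights (`w_d`, `w_v`) are assumptions for illustration; the abstract only states that nine typical motion directions and distance/velocity-based rewards are used.

```python
import numpy as np

# Nine typical motion directions: the eight compass directions plus "stay",
# normalized to unit commands. This particular choice is an assumption; the
# paper only states that nine typical directions are used.
ACTIONS = [np.array([dx, dy], dtype=float)
           for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
ACTIONS = [a / np.linalg.norm(a) if np.linalg.norm(a) > 0 else a
           for a in ACTIONS]  # 9 actions in total

def state(p_self, v_self, p_ref, v_ref):
    """Local observation built only from relative distance and relative
    velocity to the accompanying configuration, so the control input
    does not depend on global position information."""
    return np.concatenate([p_ref - p_self, v_ref - v_self])

def reward(p_self, v_self, p_ref, v_ref, w_d=1.0, w_v=0.1):
    """Penalize relative-distance and relative-velocity errors.
    The weights w_d and w_v are illustrative assumptions."""
    d_err = np.linalg.norm(p_ref - p_self)
    v_err = np.linalg.norm(v_ref - v_self)
    return -(w_d * d_err + w_v * v_err)
```

In a DDQN training loop, each vehicle would select one of the nine `ACTIONS` from the Q-network given its local `state`, and the `reward` would drive it toward its assigned position in the formation.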