A Deep Deterministic Policy Gradient Learning Approach to Missile Autopilot Design


Candeli A.;De Tommasi G.;Lui D. G.;Mele A.;Santini S.;Tartaglione G.

2022-01-01

Abstract

This paper applies a Deep Reinforcement Learning algorithm, the Deep Deterministic Policy Gradient (DDPG), to the design of a missile lateral acceleration control system. To this end, the autopilot control problem is recast in the Reinforcement Learning framework, where the environment consists of a 2-Degrees-of-Freedom nonlinear model of the missile’s longitudinal dynamics, while the agent is trained on a linearized version of the model. In particular, we show how the DDPG reward function can account not only for the stabilization of the longitudinal dynamics, but also for the main performance indices (settling time, undershoot, steady-state error, etc.). The effectiveness of the proposed DDPG-based missile autopilot is assessed through extensive numerical simulations on both the linearized and the fully nonlinear dynamics, considering different flight conditions and uncertainty in the aerodynamic coefficients. Its performance is also compared against two model-based control strategies, to check whether the proposed data-driven approach can achieve a prescribed closed-loop response in a completely model-free fashion.
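The abstract does not give the reward function itself. As a purely illustrative sketch of how a per-step reward might combine tracking error with performance indices such as undershoot and settling time, consider the following. The function name `step_reward`, the weights, the tolerance band, and the target settling time are all hypothetical choices made for this example and are not taken from the paper.

```python
def step_reward(a_ref, a_meas, t,
                settle_time=0.5, band=0.02,
                w_track=1.0, w_under=5.0, w_settle=2.0):
    """Hypothetical per-step reward for lateral acceleration tracking.

    a_ref  : commanded lateral acceleration (assumed nonzero step command)
    a_meas : measured lateral acceleration
    t      : time elapsed since the command was applied, in seconds
    All weights and thresholds are illustrative assumptions.
    """
    scale = max(abs(a_ref), 1e-9)
    err = (a_ref - a_meas) / scale  # tracking error normalized by the command

    # Base term: penalize the normalized tracking error at every step.
    reward = -w_track * abs(err)

    # Undershoot term: extra penalty while the response has the opposite
    # sign to the command (initial wrong-way motion).
    if a_meas * a_ref < 0:
        reward -= w_under * abs(a_meas) / scale

    # Settling-time / steady-state term: after the target settling time,
    # penalize any step spent outside the tolerance band.
    if t >= settle_time and abs(err) > band:
        reward -= w_settle

    return reward
```

With these assumed weights, a response that sits on the command after the settling time receives a reward of zero, while wrong-way motion early in the response and residual error late in the response are both penalized more heavily than plain tracking error.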

Year

2022

Publication type:

1.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11367/102073
