DDPG

Deep Deterministic Policy Gradient (DDPG) is an algorithm which concurrently learns a Q-function and a policy. It uses off-policy data and the Bellman equation to learn the Q-function, and uses the Q-function to learn the policy.

In addition to PPO, we trained the quadrotor using DDPG. However, this yielded few results as the quadrotor did not learn to reach the goals. The model would likely perform better with further hyperparameter tuning.