Taking inspiration from Florensa et al. and Molchanov et al., we employ the assumption that reaching goal states in smaller environments is simpler than in more spacious environments. We first train the quadrotor to reach multiple goal states in a reduced-volume environment. After the quadrotor has mastered this smaller task, we gradually grow the size of the environment, allowing the quadrotor to begin its trajectory from start states farther from the goal positions.
Though further exploration of this approach would likely yield positive results, it was not as successful as the reward function we developed.
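The curriculum described above can be sketched as follows. This is a minimal illustration, not our actual training code: the schedule parameters are hypothetical, and the start states are assumed to be sampled uniformly from a cube around the goal whose half-width grows over the course of training.

```python
import numpy as np

def curriculum_schedule(initial_half_width=0.5, max_half_width=4.0, growth=1.5):
    """Yield successively larger start-state sampling volumes.

    In training, one would advance to the next volume only after the
    policy reaches a chosen success rate at the current one.
    """
    half_width = initial_half_width
    while half_width <= max_half_width:
        yield half_width
        half_width *= growth

def sample_start_state(half_width, goal=np.zeros(3), rng=None):
    """Sample a start position uniformly in a cube centered on the goal."""
    rng = rng or np.random.default_rng()
    return goal + rng.uniform(-half_width, half_width, size=3)

# As the schedule progresses, start states may lie farther from the goal.
for hw in curriculum_schedule():
    start = sample_start_state(hw)
```

The growth factor and stopping size are tunable; a success-rate threshold (omitted here) would gate each expansion.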
In addition to standard reinforcement learning, we also planned to explore the use of supervised learning to create a unified control policy. To do so, we would manually fly the quadrotor using a game controller and then train the quadrotor on these human-demonstrated trajectories.
However, creating trajectories for training would have required a fine-tuned low-level controller, which we were unable to develop due to time constraints.
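The supervised-learning idea above amounts to behavioral cloning: fitting a policy to (state, action) pairs recorded during human flight. A minimal sketch, assuming a linear least-squares policy as a stand-in for whatever model the full project would use, and synthetic data in place of real demonstrations:

```python
import numpy as np

def fit_linear_policy(states, actions):
    """Behavioral cloning with a linear policy: solve actions ~= states @ W."""
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W

# Synthetic stand-in for recorded demonstrations: a 9-D state
# (e.g. position, velocity, attitude) mapped to 4 rotor commands.
rng = np.random.default_rng(0)
true_W = rng.normal(size=(9, 4))
demo_states = rng.normal(size=(200, 9))    # states visited by the "pilot"
demo_actions = demo_states @ true_W        # actions the "pilot" issued

W = fit_linear_policy(demo_states, demo_actions)
cloned_action = demo_states[0] @ W         # cloned policy acting on a state
```

A real pipeline would replace the linear fit with a neural network and the synthetic pairs with logged game-controller flights, which is precisely the data we were unable to collect without a working low-level controller.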