QUADROTOR PATH FOLLOWING IN SIMULATION

OBJECTIVES

Create a robust, generalized quadrotor control policy that allows a simulated quadrotor to follow a trajectory in a near-optimal manner.

Utilize an OpenAI Gym environment as the simulator and train the control policy using reinforcement learning.

Explore state-of-the-art training methodologies (a minimal training sketch follows this list):
  • Proximal Policy Optimization
  • Curriculum learning
  • Supervised + Reinforcement Learning
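
As a minimal sketch of how such training can be set up, the Python snippet below runs PPO on a Gym-compatible quadrotor environment. It assumes the Stable-Baselines3 library and a hypothetical environment id "QuadrotorEnv-v0"; our actual training code may differ.

    # Minimal PPO training sketch (illustrative only).
    # Assumes Stable-Baselines3 and a hypothetical Gym id "QuadrotorEnv-v0".
    import gym
    from stable_baselines3 import PPO

    env = gym.make("QuadrotorEnv-v0")           # hypothetical quadrotor environment
    model = PPO("MlpPolicy", env, verbose=1)    # actor-critic MLP policy
    model.learn(total_timesteps=1_000_000)      # on-policy PPO updates
    model.save("quadrotor_ppo_policy")          # save the learned controller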


OVERVIEW


Simulation Environment

Our work extends research currently being done in the Robotic Embedded Systems Laboratory (RESL) at the University of Southern California. We are leveraging RESL's quadrotor simulation environment, which is compatible with OpenAI Gym. The environment simulates a quadrotor in an X configuration, as shown in Figure 1, and models its dynamics in detail.
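
Interaction with the environment follows the standard Gym reset/step loop. The sketch below is illustrative: the environment id "QuadrotorEnv-v0" is a placeholder, and random actions stand in for a trained policy.

    # Standard Gym interaction loop with the quadrotor simulator (sketch).
    import gym

    env = gym.make("QuadrotorEnv-v0")                 # hypothetical environment id
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()            # placeholder: random motor commands
        obs, reward, done, info = env.step(action)    # advance the simulated dynamics
    env.close()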

Figure 1: X configuration of quadrotor.


Figure 2: Quadrotor visualization tool. The quadrotor appears in the bottom left along with two goal points.


Quadrotor Visualization

We have extended the visualization tool created by RESL, which shows the quadrotor's position in its environment and the goal points it is attempting to reach. Along with TensorBoard, the visualization software allows us to analyze and better understand the quadrotor's performance. Figure 2 shows the visualization environment.
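
As an example of the kind of monitoring we rely on, the sketch below logs a scalar to TensorBoard. It assumes PyTorch's SummaryWriter and uses placeholder tag names and values, not our exact logging code.

    # Illustrative TensorBoard logging (assumes PyTorch's SummaryWriter).
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter("runs/quadrotor")                   # hypothetical log directory
    for step, episode_reward in enumerate([1.0, 2.5, 3.2]):    # placeholder episode returns
        writer.add_scalar("reward/episode", episode_reward, step)
    writer.close()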


Control Policy

Molchanov et al. [7] from RESL learned a unified control network that stabilizes a quadrotor and holds it at a specified hover point. Our project extends this work by focusing on path following: the quadrotor learns to navigate a given trajectory autonomously.

The final control policy output by our software will be a neural network that both stabilizes the quadrotor and guides it efficiently along specified trajectories. We have not yet attempted to run the control policy on a real quadrotor, but it performs well in simulation.
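
To make the path-following objective concrete, one plausible shaped reward penalizes distance to the current goal point plus control effort, in the spirit of reward shaping [4]. The function below is an illustrative assumption, not the reward actually used in our training.

    # Hypothetical shaped reward for trajectory following (illustrative only).
    import numpy as np

    def path_following_reward(position, goal, action,
                              dist_weight=1.0, effort_weight=0.05):
        """Higher reward when closer to the goal and using smaller motor commands."""
        dist_cost = dist_weight * np.linalg.norm(np.asarray(position) - np.asarray(goal))
        effort_cost = effort_weight * float(np.sum(np.square(action)))
        return -(dist_cost + effort_cost)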


THE ENVIRONMENT


IMPROVED VISUALIZATION


METHODOLOGY






RESULTS

SQUARE TRAJECTORY

SQUARE DIAGONAL TRAJECTORY

UP-DOWN TRAJECTORY
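
Each evaluation trajectory can be viewed as an ordered list of goal points for the quadrotor to visit. The coordinates below are purely illustrative placeholders, not the exact goals used in our experiments.

    # Hypothetical waypoint lists (x, y, z) for the evaluated trajectories.
    SQUARE          = [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1), (0, 0, 1)]
    SQUARE_DIAGONAL = [(0, 0, 1), (1, 1, 1), (1, 0, 1), (0, 1, 1), (0, 0, 1)]
    UP_DOWN         = [(0, 0, 0.5), (0, 0, 2.0), (0, 0, 0.5)]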


TECHNICAL PAPER



PROJECT VIDEO



MEET THE TEAM

Amy Puente

amypuent@usc.edu


Mishari Aliesa

aliesa@usc.edu


Pushpreet Singh Hanspal

hanspal@usc.edu


Shilpa Thomas

shilpath@usc.edu


REFERENCES



[1] J. Hwangbo, I. Sa, R. Siegwart, and M. Hutter, “Control of a quadrotor with reinforcement learning,” IEEE Robotics and Automation Letters (RA-L), vol. 2, no. 4, pp. 2096–2103, 2017.

[2] F. Sadeghi and S. Levine, “CAD2RL: real single-image flight without a single real image,” Robotics: Science and Systems, 2017.

[3] K. Kang, S. Belkhale, G. Kahn, P. Abbeel, and S. Levine, “Generalization through simulation: Integrating simulated and real data into deep reinforcement learning for vision-based autonomous flight,” IEEE Intl Conf. on Robotics and Automation (ICRA), 2019.

[4] A. Y. Ng, D. Harada, and S. Russell, “Policy invariance under reward transformations: Theory and application to reward shaping,” International Conference on Machine Learning (ICML), vol. 99, pp. 278–287, 1999.

[5] C. Florensa, D. Held, M. Wulfmeier, and P. Abbeel, “Reverse curriculum generation for reinforcement learning,” Conference on Robot Learning (CoRL), 2017.

[6] A. Molchanov, K. Hausman, S. Birchfield, and G. Sukhatme, “Region growing curriculum generation for reinforcement learning,” arXiv e-prints, 2018.

[7] A. Molchanov, T. Chen, W. Honig, J. A. Preiss, N. Ayanian, and G. S. Sukhatme, “Sim-to-(multi)-real: Transfer of low-level robust control policies to multiple quadrotors,” arXiv preprint arXiv:1903.04628, 2019.