Autonomous navigation of stratospheric balloons using reinforcement learning

References

  1. Lally, V. E. Superpressure Balloons for Horizontal Soundings of the Atmosphere Technical report (National Center for Atmospheric Research, 1967).

  2. Anderson, B. D. O. & Moore, J. B. Optimal Control: Linear Quadratic Methods (Prentice-Hall, 1989).

  3. Camacho, E. F. & Bordons, C. Model Predictive Control (Springer, 2007).

  4. Bellman, R. E. Dynamic Programming (Princeton Univ. Press, 1957).

  5. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).

  6. Jakobi, N., Husbands, P. & Harvey, I. Noise and the reality gap: the use of simulation in evolutionary robotics. In Proc. European Conf. Artificial Life (eds Moran, F. et al.) 704–720 (Springer, 1995).

  7. Tobin, J. et al. Domain randomization and generative models for robotic grasping. In Proc. Intl Conf. Intelligent Robots and Systems 3482–3489 (IEEE, 2018).

  8. Levine, S., Kumar, A., Tucker, G. & Fu, J. Offline reinforcement learning: tutorial, review, and perspectives on open problems. Preprint at https://arxiv.org/abs/2005.01643 (2020).

  9. Kaelbling, L. P., Littman, M. L. & Cassandra, A. R. Planning and acting in partially observable stochastic domains. Artif. Intell. 101, 99–134 (1998).

  10. Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM 38, 58–68 (1995).

  11. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  12. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  13. Lauer, C. J., Montgomery, C. A. & Dietterich, T. G. Managing fragmented fire-threatened landscapes with spatial externalities. For. Sci. 66, 443–456 (2020).

  14. Simão, H. P. et al. An approximate dynamic programming algorithm for large-scale fleet management: a case application. Transport. Sci. 43, 178–197 (2009).

  15. Mannion, P., Duggan, J. & Howley, E. An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic Road Transport Support Systems (eds McCluskey, T. L. et al.) 47–66 (Springer, 2016).

  16. Mirhoseini, A. et al. Chip placement with deep reinforcement learning. Preprint at https://arxiv.org/abs/2004.10746 (2020).

  17. Nevmyvaka, Y., Feng, Y. & Kearns, M. Reinforcement learning for optimized trade execution. In Proc. Intl Conf. Machine Learning (eds Cohen, W. W. & Moore, A.) 673–680 (ACM, 2006).

  18. Pineau, J., Bellemare, M. G., Rush, A. J., Ghizaru, A. & Murphy, S. A. Constructing evidence-based treatment strategies using methods from computer science. Drug Alcohol Depend. 88, S52–S60 (2007).

  19. Anderson, R. N., Boulanger, A., Powell, W. B. & Scott, W. Adaptive stochastic control for the smart grid. Proc. IEEE 99, 1098–1115 (2011).

  20. Glavic, M., Fonteneau, R. & Ernst, D. Reinforcement learning for electric power system decision and control: past considerations and perspectives. IFAC PapersOnLine 50, 6918–6927 (2017).

  21. Theocharous, G., Thomas, P. S. & Ghavamzadeh, M. Personalized ad recommendation systems for life-time value optimization with guarantees. In Proc. Intl Joint Conf. Artificial Intelligence (eds Yang, Q. & Wooldridge, M.) 1806–1812 (AAAI Press, IJCAI, 2015).

  22. Ie, E. et al. SlateQ: a tractable decomposition for reinforcement learning with recommendation sets. In Proc. Intl Joint Conf. Artificial Intelligence (ed. Kraus, S.) 2592–2599 (IJCAI, 2019).

  23. Ross, S., Gordon, G. & Bagnell, D. A reduction of imitation learning and structured prediction to no-regret online learning. In Proc. 14th Intl Conf. Artificial Intelligence and Statistics (eds Gordon, G. et al.) 627–635 (PMLR, 2011).

  24. Tan, J. et al. Sim-to-real: learning agile locomotion for quadruped robots. In Proc. Robotics: Science and Systems XIV (eds Kress-Gazit, H. et al.) 10 (2018).

  25. Ng, A. Y., Kim, H. J., Jordan, M. I. & Sastry, S. Autonomous helicopter flight via reinforcement learning. In Advances in Neural Information Processing Systems 16 (NIPS 2003) (eds Saul, L. K. et al.) 799–806 (2004).

  26. Abbeel, P., Coates, A., Quigley, M. & Ng, A. Y. An application of reinforcement learning to aerobatic helicopter flight. In Advances in Neural Information Processing Systems 19 (NIPS 2006) (eds Schölkopf, B. et al.) 1–8 (MIT Press, 2007).

  27. Reddy, G., Wong-Ng, J., Celani, A., Sejnowski, T. J. & Vergassola, M. Glider soaring via reinforcement learning in the field. Nature 562, 236–239 (2018).

  28. Lange, S., Riedmiller, M. & Voigtländer, A. Autonomous reinforcement learning on raw visual input data in a real world application. In Proc. Intl Joint Conf. Neural Networks https://doi.org/10.1109/IJCNN.2012.6252823 (IEEE, 2012).

  29. Levine, S., Pastor, P., Krizhevsky, A., Ibarz, J. & Quillen, D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. Int. J. Robot. Res. 37, 421–436 (2018).

  30. Kalashnikov, D. et al. Scalable deep reinforcement learning for vision-based robotic manipulation. In Proc. Conf. Robot Learning Vol. 87 (eds Billard, A. et al.) 651–673 (PMLR, 2018).

  31. Andrychowicz, O. M. et al. Learning dexterous in-hand manipulation. Int. J. Robot. Res. 39, 3–20 (2020).

  32. Zhang, C. Madden–Julian Oscillation. Rev. Geophys. 43, RG2003 (2005).

  33. Domeisen, D. I., Garfinkel, C. I. & Butler, A. H. The teleconnection of El Niño Southern Oscillation to the stratosphere. Rev. Geophys. 57, 5–47 (2018).

  34. Baldwin, M. et al. The quasi-biennial oscillation. Rev. Geophys. 39, 179–229 (2001).

  35. Friedrich, L. S. et al. A comparison of Loon balloon observations and stratospheric reanalysis products. Atmos. Chem. Phys. 17, 855–866 (2017).

  36. Coy, L., Schoeberl, M. R., Pawson, S., Candido, S. & Carver, R. W. Global assimilation of Loon stratospheric balloon observations. J. Geophys. Res. D 124, 3005–3019 (2019).

  37. Rasmussen, C. E. & Williams, C. K. I. Gaussian Processes for Machine Learning (MIT Press, 2006).

  38. Sondik, E. J. The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford Univ. (1971).

  39. Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).

  40. Perlin, K. An image synthesizer. Comput. Graph. 19, 287–296 (1985).

  41. Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The Arcade Learning Environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

  42. Kolter, Z. J. & Ng, A. Y. Policy search via the signed derivative. In Proc. Robotics: Science and Systems V (eds Trinkle, J. et al.) 27 (MIT Press, 2009).

  43. Levine, S. & Koltun, V. Guided policy search. In Proc. Intl Conf. Machine Learning Vol. 28-3 (eds Dasgupta, S. & McAllester, D.) 1–9 (ICML, 2013).

  44. Lin, L. Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 8, 293–321 (1992).

  45. Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proc. 27th Intl Conf. Machine Learning (ed. Fürnkranz, J.) 807–814 (ICML, 2010).

  46. Dabney, W., Rowland, M., Bellemare, M. G. & Munos, R. Distributional reinforcement learning with quantile regression. In Proc. AAAI Conf. Artificial Intelligence 2892–2901 (AAAI Press, 2018).

  47. Mnih, V. et al. Asynchronous methods for deep reinforcement learning. In Proc. Intl Conf. Machine Learning Vol. 48 (eds Balcan, M.-F. & Weinberger, K. Q.) 1928–1937 (ICML, 2016).

  48. Munos, R. From bandits to Monte-Carlo tree search: the optimistic principle applied to optimization and planning. Found. Trends Mach. Learn. 7, 1–129 (2014).

  49. Gibson, J. J. The Ecological Approach to Visual Perception (Taylor & Francis, 1979).

  50. Brooks, R. Elephants don’t play chess. Robot. Auton. Syst. 6, 3–15 (1990).

  51. Alexander, M., Grimsdell, A., Stephan, C. & Hoffmann, L. MJO-related intraseasonal variation in the stratosphere: gravity waves and zonal winds. J. Geophys. Res. D Atmospheres 123, 775–788 (2018).

  52. Watkins, C. J. C. H. Learning from Delayed Rewards. PhD thesis, Cambridge Univ. (1989).

  53. Castro, P. S., Moitra, S., Gelada, C., Kumar, S. & Bellemare, M. G. Dopamine: a research framework for deep reinforcement learning. Preprint at https://arxiv.org/abs/1812.06110 (2018).

  54. Bellemare, M. G., Dabney, W. & Munos, R. A distributional perspective on reinforcement learning. In Proc. Intl Conf. Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 449–458 (PMLR, 2017).

  55. Kingma, D. & Ba, J. Adam: a method for stochastic optimization. In Proc. Intl Conf. Learning Representations (eds Bengio, Y. & LeCun, Y.) (2015).

  56. Golovin, D. et al. Google Vizier: a service for black-box optimization. In Proc. ACM SIGKDD Intl Conf. Knowledge Discovery and Data Mining (eds Matwin, S. et al.) 1487–1496 (ACM, 2017).
