Autonomous navigation of stratospheric balloons using reinforcement learning

References

  1. Lally, V. E. Superpressure Balloons for Horizontal Soundings of the Atmosphere Technical report (National Center for Atmospheric Research, 1967).

  2. Anderson, B. D. O. & Moore, J. B. Optimal Control: Linear Quadratic Methods (Prentice-Hall, 1989).

  3. Camacho, E. F. & Bordons, C. Model Predictive Control (Springer, 2007).

  4. Bellman, R. E. Dynamic Programming (Princeton Univ. Press, 1957).

  5. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).

  6. Jakobi, N., Husbands, P. & Harvey, I. Noise and the reality gap: the use of simulation in evolutionary robotics. In Proc. European Conf. Artificial Life (eds Moran, F. et al.) 704–720 (Springer, 1995).

  7. Tobin, J. et al. Domain randomization and generative models for robotic grasping. In Proc. Intl Conf. Intelligent Robots and Systems 3482–3489 (IEEE, 2018).

  8. Levine, S., Kumar, A., Tucker, G. & Fu, J. Offline reinforcement learning: tutorial, review, and perspectives on open problems. Preprint at https://arxiv.org/abs/2005.01643 (2020).

  9. Kaelbling, L. P., Littman, M. L. & Cassandra, A. R. Planning and acting in partially observable stochastic domains. Artif. Intell. 101, 99–134 (1998).

  10. Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM 38, 58–68 (1995).

  11. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  12. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  13. Lauer, C. J., Montgomery, C. A. & Dietterich, T. G. Managing fragmented fire-threatened landscapes with spatial externalities. For. Sci. 66, 443–456 (2020).

  14. Simão, H. P. et al. An approximate dynamic programming algorithm for large-scale fleet management: a case application. Transport. Sci. 43, 178–197 (2009).

  15. Mannion, P., Duggan, J. & Howley, E. An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic Road Transport Support Systems (eds McCluskey, T. L. et al.) 47–66 (Springer, 2016).

  16. Mirhoseini, A. et al. Chip placement with deep reinforcement learning. Preprint at https://arxiv.org/abs/2004.10746 (2020).

  17. Nevmyvaka, Y., Feng, Y. & Kearns, M. Reinforcement learning for optimized trade execution. In Proc. Intl Conf. Machine Learning (eds Cohen, W. W. & Moore, A.) 673–680 (ACM, 2006).

  18. Pineau, J., Bellemare, M. G., Rush, A. J., Ghizaru, A. & Murphy, S. A. Constructing evidence-based treatment strategies using methods from computer science. Drug Alcohol Depend. 88, S52–S60 (2007).

  19. Anderson, R. N., Boulanger, A., Powell, W. B. & Scott, W. Adaptive stochastic control for the smart grid. Proc. IEEE 99, 1098–1115 (2011).

  20. Glavic, M., Fonteneau, R. & Ernst, D. Reinforcement learning for electric power system decision and control: past considerations and perspectives. IFAC PapersOnLine 50, 6918–6927 (2017).

  21. Theocharous, G., Thomas, P. S. & Ghavamzadeh, M. Personalized ad recommendation systems for life-time value optimization with guarantees. In Proc. Intl Joint Conf. Artificial Intelligence (eds Yang, Q. & Wooldridge, M.) 1806–1812 (AAAI Press, IJCAI, 2015).

  22. Ie, E. et al. SlateQ: a tractable decomposition for reinforcement learning with recommendation sets. In Proc. Intl Joint Conf. Artificial Intelligence (ed. Kraus, S.) 2592–2599 (IJCAI, 2019).

  23. Ross, S., Gordon, G. & Bagnell, D. A reduction of imitation learning and structured prediction to no-regret online learning. In Proc. 14th Intl Conf. Artificial Intelligence and Statistics (eds Gordon, G. et al.) 627–635 (PMLR, 2011).

  24. Tan, J. et al. Sim-to-real: learning agile locomotion for quadruped robots. In Proc. Robotics: Science and Systems XIV (eds Kress-Gazit, H. et al.) 10 (2018).

  25. Ng, A. Y., Kim, H. J., Jordan, M. I. & Sastry, S. Autonomous helicopter flight via reinforcement learning. In Advances in Neural Information Processing Systems 16 (NIPS 2003) (eds Saul, L. K. et al.) 799–806 (2004).

  26. Abbeel, P., Coates, A., Quigley, M. & Ng, A. Y. An application of reinforcement learning to aerobatic helicopter flight. In Advances in Neural Information Processing Systems 19 (NIPS 2006) (eds Schölkopf, B. et al.) 1–8 (MIT Press, 2007).

  27. Reddy, G., Wong-Ng, J., Celani, A., Sejnowski, T. J. & Vergassola, M. Glider soaring via reinforcement learning in the field. Nature 562, 236–239 (2018).

  28. Lange, S., Riedmiller, M. & Voigtländer, A. Autonomous reinforcement learning on raw visual input data in a real world application. In Proc. Intl Joint Conf. Neural Networks https://doi.org/10.1109/IJCNN.2012.6252823 (IEEE, 2012).

  29. Levine, S., Pastor, P., Krizhevsky, A., Ibarz, J. & Quillen, D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. Int. J. Robot. Res. 37, 421–436 (2018).

  30. Kalashnikov, D. et al. Scalable deep reinforcement learning for vision-based robotic manipulation. In Proc. Conf. Robot Learning Vol. 87 (eds Billard, A. et al.) 651–673 (PMLR, 2018).

  31. Andrychowicz, O. M. et al. Learning dexterous in-hand manipulation. Int. J. Robot. Res. 39, 3–20 (2020).

  32. Zhang, C. Madden–Julian Oscillation. Rev. Geophys. 43, RG2003 (2005).

  33. Domeisen, D. I., Garfinkel, C. I. & Butler, A. H. The teleconnection of El Niño Southern Oscillation to the stratosphere. Rev. Geophys. 57, 5–47 (2018).

  34. Baldwin, M. et al. The quasi-biennial oscillation. Rev. Geophys. 39, 179–229 (2001).

  35. Friedrich, L. S. et al. A comparison of Loon balloon observations and stratospheric reanalysis products. Atmos. Chem. Phys. 17, 855–866 (2017).

  36. Coy, L., Schoeberl, M. R., Pawson, S., Candido, S. & Carver, R. W. Global assimilation of Loon stratospheric balloon observations. J. Geophys. Res. D 124, 3005–3019 (2019).

  37. Rasmussen, C. E. & Williams, C. K. I. Gaussian Processes for Machine Learning (MIT Press, 2006).

  38. Sondik, E. J. The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford Univ. (1971).

  39. Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).

  40. Perlin, K. An image synthesizer. Comput. Graph. 19, 287–296 (1985).

  41. Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The Arcade Learning Environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

  42. Kolter, Z. J. & Ng, A. Y. Policy search via the signed derivative. In Proc. Robotics: Science and Systems V (eds Trinkle, J. et al.) 27 (MIT Press, 2009).

  43. Levine, S. & Koltun, V. Guided policy search. In Proc. Intl Conf. Machine Learning Vol. 28-3 (eds Dasgupta, S. & McAllester, D.) 1–9 (ICML, 2013).

  44. Lin, L. Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 8, 293–321 (1992).

  45. Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proc. 27th Intl Conf. Machine Learning (ed. Fürnkranz, J.) 807–814 (ICML, 2010).

  46. Dabney, W., Rowland, M., Bellemare, M. G. & Munos, R. Distributional reinforcement learning with quantile regression. In Proc. AAAI Conf. Artificial Intelligence 2892–2901 (AAAI Press, 2018).

  47. Mnih, V. et al. Asynchronous methods for deep reinforcement learning. In Proc. Intl Conf. Machine Learning Vol. 48 (eds Balcan, M.-F. & Weinberger, K. Q.) 1928–1937 (ICML, 2016).

  48. Munos, R. From bandits to Monte-Carlo tree search: the optimistic principle applied to optimization and planning. Found. Trends Mach. Learn. 7, 1–129 (2014).

  49. Gibson, J. J. The Ecological Approach to Visual Perception (Taylor & Francis, 1979).

  50. Brooks, R. Elephants don’t play chess. Robot. Auton. Syst. 6, 3–15 (1990).

  51. Alexander, M., Grimsdell, A., Stephan, C. & Hoffmann, L. MJO-related intraseasonal variation in the stratosphere: gravity waves and zonal winds. J. Geophys. Res. D Atmospheres 123, 775–788 (2018).

  52. Watkins, C. J. C. H. Learning from Delayed Rewards. PhD thesis, Cambridge Univ. (1989).

  53. Castro, P. S., Moitra, S., Gelada, C., Kumar, S. & Bellemare, M. G. Dopamine: a research framework for deep reinforcement learning. Preprint at https://arxiv.org/abs/1812.06110 (2018).

  54. Bellemare, M. G., Dabney, W. & Munos, R. A distributional perspective on reinforcement learning. In Proc. Intl Conf. Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 449–458 (PMLR, 2017).

  55. Kingma, D. & Ba, J. Adam: a method for stochastic optimization. In Proc. Intl Conf. Learning Representations (eds Bengio, Y. & LeCun, Y.) (2015).

  56. Golovin, D. et al. Google Vizier: a service for black-box optimization. In Proc. ACM SIGKDD Intl Conf. Knowledge Discovery and Data Mining (eds Matwin, S. et al.) 1487–1496 (ACM, 2017).
