Solving the Rubik’s cube with deep reinforcement learning and search

References

  1. Lichodzijewski, P. & Heywood, M. in Genetic Programming Theory and Practice VIII (eds Riolo, R., McConaghy, T. & Vladislavleva, E.) 35–54 (Springer, 2011).

  2. Smith, R. J., Kelly, S. & Heywood, M. I. Discovering Rubik’s cube subgroups using coevolutionary GP: a five twist experiment. In Proceedings of the Genetic and Evolutionary Computation Conference 2016 789–796 (ACM, 2016).

  3. Brunetto, R. & Trunda, O. Deep heuristic-learning in the Rubik’s cube domain: an experimental evaluation. Proc. ITAT 1885, 57–64 (2017).

  4. Johnson, C. G. Solving the Rubik’s cube with learned guidance functions. In Proceedings of 2018 IEEE Symposium Series on Computational Intelligence (SSCI) 2082–2089 (IEEE, 2018).

  5. Korf, R. E. Macro-operators: a weak method for learning. Artif. Intell. 26, 35–77 (1985).

  6. Arfaee, S. J., Zilles, S. & Holte, R. C. Learning heuristic functions for large state spaces. Artif. Intell. 175, 2075–2098 (2011).

  7. Korf, R. E. Finding optimal solutions to Rubik’s cube using pattern databases. In Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence 700–705 (AAAI Press, 1997); http://dl.acm.org/citation.cfm?id=1867406.1867515

  8. Korf, R. E. & Felner, A. Disjoint pattern database heuristics. Artif. Intell. 134, 9–22 (2002).

  9. Felner, A., Korf, R. E. & Hanan, S. Additive pattern database heuristics. J. Artif. Intell. Res. 22, 279–318 (2004).

  10. Bonet, B. & Geffner, H. Planning as heuristic search. Artif. Intell. 129, 5–33 (2001).

  11. Schmidhuber, J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015).

  12. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).

  13. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction Vol. 1 (MIT Press, 1998).

  14. Bellman, R. Dynamic Programming (Princeton Univ. Press, 1957).

  15. Puterman, M. L. & Shin, M. C. Modified policy iteration algorithms for discounted Markov decision problems. Manage. Sci. 24, 1127–1137 (1978).

  16. Bertsekas, D. P. & Tsitsiklis, J. N. Neuro-dynamic Programming (Athena Scientific, 1996).

  17. Hart, P. E., Nilsson, N. J. & Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 4, 100–107 (1968).

  18. Pohl, I. Heuristic search viewed as path finding in a graph. Artif. Intell. 1, 193–204 (1970).

  19. Ebendt, R. & Drechsler, R. Weighted A* search—unifying view and application. Artif. Intell. 173, 1310–1342 (2009).

  20. McAleer, S., Agostinelli, F., Shmakov, A. & Baldi, P. Solving the Rubik’s cube with approximate policy iteration. In Proceedings of the International Conference on Learning Representations (ICLR) (2019).

  21. Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi and Go through self-play. Science 362, 1140–1144 (2018).

  22. Rokicki, T. God’s Number is 26 in the Quarter-turn Metric http://www.cube20.org/qtm/ (2014).

  23. Korf, R. E. Depth-first iterative-deepening: an optimal admissible tree search. Artif. Intell. 27, 97–109 (1985).

  24. Rokicki, T. cube20 https://github.com/rokicki/cube20src (2016).

  25. Rokicki, T., Kociemba, H., Davidson, M. & Dethridge, J. The diameter of the Rubik’s cube group is twenty. SIAM Rev. 56, 645–670 (2014).

  26. Culberson, J. C. & Schaeffer, J. Pattern databases. Comput. Intell. 14, 318–334 (1998).

  27. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

  28. Kociemba, H. 15-Puzzle Optimal Solver http://kociemba.org/themen/fifteen/fifteensolver.html (2018).

  29. Scherphuis, J. The Mathematics of Lights Out https://www.jaapsch.net/puzzles/lomath.htm (2015).

  30. Dor, D. & Zwick, U. Sokoban and other motion planning problems. Comput. Geom. 13, 215–228 (1999).

  31. Guez, A. et al. An Investigation of Model-free Planning: Boxoban Levels https://github.com/deepmind/boxoban-levels/ (2018).

  32. Orseau, L., Lelis, L., Lattimore, T. & Weber, T. Single-agent policy tree search with guarantees. In Advances in Neural Information Processing Systems (eds Bengio, S. et al.) 3201–3211 (Curran Associates, 2018).

  33. Brüngger, A., Marzetta, A., Fukuda, K. & Nievergelt, J. The parallel search bench ZRAM and its applications. Ann. Oper. Res. 90, 45–63 (1999).

  34. Korf, R. E. Linear-time disk-based implicit graph search. J. ACM 55, 26 (2008).

  35. Moore, A. W. & Atkeson, C. G. Prioritized sweeping: reinforcement learning with less data and less time. Mach. Learn. 13, 103–130 (1993).

  36. Newell, A. & Simon, H. A. GPS, a Program that Simulates Human Thought. Technical Report (Rand Corporation, 1961).

  37. Fikes, R. E. & Nilsson, N. J. STRIPS: a new approach to the application of theorem proving to problem solving. Artif. Intell. 2, 189–208 (1971).

  38. Anthony, T., Tian, Z. & Barber, D. Thinking fast and slow with deep learning and tree search. In Advances in Neural Information Processing Systems (eds Guyon, I. et al.) 5360–5370 (Curran Associates, 2017).

  39. Wilt, C. M. & Ruml, W. When does weighted A* fail? In Proceedings of the Symposium on Combinatorial Search (SOCS) (eds Borrajo, D. et al.) 137–144 (AAAI Press, 2012).

  40. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning (eds Bach, F. & Blei, D.) 448–456 (PMLR, 2015).

  41. Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (eds Gordon, G., Dunson, D. & Dudík, M.) 315–323 (PMLR, 2011).

  42. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR) (2015).

  43. Samadi, M., Felner, A. & Schaeffer, J. Learning from multiple heuristics. In Proceedings of the 23rd National Conference on Artificial Intelligence (ed. Cohn, A.) (AAAI Press, 2008).

  44. Agostinelli, F., McAleer, S., Shmakov, A. & Baldi, P. Learning to Solve the Rubik’s Cube (Code Ocean, 2019); https://doi.org/10.24433/CO.4958495.v1
