Machine learning systems can program too
Computer programming competitions are popular tests of skill among programmers. They demand critical thinking informed by experience and the creation of solutions to unforeseen problems, both key aspects of human intelligence that machine learning models find challenging to mimic. Using self-supervised learning and an encoder-decoder transformer architecture, Li et al. developed AlphaCode, a deep-learning model that achieves approximately human-level performance on the Codeforces platform, which regularly hosts these competitions and attracts numerous participants worldwide (see the Perspective by Kolter). The development of such coding systems could have a huge impact on programmers’ productivity. It may even change the culture of programming by shifting human work toward formulating problems, with machine learning responsible for generating and executing the code. —YS
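To make the encoder-decoder setup mentioned above concrete, here is a minimal sketch of a transformer that maps a problem description to program tokens, using PyTorch's built-in nn.Transformer. The class name, vocabulary size, and all hyperparameters are illustrative assumptions, orders of magnitude smaller than AlphaCode's actual networks.

```python
# Minimal sketch of an encoder-decoder transformer for description-to-code
# generation. All names and sizes here are illustrative, not AlphaCode's.
import torch
import torch.nn as nn

class Seq2SeqCodeModel(nn.Module):
    def __init__(self, vocab_size=8000, d_model=256, nhead=8, layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=layers, num_decoder_layers=layers,
            batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, problem_tokens, code_tokens):
        # The encoder reads the natural-language problem statement; the
        # decoder predicts program tokens under a causal (left-to-right) mask.
        causal = self.transformer.generate_square_subsequent_mask(
            code_tokens.size(1))
        h = self.transformer(self.embed(problem_tokens),
                             self.embed(code_tokens),
                             tgt_mask=causal)
        return self.lm_head(h)

# Toy usage: a batch of 2 problem statements (16 tokens) and code
# prefixes (32 tokens), giving per-position next-token logits.
model = Seq2SeqCodeModel()
logits = model(torch.randint(0, 8000, (2, 16)),
               torch.randint(0, 8000, (2, 32)))
print(logits.shape)  # torch.Size([2, 32, 8000])
```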
Abstract
Programming is a powerful and ubiquitous problem-solving tool. Systems that can assist programmers or even generate programs themselves could make programming more productive and accessible. Recent transformer-based neural network models show impressive code generation abilities yet still perform poorly on more complex tasks requiring problem-solving skills, such as competitive programming problems. Here, we introduce AlphaCode, a system for code generation that achieved an average ranking in the top 54.3% in simulated evaluations on recent programming competitions on the Codeforces platform. AlphaCode solves problems by generating millions of diverse programs using specially trained transformer-based networks and then filtering and clustering those programs to a maximum of just 10 submissions. This result marks the first time an artificial intelligence system has performed competitively in programming competitions.
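The selection procedure described in the abstract (sample many candidate programs, filter them on the example tests from the problem statement, cluster the survivors by behavior, and submit at most 10) can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not AlphaCode's implementation: run_program is a hypothetical sandboxed executor passed in by the caller, and the extra clustering inputs are assumed to be supplied (AlphaCode generates them with a separate model).

```python
# Hedged sketch of the filter-then-cluster submission selection described
# in the abstract. `run_program(program, test_input)` is a hypothetical
# helper that executes a candidate in a sandbox and returns its output.
from collections import defaultdict

def select_submissions(candidates, example_tests, clustering_inputs,
                       run_program, max_submissions=10):
    # Filter: keep only candidates that pass the example tests given
    # in the problem statement.
    passing = [p for p in candidates
               if all(run_program(p, inp) == out
                      for inp, out in example_tests)]

    # Cluster: group survivors by their outputs on extra inputs, so
    # semantically equivalent programs fall into the same cluster.
    clusters = defaultdict(list)
    for p in passing:
        key = tuple(run_program(p, inp) for inp in clustering_inputs)
        clusters[key].append(p)

    # Submit one representative from each of the largest clusters, up
    # to the contest's 10-submission budget.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:max_submissions]]

# Toy usage with trivial "programs" (plain Python callables): two
# equivalent doubling programs pass and share one cluster; the tripling
# program fails the example test and is filtered out.
progs = [lambda x: x * 2, lambda x: x + x, lambda x: x * 3]
chosen = select_submissions(progs, example_tests=[(2, 4)],
                            clustering_inputs=[1, 5],
                            run_program=lambda p, x: p(x))
print(len(chosen))  # 1
```

Picking representatives from the largest clusters is a form of self-consistency: many independently sampled programs that agree on held-out inputs are more likely to be correct than any single sample.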