OpenAI Codex can solve unseen Codeforces problems - Codeforces

5 min read Original article ↗

In this post, we will use the recent OpenAI Codex product based on GPT-3, first introduced in Evaluating Large Language Models Trained on Code (Chen et al., 2021) to solve some Codeforces problems.

We use the round Codeforces Round 739 (Div. 3) because the current version of the Codex model was released before this round. As the model has been trained on the Github data, it is possible that it memorized some solutions for older problems.

Solving 1560A - Dislike of Threes from the statement only

This is a simple problem, and the statement kind of explains what we should implement. Let's just give the plaintext statement to Codex:

Prompt

Problem: Polycarp doesn't like integers that are divisible by 3 or have the last digit 3 in their decimal representation.

Polycarp starts to write out the positive (greater than 0) integers which he likes: 1,2,4,5,7,8,10,11,14,16,…. Output the k-th element of this sequence (the elements are numbered from 1).

Input The first line contains one integer t (1≤t≤100) — the number of test cases. Then t test cases follow. Each test case consists of one line containing one integer k (1≤k≤1000).

Output For each test case, output in a separate line one integer x — the k -th element of the sequence that was written out by Polycarp.

#include <bits/stdc++.h>
using namespace std;

Code

/*
Problem:
Polycarp doesn't like integers that are divisible by 3 or have the last digit 3
in their decimal representation. 

Polycarp starts to write out the positive (greater than 0) integers which he likes: 
1,2,4,5,7,8,10,11,14,16,…. 
Output the k-th element of this sequence (the elements are numbered from 1). 

Input
The first line contains one integer t (1≤t≤100) — the number of test cases. Then t test cases follow.
Each test case consists of one line containing one integer k (1≤k≤1000).

Output
For each test case, output in a separate line one integer x — 
the k -th element of the sequence that was written out by Polycarp.

*/

#include <bits/stdc++.h>
using namespace std;  // <-- from now on it's Codex

int main() {
	int t;
	cin >> t;
	for (int i = 0; i < t; i++) {
		int k;
		cin >> k;
		int x = 1;
		while (k > 0) {
			if (x % 3 != 0 && x % 10 != 3) {
				k--;
			}
			x++;
		}
		cout << x - 1 << endl;
	}
	return 0;
};

Demo of Codex in action

127276629

Disclaimers

It produces correct code on the second try, the first C++ submission was WA 2. The problem text has been cleaned a bit, and "end in the digit 3" was changed to "have the last digit 3" -- because for some reason Codex thinks "end in the digit 3" means "contain the digit 3".

Solving 1560B - Who's Opposite? from the problem statement and the editorial

Here we have no hope for the solution straight from the problem statement, as it requires some thinking. However, if you give it the editorial, it can produce working code from the first try:

127226971

No formatting is required: just copy and paste the statement and the editorial from the Codeforces website. No inductive bias in user input was needed at all!

What you can do if this is cool or frightening to you

It is a net good to the world if everyone has access to models such as Codex or GPT-3.

If you are a machine learning researcher, or maybe just a competent person in general, try to get involved in EleutherAI, Hugging Face, or something similar. If you want to work on important things, in my opinion language models are very critical right now.

Bonus: What you can do if you own a competitive programming site

The advice below is my personal opinion.

  1. Try to get a tester with Codex access for future rounds. I predict it could become a part of the cheating issue as soon as 2022.
  2. Think of what the rules should say about using this kind of tools.
  3. Prepare for the effects of competitive programming becoming less relevant to job interviews. If your financing model relies on job interview preparation, you will need to change it in a few years. (Codeforces and Atcoder seem to be in the best shape here.)

FAQ

Can language models solve harder problems?

Language models are a new and poorly understood object. Here is a gross oversimplification of the current conventional wisdom:

GPT-3 is a language model. Its main purpose is translating between different representations of the same data. It can do English to German, it can be used to describe imagess, and with Codex it can translate textual commands to code. Codex can solve the Div3 A from scratch because the solution just implementation of the text, but it needs the editorial to solve Div3 B.

As of 2021, language models are not intended for solving algorithmic puzzles or producing mathematical proofs. It is likely we will have a much more powerful Codeforces solver by say 2026, when the research manages to combine proof search methods with the GPT-3 architecture. If I manage to read enough papers on this, I might write more on my academic blog.

I can't see the demo.

If you can't access imgur, go to this link.

How to get access to Codex?

Have some academic credentials, and fill in the waitlist form at OpenAI's website with an interesting research proposal. Please do not use Codex for bad purposes such as cheating or spam.

How to integrate Codex with vim? I got access but I can only use it in the Playground.

Use vim_codex with your OpenAI keys.

What are the Codex parameters in the demo?

I used the OpenAI API from Python as follows:

API call

openai.Completion.create(engine='davinci-codex', prompt=input_prompt, best_of=1, temperature=0.01, max_tokens=300)