WizardCoder-34B-Python surpasses GPT-4 on HumanEval
twitter.com
Except it doesn't. They even mention in the same Tweet that their own test showed 82 percent for GPT-4.
But if CodeLlama can make that claim, then I guess it's fair for WizardCoder to make it too.
Wherever the old number is coming from, it should be updated.