MPT-30B: Raising the bar for open-source foundation models
mosaicml.com
It's interesting that they appear to have undertrained their 30B model, at least compared to LLaMA/Falcon.
Its coding ability is better, but it's still far behind WizardCoder, which is half the size - of course, WizardCoder hadn't been released when they started training MPT-30B.
The 8k context is an interesting addition. Are there any standard benchmarks to show how coherently models perform at different context lengths - 1k, 2k, 4k, 8k, etc.?
Correction: Foundation _Series_ models