RedPajama at 440B tokens: higher quality than Pythia and StableLM (together.xyz)
A week ago we announced RedPajama, a project to create leading open-source models. We released the first step in the project: a training dataset of over 1.2 trillion tokens following the LLaMA recipe.
Today we shared progress on training our first model on this dataset, a 7B parameter model using the Pythia architecture. So far we are a bit less than 50% of the way through training, at 440B tokens. We published HELM benchmark results on 16 different scenarios for this checkpoint, showing the model's accuracy to be quite high for this stage of training.
So excited about this! Are there going to be larger models as well?