Paper page - LLaMA Pro: Progressive LLaMA with Block Expansion


Abstract

A new post-pretraining method using expanded Transformer blocks for Large Language Models improves knowledge without catastrophic forgetting, yielding LLaMA Pro-8.3B that excels in general tasks, programming, and mathematics.

Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on corpora of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B that excels in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance across various benchmarks, demonstrating superiority over existing open models in the LLaMA family and immense potential for reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
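The abstract's core idea can be sketched in miniature. This is a hedged illustration, not the paper's implementation: it assumes block expansion means interleaving new blocks whose residual branch is zero-initialized (so the expanded model initially computes exactly what the original did), then marking only those new blocks as trainable while the original blocks stay frozen. The `Block` class, `expand` function, and `group_size` parameter are all hypothetical names for illustration.

```python
class Block:
    """Toy stand-in for a Transformer block: y = x + scale * x."""
    def __init__(self, scale, trainable):
        self.scale = scale          # weight of the residual branch
        self.trainable = trainable  # frozen original block vs. tunable new block

    def forward(self, x):
        return x + self.scale * x

def expand(blocks, group_size):
    """After every `group_size` original (frozen) blocks, insert one new block
    whose residual branch is zero-initialized, so at insertion time each new
    block is an identity map and the expanded model matches the original."""
    expanded = []
    for i, block in enumerate(blocks, start=1):
        expanded.append(block)
        if i % group_size == 0:
            # scale=0.0 -> identity at initialization; only these are tuned.
            expanded.append(Block(scale=0.0, trainable=True))
    return expanded

def run(blocks, x):
    for block in blocks:
        x = block.forward(x)
    return x

original = [Block(scale=0.1, trainable=False) for _ in range(4)]
expanded = expand(original, group_size=2)

# Identity initialization: outputs match before any post-pretraining.
assert run(original, 1.0) == run(expanded, 1.0)
# Only the inserted blocks would receive gradient updates on the new corpus.
assert sum(b.trainable for b in expanded) == 2
```

Because the frozen original blocks are never updated, the model's existing knowledge is preserved by construction, which is the mechanism the abstract credits for avoiding catastrophic forgetting.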


Models citing this paper 95


Datasets citing this paper 0

No datasets link to this paper.

Cite arxiv.org/abs/2401.02415 in a dataset README.md to link it from this page.

Spaces citing this paper 2

Collections including this paper 18