LIMA: Less Is More for Alignment
For me, the paper contained many gems. Besides the superficial alignment hypothesis and its consequences for the fine-tuning dataset, Figure 7 on instruction alignment vs conversation alignment and Figure 9 on the positive correlation between the perplexity number and the quality score (i.e. perplexity-based model quality is negatively correlated with response-based quality) were very insightful.
What I missed: How does the superficial alignment hypothesis relate to model size? (They only investigate disjoint aspects on 7B vs 65B LLaMA models.) Since the paper focuses on data quality, I would also have expected an annotation guideline.
Still, I think the paper is an excellent read.
I think they have a mistake in their analysis:
> "B Anticorrelation between Perplexity and Generation Quality"
> "When fine-tuning LIMA, we observe that perplexity on held-out Stack Exchange data (2,000 examples) negatively correlates with the model’s ability to produce quality responses. To quantify this manual observation, we evaluate model generations using ChatGPT, following the methodology described in Section 5. Figure 9 shows that as perplexity rises with more training steps – which is typically a negative sign that the model is overfitting – so does the quality of generations increase"
I think where they say "anticorrelation" it should say "correlation" and where they say "negatively correlates" it should say "positively correlates" if they are basing their statement on what they observed in their experiments.
EDIT: I see they say "Preprint. Under review" so maybe they will fix it if it's wrong. This is the kind of thing that peer review is really good at fixing. Also not every submission on arxiv is a preprint or under review but I guess this one is.
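Just to make the sign issue concrete, here's a throwaway sketch with made-up numbers (nothing from the paper): if perplexity and quality both rise over training steps, their Pearson correlation comes out positive, which is why "anticorrelation" reads wrong to me.

```python
# Toy illustration with invented values, not data from the paper:
# two series that rise together have a positive Pearson correlation.
import numpy as np

perplexity = np.array([2.1, 2.3, 2.6, 2.8, 3.0, 3.3, 3.5])  # rising over training steps
quality    = np.array([4.2, 4.5, 4.8, 5.0, 5.3, 5.5, 5.8])  # ChatGPT-style quality score, also rising

r = np.corrcoef(perplexity, quality)[0, 1]
print(f"Pearson r = {r:.2f}")  # close to +1, i.e. a positive correlation
```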
This is such an interesting direction for LLM research (especially because it's easy to imagine applicability in industry as well).
If all it takes is ~1k high-quality examples (of course, quality can be tricky to define) to tune an LLM successfully, then we should expect to see these tuned LLMs for many different narrow use cases.
Of course, the devil is likely in the details. Even in this paper, the prompts on which the model is evaluated were written by the authors and "inspired by their own interests or those of their friends." It can be tricky to make the jump from these prompts and answers to real-world LLM use cases, but it looks super promising.
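For what it's worth, a ~1k-example supervised fine-tune is cheap enough to try yourself. Here's a rough sketch with Hugging Face Transformers; the model checkpoint, data file, and hyperparameters are all placeholders of my choosing, not the LIMA setup:

```python
# Rough sketch of fine-tuning a causal LM on ~1k prompt/response pairs.
# Model name, file path, and hyperparameters are placeholders, not what LIMA used.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "huggyllama/llama-7b"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Expects a JSONL file with "prompt" and "response" fields (~1k rows).
dataset = load_dataset("json", data_files="my_1k_examples.jsonl")["train"]

def tokenize(example):
    # Concatenate prompt and response into one training sequence.
    text = example["prompt"] + "\n" + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="small-sft",
        num_train_epochs=3,              # arbitrary; pick for your budget
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=1e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The hard part, as the paper argues, isn't the training loop at all; it's curating those ~1k examples to be consistently high quality.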
I went in expecting progress on "alignment" as in "how to make sure AI doesn't kill us all" and I saw nothing at all about that in the paper. Disappointing.
Using the term "alignment" for what they're trying to do is misleading.
Your understanding of alignment is somewhat out of date. Training a model to produce human-valued responses and training a model not to decide to destroy all the humans are not separate problems. RLHF may actually be an excellent solution to many of the problems you care about for today's LLMs, even though it is done for a practical reason (we want LLMs that will answer our questions with useful answers) instead of an existential risk reason.
It's not misleading. It's the way the term is used in the field. The usage of the term as you are thinking of it is just another usage.
Seems interesting as it runs counter to the "common knowledge" that fine-tuning large LMs needs a lot of data and RLHF for good results.
Not that the absolute results are extremely strong (most likely, I'd suspect, because the base model is just not competitive with GPT-4 at the moment), but the relative results seem very impactful. Maybe fine-tuning a large LM for specific tasks is more practical than previously thought?
In human learning at least, you need a good teacher who can give you a self-consistent and correct basis, and then you build on that. If you learn randomly, your understanding will be "blurry" and you have to spend time unlearning the bad lessons. I have personal experience with this. I definitely like the message of this result, if it holds up.
I wish they would say more about their training setup, like how many GPUs of which kind, and for how long.