Larger and More Instructable Language Models Become Less Reliable
The authors claim to have found that as LLMs become larger and more instructable, they:
* become less reliable in tasks of low difficulty,
* become more prone to giving sensible-sounding but wrong answers than smaller, less instructable models, and
* become more stable across different phrasings of the same question, although pockets of variability persist across task difficulty levels.
All these claims ring true... but it remains to be seen whether LLMs will continue to suffer from the same issues as they grow orders of magnitude larger. No one knows yet.
Nonetheless, the authors take a big leap: They somehow conclude that we need (quoting) "a fundamental shift in the design and development of general-purpose AI models, particularly in high-stakes areas for which a predictable distribution of errors is paramount."
I'm not sure there's enough evidence to reach that conclusion today.