Which Nested Data Format Do LLMs Understand Best? JSON vs. YAML vs. XML vs. MD

2 points by mattcollins 5 months ago · 1 comment

This is a follow-up to previous work looking at which format of TABULAR data LLMs understand best: https://www.improvingagents.com/blog/best-input-data-format-...

(There was some good discussion on Hacker News around that here: https://news.ycombinator.com/item?id=45458455)

We often want to feed NON-TABULAR data to LLMs, though, such as typical API responses or config files.

This new work looks at how the format of such nested / hierarchical data affects how well LLMs can answer questions about it: specifically, how several models get on with JSON, YAML, XML and Markdown.
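To make the comparison concrete, here is a minimal sketch of one nested record rendered in each of the four formats. The record, the helper names (`to_yaml`, `to_xml`, `to_markdown`) and the exact layout of each rendering are assumptions for illustration, not the serialisation used in the experiments. The YAML and Markdown writers are hand-rolled and only handle nested dicts of simple scalar values.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical nested record, loosely resembling an API response.
record = {"user": {"name": "Ada", "address": {"city": "London", "zip": "N1"}}}


def to_yaml(data, indent=0):
    """Render nested dicts as YAML-style block mappings (illustrative only)."""
    lines = []
    pad = "  " * indent
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


def to_xml(data, tag="root"):
    """Render nested dicts as XML, one element per key."""
    def build(parent, mapping):
        for key, value in mapping.items():
            child = ET.SubElement(parent, key)
            if isinstance(value, dict):
                build(child, value)
            else:
                child.text = str(value)

    root = ET.Element(tag)
    build(root, data)
    return ET.tostring(root, encoding="unicode")


def to_markdown(data, level=1):
    """Render nested dicts as Markdown: headings for nesting, bullets for leaves."""
    lines = []
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{'#' * level} {key}")
            lines.append(to_markdown(value, level + 1))
        else:
            lines.append(f"- {key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    renderings = {
        "JSON": json.dumps(record, indent=2),
        "YAML": to_yaml(record),
        "XML": to_xml(record),
        "Markdown": to_markdown(record),
    }
    for name, text in renderings.items():
        print(f"--- {name} ---\n{text}\n")
```

Each rendering carries the same information, so any difference in how well a model answers questions about it comes down to the format rather than the content.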
