A Scalable Standard for Clean ECommerce Data in LLMs (Fork of Llms.txt)

2 points by nicola_alessi 10 months ago · 1 comment · 1 min read


The Problem: LLMs are terrible at understanding eCommerce sites. They:

- Hallucinate prices/specs from messy HTML
- Waste tokens on UI boilerplate (headers, popups, ads)
- Struggle with real-time inventory/pricing updates

Our solution: A fork of Answer.AI’s llms.txt that introduces site-llms.xml, an XML sitemap protocol for product data.

Stores expose:

- /site-llms.xml: Index of all product URLs
- /product/123/llms.txt: Clean Markdown with specs/pricing (example in repo)
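
To make the consumer side concrete, here is a rough sketch of how a crawler might read the index. It assumes site-llms.xml reuses the standard sitemap layout (a `<urlset>` of `<url><loc>` entries in the sitemaps.org namespace); the actual schema is defined in the repo and may differ. The function name and the shop.example URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: site-llms.xml uses the standard sitemap namespace and element names.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def product_llms_urls(xml_text: str) -> list[str]:
    """Return every <loc> value listed under <url> in a urlset document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical index content; a real client would fetch /site-llms.xml over HTTP.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example/product/123/llms.txt</loc></url>
  <url><loc>https://shop.example/product/456/llms.txt</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in product_llms_urls(SAMPLE):
        print(url)
```

Each returned URL would then be fetched as plain Markdown, so the model never sees the storefront HTML.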

Benefits:

- AI gets structured data instead of scraping
- Stores control what’s exposed (like robots.txt)
- Scales to millions of products (sitemap indexes supported)
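
On the scaling point, a store-side sketch of splitting a large catalog across several files and listing them in an index. It assumes the 50,000-URLs-per-file limit of the standard sitemap protocol carries over to site-llms.xml, and that the index uses the standard `<sitemapindex>` element. The child file names (site-llms-1.xml, ...) and helper names are hypothetical, not part of the spec.

```python
from xml.sax.saxutils import escape

# Limit taken from the standard sitemap protocol; assumed (not confirmed) to apply here.
MAX_URLS_PER_FILE = 50_000
SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_FILE) -> list[list[str]]:
    """Split the full URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def render_index(base_url: str, n_chunks: int) -> str:
    """Render a sitemap index pointing at one child file per chunk."""
    lines = [f'<sitemapindex xmlns="{SITEMAP_XMLNS}">']
    for i in range(1, n_chunks + 1):
        # Hypothetical naming scheme for the child files.
        loc = escape(f"{base_url}/site-llms-{i}.xml")
        lines.append(f"  <sitemap><loc>{loc}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    urls = [f"https://shop.example/product/{n}/llms.txt" for n in range(120_001)]
    chunks = chunk_urls(urls)
    print(len(chunks), [len(c) for c in chunks])
    print(render_index("https://shop.example", len(chunks)))
```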

We’re open-sourcing this under CC BY-SA (the same license as the sitemap protocol). Would love HN’s thoughts:

Is this the right abstraction? Could it work for non-eCommerce sites?

Repo: github.com/Lumigo-AI/site-llms (stars welcome!)

