Show HN: Vlm Run, Extract JSON from images, videos and documents in a simple API

2 points by EarlyOom a year ago · 0 comments


Hey HN,

We’ve been building an API for ‘Visual ETL’ that we call vlm.run. We’ve been working with foundation models (GPT-4o, Gemini) for a few months and kept running into problems like:

- Hallucinations: even the best foundation models continue to hallucinate outputs for complex visual inputs, even when adhering to a schema.

- Rate limits: frontier models like GPT-4o are still too expensive or too rate-limited for high-volume visual data. Our API is designed for production workloads, which means speed, stability, monitoring and, if needed, private deployments.

- Schema design: defining a schema takes trial and error to get right. We’ve put together a taxonomy of off-the-shelf schemas for common visual tasks that is ready to go from day 1.
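
To make the schema point concrete, here is a minimal sketch of checking a model's JSON reply against a schema for a presentation slide. The `Slide` fields and the `parse_slide` helper are hypothetical, made up for illustration; they are not vlm.run's actual schemas or API. A structural check like this only catches malformed output. It cannot tell whether a well-formed value was hallucinated.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Slide:
    """Hypothetical schema for one slide (an assumption, not vlm.run's schema)."""
    page: int
    title: str
    bullet_points: List[str]


def parse_slide(raw: str) -> Slide:
    """Parse a JSON reply and raise ValueError if it does not match Slide."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    expected = {"page", "title", "bullet_points"}
    if set(data) != expected:
        raise ValueError(f"missing or unexpected keys: {sorted(set(data) ^ expected)}")

    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(data["page"], int) or isinstance(data["page"], bool):
        raise ValueError("page must be an integer")
    if not isinstance(data["title"], str):
        raise ValueError("title must be a string")

    bullets = data["bullet_points"]
    if not isinstance(bullets, list) or not all(isinstance(b, str) for b in bullets):
        raise ValueError("bullet_points must be a list of strings")

    return Slide(page=data["page"], title=data["title"], bullet_points=bullets)
```

Even for three fields the checks add up quickly, which is the trial and error that ready-made schemas are meant to save.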

Some examples we’ve put together:

- Presentations: https://docs.vlm.run/guides/guide-pdf-presentations

- TV News: https://docs.vlm.run/guides/guide-tv-news

Sign up for an API key and try us out with a 2-week free trial. Check out our docs at https://docs.vlm.run/what-is-vlm-1 and reach out if you have questions!

