Settings

Theme

Show HN: Chat with Orion – a visual agent that sees, reasons and acts

chat.vlm.run

22 points by fzysingularity a month ago · 11 comments · 2 min read

Reader

Hey HN! We’re excited to share Orion [1] — our new visual agent that sees, reasons, and acts across images, videos, and documents.

Frontier VLMs (GPT, Claude, Gemini) can describe what they see, but they can’t reliably act on visual inputs. Ask them to detect objects, segment images, or chain visual steps — they’ll fail in surprisingly inconsistent ways. High-res images collapse to ~1024px. And the visual AI ecosystem is fragmented across separate APIs for image understanding, OCR, image-gen, video-gen, etc.

We built Orion to fix this.

Orion combines VLM reasoning with reliable computer-vision tools inside a unified chat-completions interface. You can chain visual steps, inspect results, and treat visual tasks the same way you treat text workflows. Here’s a quick demo [2].

What Orion can do today: - Detect objects, faces, people (with precise, visualized boxes) - Segment objects or salient regions interactively - Edit, remix, and re-imagine images/videos from prompts - Summarize visual content (images or videos) - Transform images: crop, rotate, upscale - Transform videos: trim, sample, highlight scenes - Parse and structure documents: pagination, layout, OCR, extraction

One unified “chat-completions”-like interface — no juggling multiple vision APIs. Check out the tours in the chat [3] or read the announcement [4].

API access opens next week. Happy to answer any questions — otherwise, feel free to try the tours and break things!

[1] Learn more about Orion: https://vlm.run/orion

[2] Promo video: https://youtu.be/cPJN4iZz6QQ

[3] Chat: https://chat.vlm.run

[4] LinkedIn announcement: https://www.linkedin.com/posts/sudeeppillai_ai-computervisio...

hackintothings a month ago

I just tried out generating and editing this video it performed a pretty good results which is not possible with other chat interfaces. can you tell what is the bottleneck of this agents?

  • fzysingularityOP a month ago

    It's still early days, but we'll expand to more capabilities very quickly given that we're not bottlenecked by training a single large VLM to do these tasks - think video tracking, in-image editing, and 3D.

orm a month ago

The video was interesting. Seems like a nice way to start a shopping search if you have a picture with something you want where the look matters. Eg, cars, furniture. etc.

kernel33 a month ago

I tried object segmentation and it’s really good

  • fzysingularityOP a month ago

    Hey, thanks! Curious what you tried to test it. Segmentation models like SAM2 only gets you so far, but by make this instruction-driven with reasoning in the loop, it's remarkable what you can do these days.

    Stay tuned for more updates here, tracking segments is coming soon!

aivisionperson a month ago

Really crazy results there. would love to test more

SoftwareManHere a month ago

It's really cool how good of a job it did!

Lona_Kiragu a month ago

The AI world just got better with Orion!

slater a month ago

wow, so many astro-turfed responses in this post. it must be a really good app!!

....

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection