Show HN: Chat with Orion – a visual agent that sees, reasons and acts
chat.vlm.runHey HN! We’re excited to share Orion [1] — our new visual agent that sees, reasons, and acts across images, videos, and documents.
Frontier VLMs (GPT, Claude, Gemini) can describe what they see, but they can’t reliably act on visual inputs. Ask them to detect objects, segment images, or chain visual steps — they’ll fail in surprisingly inconsistent ways. High-res images collapse to ~1024px. And the visual AI ecosystem is fragmented across separate APIs for image understanding, OCR, image-gen, video-gen, etc.
We built Orion to fix this.
Orion combines VLM reasoning with reliable computer-vision tools inside a unified chat-completions interface. You can chain visual steps, inspect results, and treat visual tasks the same way you treat text workflows. Here’s a quick demo [2].
What Orion can do today: - Detect objects, faces, people (with precise, visualized boxes) - Segment objects or salient regions interactively - Edit, remix, and re-imagine images/videos from prompts - Summarize visual content (images or videos) - Transform images: crop, rotate, upscale - Transform videos: trim, sample, highlight scenes - Parse and structure documents: pagination, layout, OCR, extraction
One unified “chat-completions”-like interface — no juggling multiple vision APIs. Check out the tours in the chat [3] or read the announcement [4].
API access opens next week. Happy to answer any questions — otherwise, feel free to try the tours and break things!
[1] Learn more about Orion: https://vlm.run/orion
[2] Promo video: https://youtu.be/cPJN4iZz6QQ
[3] Chat: https://chat.vlm.run
[4] LinkedIn announcement: https://www.linkedin.com/posts/sudeeppillai_ai-computervisio... I just tried out generating and editing this video it performed a pretty good results which is not possible with other chat interfaces. can you tell what is the bottleneck of this agents? It's still early days, but we'll expand to more capabilities very quickly given that we're not bottlenecked by training a single large VLM to do these tasks - think video tracking, in-image editing, and 3D. The video was interesting. Seems like a nice way to start a shopping search if you have a picture with something you want where the look matters. Eg, cars, furniture. etc. Do you mean like creating a personalized item from another product image? I tried object segmentation and it’s really good Hey, thanks! Curious what you tried to test it. Segmentation models like SAM2 only gets you so far, but by make this instruction-driven with reasoning in the loop, it's remarkable what you can do these days. Stay tuned for more updates here, tracking segments is coming soon! Really crazy results there. would love to test more It's really cool how good of a job it did! The AI world just got better with Orion! wow, so many astro-turfed responses in this post. it must be a really good app!! ....