Unified Vision-Language Agents – Detect, Segment, OCR, Generate and More

github.com

5 points by fzysingularity 13 days ago · 1 comment

fzysingularity (OP) 13 days ago

Here's a short cookbook exploring an agentic approach to vision-language tasks: detection, segmentation, OCR, and generation, as well as combining classical CV tools with VLM reasoning.

Happy to run examples if you leave a comment.

[1] IPython notebook: https://github.com/vlm-run/vlmrun-cookbook/blob/main/noteboo...

[2] Colab: https://colab.research.google.com/github/vlm-run/vlmrun-cook...
