Show HN: AnyModal – Train Your Own Multimodal LLMs (github.com)

8 points by ritabratamaiti a year ago · 0 comments

I’ve been working on AnyModal, a framework for integrating different data types (like images and audio) with LLMs. Existing tools felt too limited or task-specific, so I wanted something more flexible. AnyModal makes it easy to combine modalities with minimal setup—whether it’s LaTeX OCR, image captioning, or chest X-ray interpretation.

You can plug in an encoder like ViT for image inputs, project its features into the LLM's token-embedding space, and handle tasks like visual question answering or audio captioning. It's still a work in progress, so feedback or contributions would be great.
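For concreteness, here is a minimal sketch of that idea: a vision encoder's patch features are run through a learned projection into the LLM's embedding space and prepended to the text prompt. This is not AnyModal's actual API; the model names and the projector are illustrative, assuming PyTorch and Hugging Face transformers.

    # Illustrative only, not AnyModal's API: project ViT patch features
    # into an LLM's token-embedding space and prepend them to the prompt.
    import torch
    import torch.nn as nn
    from transformers import ViTModel, AutoModelForCausalLM, AutoTokenizer

    vit = ViTModel.from_pretrained("google/vit-base-patch16-224")
    llm = AutoModelForCausalLM.from_pretrained("gpt2")
    tok = AutoTokenizer.from_pretrained("gpt2")

    # Learned projection from the ViT hidden size to the LLM embedding size.
    projector = nn.Linear(vit.config.hidden_size, llm.config.hidden_size)

    def forward(pixel_values, prompt):
        # Patch features from the vision encoder: (batch, num_patches, vit_dim).
        patch_feats = vit(pixel_values=pixel_values).last_hidden_state
        # Project patches into "visual tokens" in the LLM's embedding space.
        visual_tokens = projector(patch_feats)
        # Embed the text prompt and prepend the visual tokens.
        ids = tok(prompt, return_tensors="pt").input_ids
        text_embeds = llm.get_input_embeddings()(ids)
        inputs_embeds = torch.cat([visual_tokens, text_embeds], dim=1)
        # The LLM attends over image and text tokens jointly.
        return llm(inputs_embeds=inputs_embeds)

In setups like this, training typically means freezing the encoder and learning only the projection, optionally fine-tuning the LLM as well.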

GitHub: https://github.com/ritabratamaiti/AnyModal
