Show HN: VLM Inference Engine in Rust

mixpeek.com

1 point by Beefin 8 days ago · 1 comment

storystarling 8 days ago

What hardware are you running this on to get 2-3s latency? A 14GB model plus KV cache seems like it would require a 24GB card (3090/4090) to avoid swapping. I've found that once you spill over to system RAM on consumer gear the performance usually falls off a cliff.
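A rough back-of-the-envelope check of the memory budget in the comment above, as a sketch only. The thread does not give the model's architecture, so the layer count, KV head count, head dimension, context length, and the 1.5 GB runtime overhead below are all assumptions (roughly a 7B-class fp16 decoder); only the 14 GB weight figure comes from the comment.

```python
GB = 10**9   # decimal gigabyte, as in "14GB model"
GIB = 2**30  # binary gibibyte, how card capacities are usually counted


def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2, batch=1):
    """Size of the KV cache: one K and one V tensor per layer, per token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem * batch


def fits(weights_bytes, kv_bytes, vram_bytes, overhead_bytes=0):
    """True if weights + KV cache + runtime overhead fit in VRAM without spilling."""
    return weights_bytes + kv_bytes + overhead_bytes <= vram_bytes


weights = 14 * GB  # from the comment

# Assumed architecture (hypothetical, not stated in the thread):
# 32 layers, 32 KV heads, head_dim 128, fp16 cache, 4096-token context.
kv = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096)

# Assumed allowance for activations, vision encoder buffers, and CUDA context.
overhead = int(1.5 * GB)

print(f"KV cache: {kv / GIB:.1f} GiB")
for card_gib in (16, 24):
    ok = fits(weights, kv, card_gib * GIB, overhead)
    print(f"{card_gib} GiB card: {'fits' if ok else 'spills to system RAM'}")
```

Under these assumptions the cache costs 0.5 MiB per token, so a 4096-token context adds 2 GiB on top of the weights. That lands just over a 16 GiB card and comfortably inside a 24 GiB one. Grouped-query attention or a quantized cache would shrink the KV term considerably.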
