Pool spare GPU capacity to run LLMs at larger scale

github.com

11 points by i386 a month ago · 3 comments


lostmsu a month ago

> MoE models via expert sharding with zero cross-node inference traffic

This claim makes the whole project questionable.
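For context on the skepticism: in a standard top-k MoE layer the router selects experts per token, so if experts are sharded across nodes, some fraction of expert calls will land on a remote node. A minimal sketch of that fraction under uniform routing (all parameters hypothetical, not taken from the linked project):

```python
# Sketch: estimate how many expert calls cross node boundaries when
# experts are sharded evenly and routing is uniform. Not the project's
# code; just an illustration of why "zero cross-node traffic" is a
# strong claim without extra constraints on the router.
import random

NUM_EXPERTS = 8      # hypothetical model config
NUM_NODES = 4        # experts sharded evenly, 2 per node
TOP_K = 2            # experts activated per token
NUM_TOKENS = 10_000

def node_of(expert: int) -> int:
    """Which node hosts this expert under even sharding."""
    return expert // (NUM_EXPERTS // NUM_NODES)

remote = 0
for _ in range(NUM_TOKENS):
    home = random.randrange(NUM_NODES)                  # node holding the token's activations
    experts = random.sample(range(NUM_EXPERTS), TOP_K)  # stand-in for router output
    remote += sum(node_of(e) != home for e in experts)

print(f"{remote / (NUM_TOKENS * TOP_K):.0%} of expert calls are cross-node")
```

Under uniform routing this approaches (NUM_NODES - 1) / NUM_NODES, i.e. 75% here, so eliminating cross-node traffic entirely would presumably require constraining the router to local experts or replicating experts, each with its own quality or memory trade-off.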

vagrantJin a month ago

This is very promising, and it definitely looks more user-friendly than exo. Can't wait to try it out.

iwinux a month ago

You lost me on "spare GPU". I don't have any capable GPUs, let alone spare ones :)
