Major upgrades to Ray Serve: 88% lower latency and 11.1x higher throughput

2 points by robertnishihara 3 months ago · 1 comment

Reader

The real story here is replacing Ray Core's actor task dispatch with direct gRPC for the hot path. Essentially admitting that the general-purpose actor model was the bottleneck for inference workloads. Smart move — let Ray handle orchestration and get out of the way for actual data flow.

Settings

Major upgrades to Ray Serve: 88% lower latency and 11.1x higher throughput

Keyboard Shortcuts