Major upgrades to Ray Serve: 88% lower latency and 11.1x higher throughput
anyscale.comThe real story here is replacing Ray Core's actor task dispatch with direct gRPC for the hot path. Essentially admitting that the general-purpose actor model was the bottleneck for inference workloads. Smart move — let Ray handle orchestration and get out of the way for actual data flow.