grpyc — Up to 8x Faster gRPC for Python | Rust Safety, Drop-in Compatible


Drop-in API-compatible gRPC for Python

grpyc is a drop-in replacement for grpcio 1.80, built in Rust. In GKE benchmarks against grpcio 1.80.0 it reaches up to 8x the throughput and 2.2x lower P50 latency, with no memory leaks under load. It runs everywhere from Raspberry Pi to IBM Z mainframes. Change one import.

Up to 8x

Faster than grpcio

814 tok/s

vLLM streaming throughput

grpyc delivers tokens faster than REST or grpcio in simulated vLLM inference with 50-token responses.

2.2x

Lower P50 latency

Lower transport latency means requests reach the GPU faster, improving batch fill rates and overall GPU utilization.

Better batching

Higher GPU efficiency

Faster request delivery fills GPU batches more efficiently. Less time waiting on Python means more tokens per second per dollar.

grpcio stack (top to bottom)

Python API → Cython Wrappers → C Core Shim → gRPC C Core → OS / Network

grpyc stack (top to bottom)

Python API (compatible surface) → PyO3 (Rust ↔ Python) → h2 + rustls (HTTP/2 + TLS) → Tokio (async I/O) → OS / Network

  • Memory safe by default — Rust ownership eliminates entire vulnerability classes
  • Minimal GIL contention — I/O and serialization in Rust, GIL released during network ops
  • No C toolchain — prebuilt native wheels, no OpenSSL, installs in seconds
  • No memory leaks under load: Rust ownership frees memory as soon as its owner goes out of scope, by design rather than by testing alone

Edge & IoT

Raspberry Pi · ARM64 · ARMv7

Cloud & servers

x86_64 · ARM64 · Alpine/musl

IBM Z mainframe

s390x · no prebuilt grpcio wheel

  • Linux x86_64
  • ARM64 / aarch64
  • ARMv7 · Raspberry Pi
  • 32-bit x86
  • IBM Z · s390x
  • Alpine / musl
  • macOS · Apple Silicon
  • Windows 64 / 32-bit

Tokio Async Runtime

All I/O runs on the Tokio runtime, outside Python's GIL, so network operations do not hold the interpreter lock. That takes GIL contention, the #1 source of tail latency in grpcio, out of the I/O path.

Memory Safe by Design

Rust's ownership model eliminates use-after-free, buffer overflows, and data races at compile time. No more memory leaks under load. Security audited.

xDS Service Mesh

Full xDS support — LDS, RDS, CDS, EDS — for proxyless gRPC. Connect directly to your control plane. No sidecar proxy overhead.

TLS / mTLS via rustls

Modern TLS via rustls — no OpenSSL, no dependency conflicts. Mutual TLS for zero-trust architectures.

Intelligent Load Balancing

Round-robin, ring-hash, weighted round-robin, outlier detection — all built in. ORCA load reporting for advanced traffic management.
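To make the ring-hash policy concrete, here is a minimal, generic sketch of consistent hashing in plain Python. It is not grpyc's implementation; the `HashRing` class, the `replicas` parameter, and the choice of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical ring-hash picker: each backend owns many points on a ring."""

    def __init__(self, backends, replicas=100):
        if not backends:
            raise ValueError("at least one backend is required")
        # Place every backend at `replicas` pseudo-random points on the ring.
        self._points = sorted(
            (_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(replicas)
        )
        self._keys = [point_hash for point_hash, _ in self._points]

    def pick(self, request_key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._keys, _hash(request_key)) % len(self._points)
        return self._points[idx][1]
```

The property that matters for caches and sticky sessions is that removing one backend only remaps the keys that backend owned; every other key keeps its target.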

All 4 Streaming Modes

Unary, server streaming, client streaming, bidirectional — all fully async through Tokio. Flow control and backpressure handled in Rust.
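The four modes differ only in whether each side sends one message or a stream. The sketch below shows those four shapes with plain Python functions and generators. It imports no gRPC library, the function names are hypothetical, and real gRPC handlers also receive a context argument that is left out here.

```python
from typing import Iterable, Iterator


def unary_unary(request: str) -> str:
    """One request in, one response out."""
    return request.upper()


def unary_stream(request: str) -> Iterator[str]:
    """One request in, a stream of responses out (e.g. token-by-token output)."""
    for word in request.split():
        yield word


def stream_unary(requests: Iterable[str]) -> str:
    """A stream of requests in, one response out."""
    return " ".join(requests)


def stream_stream(requests: Iterable[str]) -> Iterator[str]:
    """Both sides stream; here each request is answered with its reverse."""
    for item in requests:
        yield item[::-1]
```

Because the streaming variants are generators, a consumer that stops iterating also stops the producer, which is the same pull-based idea that flow control applies on the wire.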

Drop-in Compatible

Same grpc Python API. Change one import line. Your existing protobuf definitions, handlers, and interceptors work unchanged.
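One way to try the one-import change without breaking environments that only have grpcio is a guarded import. This is a minimal sketch: the module name `grpyc` is an assumption (the page does not state the import name), `grpc` is the module that grpcio installs, and `pick_grpc_backend` is a hypothetical helper.

```python
import importlib
import importlib.util


def pick_grpc_backend(candidates=("grpyc", "grpc")):
    """Return the first importable module from `candidates`, or None."""
    for name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    return None


# "grpyc" as an import name is an assumption; adjust it to the documented name.
grpc = pick_grpc_backend()
backend_name = grpc.__name__ if grpc is not None else "none installed"
print(f"gRPC backend: {backend_name}")
```

The rest of the application keeps using the `grpc` name, so switching back for an A/B comparison means changing only the candidate order.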

Raspberry Pi to IBM Z

Prebuilt wheels for Linux x86_64, ARM64, ARMv7, and 32-bit x86, plus IBM Z (s390x), where standard grpcio ships no prebuilt wheel. macOS and Windows are covered too, so no platform needs a build step.

AI & ML Inference

Ship faster inference. Model serving frameworks (vLLM, Triton, TensorFlow Serving) rely on gRPC between clients and inference workers. grpyc's Tokio runtime cuts the gRPC transport overhead that wastes expensive GPU time.

  • Tighter latency → fewer retries → more effective GPU utilization
  • Lower memory per connection → more room for model weights
  • Streaming support for token-by-token generation

Google Cloud

Accelerate your GCP API calls. The Google Cloud Python client libraries for Pub/Sub, Spanner, Firestore, and the BigQuery Storage API use gRPC as their transport. grpyc replaces that transport layer with a Rust implementation, with zero code changes.

  • Higher throughput for BigQuery Storage API reads
  • Faster Pub/Sub publish and subscribe
  • Predictable Spanner latency for real-time apps

Service Mesh at Scale

Scale your service mesh without limits. In a mesh with hundreds of Python gRPC services, one slow service cascades into timeouts across the network. grpyc's predictable latency breaks the cascade chain.

  • Proxyless gRPC via built-in xDS — eliminate sidecar overhead
  • Tight P50–P99 spread keeps timeouts meaningful
  • Built-in load balancing, health checking, outlier detection
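The P50–P99 spread mentioned above is easy to check on your own latency samples. The sketch below uses the nearest-rank method; the sample values are made up for illustration and are not benchmark results.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up samples (milliseconds): nine fast calls and one slow outlier.
latencies_ms = [1.0, 1.1, 1.2, 1.1, 1.0, 1.3, 1.2, 1.1, 9.5, 1.0]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"P50={p50} ms, P99={p99} ms, spread={p99 / p50:.1f}x")
```

A timeout is usually set as a small multiple of typical latency. When P99 sits far above P50, that timeout either fires on healthy calls or has to be raised so high that it no longer detects a stuck service.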

Benchmark charts:

  • Async Unary QPS (Java server), GKE c2-standard-8: higher is better
  • Unary Latency (P50), GKE ping-pong: lower is better
  • Cross-node QPS (c=64), GKE e2-standard-4: higher is better
  • vLLM tokens/sec (c=8), simulated inference, 50 tokens

Test setup: GKE c2-standard-8 and e2-standard-4, Go client, Python 3.13 servers, cross-node anti-affinity. Baseline: grpcio 1.80.0.
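As a rough reading aid, the 814 tok/s figure can be converted into responses per second and time per response. This sketch assumes the figure is the aggregate across all 8 concurrent streams and that every stream is always busy; neither assumption is stated on the page, and `per_response_stats` is a hypothetical helper.

```python
def per_response_stats(tokens_per_sec, tokens_per_response, concurrency):
    """Convert aggregate token throughput into per-response figures."""
    responses_per_sec = tokens_per_sec / tokens_per_response
    # Little's law: average time in system = concurrency / completion rate.
    seconds_per_response = concurrency / responses_per_sec
    return responses_per_sec, seconds_per_response


# 814 tok/s, 50-token responses, c=8 (aggregate throughput is an assumption).
rps, latency = per_response_stats(814, 50, 8)
print(f"{rps:.2f} responses/s, {latency:.2f} s per response")
```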

Memory safe by design

Rust ownership rules out use-after-free, buffer overflows, and data races at compile time — the class of C/C++ vulnerabilities that haunts grpcio's C core is gone from grpyc's Rust gRPC stack.

Fuzzed continuously

libFuzzer runs around the clock across HTTP/2 framing, HPACK, the gRPC message and timeout codecs, xDS config parsing, and SPIFFE identity handling. Adversarial input is part of CI, not an afterthought.
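To illustrate what fuzzing a message codec checks, here is a small Python sketch of gRPC's length-prefixed message framing (a 1-byte compressed flag, a 4-byte big-endian length, then the payload) with a random-input loop. It is a toy stand-in for libFuzzer, not grpyc's Rust code; the function names and the size limit are assumptions chosen for this example.

```python
import random
import struct


def encode_message(payload: bytes, compressed: bool = False) -> bytes:
    """Frame a payload: compressed flag + big-endian length + payload."""
    return struct.pack(">BI", int(compressed), len(payload)) + payload


def decode_message(frame: bytes, max_len: int = 4 * 1024 * 1024):
    """Parse one frame; malformed input raises ValueError instead of crashing."""
    if len(frame) < 5:
        raise ValueError("frame shorter than 5-byte header")
    flag, length = struct.unpack(">BI", frame[:5])
    if flag not in (0, 1):
        raise ValueError("invalid compressed flag")
    if length > max_len:  # limit chosen for this sketch
        raise ValueError("declared length exceeds limit")
    if len(frame) - 5 != length:
        raise ValueError("declared length does not match payload size")
    return bool(flag), frame[5:]


def fuzz_once(rng: random.Random) -> None:
    """Feed random bytes to the decoder; only ValueError is an acceptable failure."""
    blob = rng.randbytes(rng.randint(0, 32))
    try:
        compressed, payload = decode_message(blob)
    except ValueError:
        return
    # Anything the decoder accepts must re-encode to exactly the same bytes.
    assert encode_message(payload, compressed) == blob
```

Running `fuzz_once` in a loop with a seeded `random.Random` gives reproducible failures, which is the same reason fuzzers save the crashing input.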

We harden the foundation

Our fuzzing uncovered a subtle crash in h2, the production-grade Rust HTTP/2 stack used across the ecosystem, that an attacker could trigger with a 59-byte frame sequence. We reported it and fixed it.

Reviewed & audited

Every change is peer-reviewed before merge. The codebase is security-audited with zero unsafe code in the hot path, and cargo-audit + cargo-deny scan the supply chain on every release.

Rust gRPC Stack

47 Rust modules implementing the complete gRPC protocol. h2 for HTTP/2 framing, rustls for TLS, prost for protobuf. Native Rust — not a Cython wrapper around a C core.

Tokio-Powered Async

All network I/O runs on the Tokio runtime outside Python's GIL. PyO3 bridges Rust futures to asyncio seamlessly. Batched event delivery minimizes GIL acquisition overhead.
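The general pattern of completing an asyncio future from work done off the event loop can be shown in pure Python. In this sketch a plain thread stands in for the Rust runtime; it is an analogy for the bridging idea, not PyO3 or grpyc code, and `start_background_io` and `call` are hypothetical names.

```python
import asyncio
import threading


def start_background_io(loop, future, payload: str) -> None:
    """Stand-in for a native runtime thread finishing a network operation."""

    def worker():
        result = payload.upper()  # pretend this is I/O done off the event loop
        # asyncio futures are not thread-safe, so hand the result back
        # on the event loop's own thread.
        loop.call_soon_threadsafe(future.set_result, result)

    threading.Thread(target=worker, daemon=True).start()


async def call(payload: str) -> str:
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    start_background_io(loop, future, payload)
    return await future  # the event loop stays free while the worker runs


print(asyncio.run(call("ping")))
```

Each `call_soon_threadsafe` wakes the event loop once, so delivering several completed events per wake-up is one plausible way to read the batching mentioned above.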

Memory Safe by Design

Rust's ownership model prevents use-after-free, buffer overflows, and data races at compile time. Zero unsafe code in the hot path. Security audited with all findings addressed.

Runs everywhere your fleet does

Prebuilt wheels for Linux x86_64, ARM64, ARMv7, 32-bit x86, and IBM Z (s390x), covering five glibc tiers and Alpine/musl, plus macOS and Windows. No build step, so installation takes seconds.

Pro

Annual License

For teams running gRPC in production

  • Full feature set
  • Commercial production use
  • Email support
  • Migration guidance

Talk to Sales

Enterprise (Most Popular)

Annual License

Production use with enterprise support

  • Commercial production use
  • Priority support with SLA
  • Architecture reviews & optimization
  • Dedicated Slack channel
  • Migration assistance

Talk to Sales

OEM / SDK Vendor

Custom

Embed grpyc in your product

  • Redistribution rights
  • Co-engineering support
  • Priority patches & builds
  • Custom build configurations

Contact for OEM

Your Python fleet is up to 8x faster with one change.

Free for evaluation. Prove it on your workload, then talk to us.