Show HN: YPerf – Monitor LLM Inference API Performance
yperf.com

Our team operates several real-time AI applications where both latency (TTFT) and throughput (TPS) are critical to most of our users. Unfortunately, nearly all of the major LLM APIs lack consistent stability.
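For anyone unfamiliar with the two metrics: TTFT is the delay before the first token of a streamed response arrives, and TPS is how fast tokens arrive after that. A minimal sketch of how you might measure both from any streaming response (this is my own illustration, not YPerf's actual collection code; it assumes one token per streamed chunk, which is only an approximation):

```python
import time

def measure_stream(chunks):
    """Compute (TTFT seconds, tokens/sec) from an iterable of streamed text chunks.

    `chunks` can be any iterator, e.g. the SSE deltas of a chat-completion
    stream. Timing starts when iteration begins. Assumes one token per
    chunk, which roughly holds for most streaming LLM APIs.
    """
    start = time.monotonic()
    ttft = None
    tokens = 0
    for _chunk in chunks:
        now = time.monotonic()
        if ttft is None:
            ttft = now - start  # time to first token
        tokens += 1
    elapsed = time.monotonic() - start
    tps = tokens / elapsed if elapsed > 0 else 0.0
    return ttft, tps
```

In practice you would wrap the provider's streaming iterator with this and aggregate over many requests, since single-request numbers are noisy.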
To address this, I built YPerf, a simple webpage that monitors the performance of inference APIs. I hope it helps you select better models and discover new trending ones as well.
The data is sourced from OpenRouter, an excellent provider that aggregates LLM API services.

---

Nice one. It would be great to have filtering. For example, I want to check the TPS of Llama 3.3 across multiple providers.

---

[Updated] I've added filtering and multi-provider comparison!

Great suggestion! I currently pick the fastest TPS among providers, and you can see a detailed performance list by clicking the <Learn More> icon in the last column. For example, here's the detailed OpenRouter page for Llama 3.3: https://openrouter.ai/meta-llama/llama-3.3-70b-instruct I'll add filtering soon.