Show HN: YPerf – Monitor LLM Inference API Performance

yperf.com

2 points by xjconlyme a year ago · 2 comments · 1 min read

Our team operates several real-time AI applications, where both latency (TTFT, time to first token) and throughput (TPS, tokens per second) are critical for most of our users. Unfortunately, nearly all of the major LLM APIs lack consistent stability.
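For readers unfamiliar with the two metrics: TTFT is the delay from sending a request until the first streamed token arrives, and TPS is how quickly tokens arrive after that. A minimal sketch of how one might measure both from a streaming response (the `token_stream` iterable and `fake_stream` helper are hypothetical stand-ins for a real streaming API client):

```python
import time


def measure_stream_metrics(token_stream):
    """Compute (TTFT, TPS) from an iterable that yields tokens as they arrive.

    TTFT: seconds from iteration start until the first token.
    TPS:  tokens per second over the span from the first to the last token.
    """
    start = time.monotonic()
    first_token_at = None
    count = 0
    for _ in token_stream:
        now = time.monotonic()
        if first_token_at is None:
            first_token_at = now
        count += 1
    end = time.monotonic()

    if first_token_at is None:  # stream produced nothing
        return None, None
    ttft = first_token_at - start
    gen_time = end - first_token_at
    # Avoid division by zero when only one token arrived.
    tps = count / gen_time if gen_time > 0 else float(count)
    return ttft, tps


def fake_stream(n=20, delay=0.01):
    """Simulated token stream standing in for a real streaming API response."""
    for _ in range(n):
        time.sleep(delay)
        yield "tok"


ttft, tps = measure_stream_metrics(fake_stream())
```

A real monitor would wrap an actual streaming client call the same way and record the pair per provider over time.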

To address this, I developed YPerf—a simple webpage designed to monitor the performance of inference APIs. I hope it helps you select better models and discover new trending ones as well.

The data is sourced from OpenRouter, an excellent provider that aggregates LLM API services.

Oras a year ago

Nice one. It would be great to have filtering. For example, I want to check the TPS of Llama 3.3 across multiple providers.

  • xjconlymeOP a year ago

    [Updated] I've added the filtering and the multiple provider comparison!

    ---

    Great suggestion!

    I currently pick the fastest TPS among providers, and you can see a detailed performance list by clicking the <Learn More> icon in the last column. For example, here's the detailed OR page for Llama 3.3: https://openrouter.ai/meta-llama/llama-3.3-70b-instruct

    I'll add the filtering soon.
