Tips for building performant LLM applications

moduloware.ai

4 points by zuzuen_1 5 months ago · 2 comments

zuzuen_1 (OP) 5 months ago

I've been building Modulo AI for the past year - an AI system that fixes GitHub issues.

Early versions took 5+ minutes to analyze a single issue.

After months of optimization, we're now under 60 seconds with better accuracy. This presentation covers what we learned about the performance characteristics of production LLM systems, lessons that rarely get discussed:

- Strategies for faster token throughput
- Strategies for quick time to first token
- Effective context window management
- Model routing strategies
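To give a flavor of the last point, here is a minimal sketch of model routing: cheap, fast models handle simple requests while a stronger model handles the hard ones. The model names and the complexity heuristic are illustrative assumptions, not Modulo's actual implementation.

```python
FAST_MODEL = "small-fast-model"      # hypothetical cheap, low-latency model
STRONG_MODEL = "large-strong-model"  # hypothetical slower, more capable model

def estimate_complexity(issue_text: str) -> float:
    """Crude heuristic: longer issues, or ones containing a stack
    trace, are treated as 'harder'. Real routers might use a small
    classifier model instead."""
    score = min(len(issue_text) / 4000, 1.0)
    if "Traceback" in issue_text or "stack trace" in issue_text.lower():
        score += 0.5
    return score

def route(issue_text: str) -> str:
    """Pick a model name based on estimated complexity."""
    return STRONG_MODEL if estimate_complexity(issue_text) > 0.5 else FAST_MODEL
```

The win is that most traffic (short, simple issues) never touches the expensive model, so both average latency and cost drop without hurting accuracy on the hard cases.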

If you're interested in building AI agents, I'm sure you'll find some interesting insights in it!

Install and try out our GitHub application: https://github.com/apps/solve-bug

Try Modulo via browser at: https://moduloware.ai

Here are the code examples for the presentation: https://github.com/kirtivr/pydelhi-talk

What performance issues have you been seeing in your AI agents? And how did you tackle them?
