Ask HN: Devs using LLMs, how are you keeping costs low for LLM calls locally?

5 points by spruce_tips a year ago · 10 comments

My project has a multi-step LLM flow using gpt-4o.

While developing new features and testing locally, the LLM flow runs frequently and burns a lot of tokens, so my OpenAI bill spikes.

I've made some efforts to stub LLM responses, but it adds a decent bit of complexity and work. I don't want to run a model locally with Ollama because I need the output to be high quality and fast.
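
For reference, a minimal sketch of that kind of dev-mode stub, assuming the OpenAI v1 Python SDK (the STUB_LLM flag and canned replies are illustrative, not part of any library):

    # Minimal dev-mode stub: skip the API entirely when STUB_LLM is set.
    # Assumes the OpenAI v1 Python SDK; the flag name and canned replies are illustrative.
    import os
    from openai import OpenAI

    client = OpenAI()

    CANNED = {
        "extract": '{"fields": ["example"]}',
        "summarize": "A short stubbed summary.",
    }

    def llm_call(step: str, prompt: str) -> str:
        if os.getenv("STUB_LLM") == "1":
            return CANNED.get(step, "stubbed response")
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content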

Curious how others are handling similar situations.

throwaway888abc a year ago

Use The Cache Luke...

LangChain examples:

[1] Caching https://python.langchain.com/v0.1/docs/modules/model_io/llms...

[2] Fake LLM https://js.langchain.com/v0.1/docs/integrations/llms/fake/
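
A minimal sketch of both ideas in Python (import paths follow the LangChain v0.1 era and may differ in newer releases):

    # Cache identical prompts in memory so repeated dev runs don't re-bill tokens.
    # Import paths follow LangChain v0.1 and may differ in newer releases.
    from langchain.globals import set_llm_cache
    from langchain_community.cache import InMemoryCache
    from langchain_openai import ChatOpenAI

    set_llm_cache(InMemoryCache())
    llm = ChatOpenAI(model="gpt-4o")
    llm.invoke("Summarize this ticket")   # first call hits the API
    llm.invoke("Summarize this ticket")   # identical call is served from the cache

    # Or swap in a fake LLM for tests so no tokens are spent at all.
    from langchain_community.llms.fake import FakeListLLM
    fake = FakeListLLM(responses=["stubbed step 1", "stubbed step 2"])
    fake.invoke("anything")               # returns "stubbed step 1"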

roh26it a year ago

Here's a mega guide on keeping costs low with LLMs - https://portkey.ai/blog/implementing-frugalgpt-smarter-llm-u...

tl;dr:
- Keep prompts short; combine prompts, or write more detailed prompts but send them to a smaller model
- Use simple and semantic cache lookups
- Classify tasks and route them to the best LLM using an AI gateway

Portkey.ai could help with a lot of this
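
A rough sketch of the classify-and-route idea, independent of any particular gateway (the keyword heuristic and model names are placeholders, not Portkey's API):

    # Classify-then-route sketch: cheap model for simple prompts, gpt-4o otherwise.
    # The keyword heuristic and model names are placeholders, not Portkey's API.
    from openai import OpenAI

    client = OpenAI()

    HARD_HINTS = ("analyze", "multi-step", "reason", "refactor")

    def pick_model(prompt: str) -> str:
        # Swap this naive heuristic for a real classifier or a gateway routing rule.
        return "gpt-4o" if any(h in prompt.lower() for h in HARD_HINTS) else "gpt-4o-mini"

    def route(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=pick_model(prompt),
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content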

  • retrovrv a year ago

    came across this guide earlier - valuable insights. thanks for sharing!
