Show HN: Reduce ChatGPT costs 10x with distributed cache for LLMs

edgematic.dev

22 points by zaiste 2 years ago · 3 comments · 2 min read

Hello HN!

We're building a caching solution for LLMs (ChatGPT, Claude). By combining approaches such as edge computing, prompt compression, and vectorization, it can reduce your AI bill by up to 10x and significantly lower response times.

Key features:

- Cost efficiency: the system caches frequent queries, reducing the number of upstream (paid) API calls.

- Fast responses: with nodes around the globe, we reduce latency by serving data from the nearest location.

- Scalability: designed to handle increasing loads and data sizes without degrading performance.

The cache operates transparently, requiring minimal changes to your existing setup. It's like a content delivery network (CDN), but for LLMs, ensuring that users get the fastest possible access to information.
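To make the idea concrete, here is a minimal sketch of a semantic cache (this is a hypothetical illustration, not Edgematic's actual implementation): each prompt is embedded, and if a cached prompt is within a similarity threshold, the stored response is returned instead of making a paid upstream call. A toy bag-of-words embedding stands in for a real embedding model.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a production cache would use a
    # real sentence-embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    def get(self, prompt: str):
        q = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip the upstream API call
        return None  # cache miss: caller pays for a real LLM request

    def put(self, prompt: str, response: str):
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.put("What is the capital of France?", "Paris")
print(cache.get("what is the capital of france"))  # near-duplicate: "Paris"
print(cache.get("Explain quantum entanglement"))   # unrelated: None
```

The interesting knob is the similarity threshold: too low and users get stale or wrong answers for genuinely different questions; too high and the hit rate collapses, which is where most of the claimed savings would come from.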

While the project is still in its early stages, we are excited about its potential and are looking to gather feedback from the community. We will share the demo once you subscribe to our mailing list (to control our spending ;))

This could be a game-changer for those using LLMs in production environments where cost and response time are critical.

We’re eager to hear what you think!

(launching from the Bunny Coworking House in SF - thanks, Henry, Liyen, and Grace, for having us)

throwaway888abc 2 years ago

Looks great. Do you have any concrete data on how much money it will save?

Also, how does it compare to, for example, GptCache [0], or any other semantic cache solution [1]?

[0] https://gptcache.readthedocs.io/en/latest/

[1] https://portkey.ai/blog/reducing-llm-costs-and-latency-seman...

  • zaiste (OP) 2 years ago

    We are still exploring. We don't have any concrete data yet, but in some instances we've observed reductions of up to ten times. This seems especially relevant in specific areas, e.g. chatbots, where similar questions come up more often.

    • throwaway888abc 2 years ago

      > We are still exploring.

      Fair point. Worth looking into: create/train/tune a small model (2B/7B) on previously cached answers, in cases where your knowledge index/domain doesn't change over time.

      Exciting times
