Estimating required GPU memory for serving LLMs

substratus.ai

2 points by samosx 2 years ago · 2 comments

samosx (OP) 2 years ago

Having a hard time estimating how much GPU memory an LLM needs for serving? What kind of GPUs to use, and how many?

Wrote a blog post to demystify the process of estimating GPU memory usage.

  • brianjking 2 years ago

    My issue is figuring out how to identify how many concurrent users you can support on average on a given GPU.

    Understanding the VRAM needed simply to load the weights is easy enough. But when you allow for something like content generation with varying input/output token lengths, how do you even begin to identify the GPUs you need?
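
As a rough illustration of that back-of-the-envelope arithmetic (a sketch, not taken from the linked post; all model figures below are assumed, Llama-2-7B-like values): weight memory is parameters times bytes per parameter, and the KV cache grows with layers, KV heads, head dimension, context length, and the number of concurrent requests.

    # Back-of-envelope GPU memory estimate for serving a transformer LLM.
    # Weights: params * bytes per param. KV cache: layers * KV heads * head dim
    # * tokens * concurrent requests, for both K and V.

    def weights_gib(n_params: float, bytes_per_param: int = 2) -> float:
        """Memory to hold the weights; fp16/bf16 = 2 bytes per parameter."""
        return n_params * bytes_per_param / 1024**3

    def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                     seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> float:
        """KV cache = 2 (K and V) * layers * kv_heads * head_dim * tokens * batch."""
        return (2 * n_layers * n_kv_heads * head_dim
                * seq_len * batch_size * bytes_per_elem) / 1024**3

    # Illustrative 7B model (assumed values; check your model's config.json):
    w = weights_gib(7e9)                              # ~13 GiB of fp16 weights
    kv = kv_cache_gib(n_layers=32, n_kv_heads=32, head_dim=128,
                      seq_len=4096, batch_size=8)     # 8 concurrent full-context requests
    print(f"weights ~{w:.1f} GiB, KV cache ~{kv:.1f} GiB, total ~{w + kv:.1f} GiB")

Treating every concurrent user as a full-context request gives a pessimistic upper bound; in practice average sequence lengths are shorter and serving stacks that page the KV cache pack requests more tightly, so measured concurrency on a given GPU is usually higher than this estimate suggests.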
