In the ever-evolving landscape of artificial intelligence, Large Language Models (LLMs) have become pivotal in shaping the future of natural language processing. These sophisticated models, however, come at a cost: they demand substantial computational resources, and one of the most critical is GPU memory, or Video Random Access Memory (VRAM), which plays a central role during training.
In this article, we will delve into the intricacies of calculating VRAM requirements for training Large Language Models. Whether you are an AI enthusiast, a data scientist, or a researcher, understanding how VRAM impacts the training of LLMs is essential for optimizing performance and ensuring efficient utilization of hardware resources.
Formula to Calculate Activations in a Transformer Neural Network
The paper "Reducing Activation Recomputation in Large Transformer Models" provides a useful formula for estimating the activation memory of a single Transformer layer.
Activations per layer = s * b * h * (34 + (5 * a * s) / h)
Where,
b: batch size
s: sequence length
l: number of Transformer layers
a: number of attention heads
h: hidden dimension size
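To make the arithmetic concrete, here is a minimal Python sketch of this formula. It assumes the paper's setting of 16-bit activations with no activation recomputation and no tensor or sequence parallelism, so the per-layer result is in bytes; the helper names and the example configuration are illustrative, not taken from any particular model.

```python
# Minimal sketch of the activation-memory formula from
# "Reducing Activation Recomputation in Large Transformer Models".
# Assumes 16-bit activations, no recomputation, no tensor/sequence
# parallelism. Example values below are hypothetical.

def activation_bytes_per_layer(s: int, b: int, h: int, a: int) -> float:
    """Activation memory of one Transformer layer in bytes: s*b*h*(34 + 5*a*s/h)."""
    return s * b * h * (34 + (5 * a * s) / h)

def total_activation_gib(s: int, b: int, h: int, a: int, l: int) -> float:
    """Total activation memory across all l layers, converted to GiB."""
    return l * activation_bytes_per_layer(s, b, h, a) / (1024 ** 3)

if __name__ == "__main__":
    # Illustrative GPT-style configuration (hypothetical, not a real model spec).
    s, b, h, a, l = 2048, 1, 4096, 32, 32
    print(f"Per layer : {activation_bytes_per_layer(s, b, h, a) / (1024 ** 3):.2f} GiB")
    print(f"All layers: {total_activation_gib(s, b, h, a, l):.2f} GiB")
```

Note that the formula covers a single layer; multiplying by l gives the activation footprint of the whole network, which is why l appears in the variable list above even though it does not show up in the per-layer expression.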