Nvidia Nemotron Nano v2

research.nvidia.com

8 points by bcatanzaro 4 months ago · 1 comment

slacka 4 months ago

Very interesting model. Some key points from the blog:

* NVIDIA is also releasing most of the data used to create it, including the pretraining corpus.

* The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just four Attention layers. For the architecture, please refer to the Nemotron-H tech report. The model was trained using Megatron-LM and NeMo-RL.
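The layer mix described above can be sketched as follows. This is a hypothetical illustration of what a "mostly Mamba-2 and MLP, plus four attention layers" stack looks like; the total layer count and the attention-layer positions here are assumptions, not the actual configuration from the Nemotron-H tech report.

```python
# Hypothetical sketch of a hybrid layer stack: mostly Mamba-2 and MLP
# layers with only four attention layers interleaved. Layer counts and
# positions are illustrative, not the real Nemotron configuration.

def build_layer_pattern(total_layers=56, attention_positions=(14, 28, 42, 55)):
    """Return a list of layer-type labels for a hybrid stack."""
    pattern = []
    for i in range(total_layers):
        if i in attention_positions:
            pattern.append("attention")        # global token mixing
        elif i % 2 == 0:
            pattern.append("mamba2")           # sequence mixing (SSM)
        else:
            pattern.append("mlp")              # channel mixing
    return pattern

pattern = build_layer_pattern()
print(pattern.count("attention"))  # 4
```

The design intuition is that the recurrent Mamba-2 layers carry most of the sequence modeling at linear cost, while the few attention layers provide occasional global token mixing.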

At this size and with only 4 attention layers, it should run very fast locally on cheap 12GB GPUs.
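The 12 GB claim can be sanity-checked with back-of-envelope arithmetic. The parameter count below is an assumption for illustration only; the post does not state one, and runtime overhead (KV/state cache, activations) is ignored here.

```python
# Rough VRAM estimate for model weights at different precisions.
# The ~9B parameter count is an assumed figure for illustration,
# not taken from the post; cache and activation memory are ignored.

def weight_memory_gb(params_billions, bytes_per_param):
    """Weight memory in GiB for a given parameter count and precision."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

for label, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{label}: {weight_memory_gb(9, bytes_per_param):.1f} GiB")
```

Under this assumption, fp16 weights would not fit in 12 GB, but an 8-bit quantization would, which is consistent with the "cheap 12GB GPUs" remark.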
