Nemotron 3 Ultra: Open MoE Hybrid Mamba-Transformer for Agentic Reasoning [pdf]
research.nvidia.com
Is this the one from Jensen's Computex presentation the other day?
It is significantly bigger than Qwen for the same level of intelligence, but I think the key strength was inference speed.
This model seems like a really big deal. Is this the biggest Western open-source AI model in the world (beating out Llama3 405B)?