Nemotron-4-340B
blogs.nvidia.com> The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs.
I feel like everyone is missing this from the announcement. They explicitly are releasing this to help generate synthetic training data. Most big models and APIs have clauses that ban their use to improve other models. Sure, maybe it can compete with other big commercial models at normal tasks, but this would be a huge opportunity for ML labs and startups to expand the training data of smaller models.
Nvidia must see a limit to the growth of new models (and new demand for training with their GPUs) based on the availability of training data, so they're seeking to provide a tool to bypass those restrictions.
All for the low price of 2x A100 nodes...
> Most big models and APIs have clauses that ban their use to improve other models.
I will never get over the gall of anything and everything being deemed fair game to use as training data for a model, except you're not allowed to use the output of a model to train your own model without permission, because model output has some kind of exclusive super-copyright apparently.
> because model output has some kind of exclusive super-copyright apparently
Well, it's not copyright that is being used to forbid this, it's terms of service, but yeah, it is quite hypocritical.
It's likely unenforceable since there is no copyright and copying it to someone else not in the contract trivially bypasses it. Still hypocritical nonsense though.
They are just saying they will close your account if they catch you and feel like it.
>They explicitly are releasing this to help generate synthetic training data
Synthetic training data is basically free money for NVidia; there's only a fixed amount of high-quality original data around, but there's a potential for essentially infinite synthetic data, and more data means more training hours means more GPU demand.
GIGOaaS
This is (possibly) a GPT-4 level dense model with an open source license. Nvidia has released models with issues before, but reports on this so far indicate it's a solid contender without any of the hiccups of previous releases.
A 340B model should require around 700GB of VRAM or RAM to run inference. To train or finetune, you're looking at almost double, which is probably why Nvidia recommends 2x A100 nodes with 1.28TB of VRAM.
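For a rough sense of where those numbers come from, here's a back-of-the-envelope in Python. It counts weight storage only (KV cache, activations, and optimizer state come on top, which is where the roughly 2x for training comes from), and the effective bits-per-weight for the quantized cases are assumptions, not official figures:

```python
# Rough weight-memory estimate for a 340B-parameter dense model.
# Weights only: ignores KV cache, activations, and framework overhead.
PARAMS = 340e9

# Effective bytes per parameter (quantized figures are assumptions,
# including scales/metadata overhead).
BYTES_PER_PARAM = {
    "FP16/BF16": 2.0,
    "FP8": 1.0,
    "Q4 (~4.5 bits effective)": 4.5 / 8,
    "Q3 (~3.5 bits effective)": 3.5 / 8,
}

for name, bytes_per_param in BYTES_PER_PARAM.items():
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{name:>28}: ~{gb:.0f} GB just for weights")

# FP16/BF16: ~680 GB -> roughly the "~700GB" figure above
# FP8:       ~340 GB -> fits on one 8x80GB DGX H100 (640 GB total)
# Q4:        ~190 GB -> barely squeezes onto 4x A6000 (192 GB) or a 192GB Mac Studio
# Q3:        ~150 GB -> fits those setups with room to spare
```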
Jensen Huang is the king of AI summer.
I wonder if the open-source LLM community understands what just happened here - we finally got a truly large LLM (a whopping 340B!) but it costs ... $15K per A100 x 16 GPUs = a minimum of $240K to just get started. Probably closer to $500K or half a million dollars once you factor in space, power, cooling, infrastructure etc.
You could probably run it as a Q4 (definitely as a Q3) on 4 x A6000 (so on a $25K workstation), although you'd probably also be looking at about 3-4 tok/s text generation. I do think that it's a big landmark to have a true GPT4-class model (with some questionable RL though, from my initial testing). The best thing about it is that it's almost certainly now the strongest model available for generating synthetic data without any licensing restrictions.
Funnily enough, I don't think it's actually the most interesting model that Nvidia released this week. Nvidia also published this paper https://arxiv.org/abs/2406.07887 and released https://huggingface.co/nvidia/mamba2-hybrid-8b-3t-128k (Apache 2.0 licensed, to boot). It looks like it matches (and sometimes even edges out) Transformer performance, while having linear scaling for context length. Can't wait for a scaled up version of this.
Nvidia also released a top-notch Llama3 70B SteerLM reward model as well (although RLHFlow/ArmoRM-Llama3-8B-v0.1 might still be a better choice).
Or you could run it quantized on about $6k worth of 192GB Mac Studio — probably not that fast.
How would a server/workstation like this be setup?
I thought you could only use the vram on the GPU, so for 700GB you would need 8-9 A100 nodes as 2 only gives 160GB.
I've been trying to figure out how to build a local system to run inference and train on top of LLMs. I thought there was no way to add VRAM to a system other than adding more and more GPUs, or falling back to system RAM (DDR5), even though that would be considerably slower.
An A100 node has 8 A100s in it, each with 80GB, which is how they got the 1.28TB number: 2 * (8 * 80GB).
With CPU inference you just need a server with 1.28TB RAM. Yes, the inference will be super slow, but it is more realistic than to spend 100k+ dollars for A100 clusters with 1.28TB VRAM.
One example: HP DL580 Gen8. Use the 32GB PC3L-14900L LRDIMMs (HP PN 715275-001; 712384-001, 708643-B21) for a maximum of 3TB. You can get the LRDIMMs in the $32-$45 range on the second-hand market.
I was thinking the same. Jart has gotten very impressive performance out of her 8 channel Zen 4 Threadripper Pro 7995WX (Storm peak). I'm using a Zen 3 TR Pro (Chagall) myself.
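As a rough sanity check on what "super slow" means here: batch-1 decoding of a dense model is approximately memory-bandwidth-bound, since every generated token has to stream all the weights through memory once. A quick sketch, where the bandwidth figures are ballpark theoretical peaks (assumptions, not measurements) and the ~4.5-bit quant size is also an assumption:

```python
# First-order estimate of batch-1 decode speed for a dense model:
#   tokens/s  <~  usable memory bandwidth / model size in bytes
MODEL_BYTES = 340e9 * 4.5 / 8   # ~190 GB for a ~4.5-bit quant of 340B params

# Ballpark peak-bandwidth figures (assumptions, not measurements)
systems = {
    "dual-channel DDR5 desktop":        80e9,
    "8-channel DDR5 Threadripper PRO": 330e9,
    "M2 Ultra Mac Studio":             800e9,
}

for name, bandwidth in systems.items():
    print(f"{name:>32}: <~ {bandwidth / MODEL_BYTES:.1f} tok/s (upper bound)")
```

Real-world numbers land below these upper bounds, but it shows why an 8-channel Threadripper Pro is usable for a model this size while a desktop really isn't.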
do you mean 1.28 TB?
Yes, thank you for catching that!
It would only require 4x AMD MI300x.
The "open" and "permissive" license has an interesting section on "AI Ethics":
> AI Ethics. NVIDIA is committed to safety, trust and transparency in AI development. NVIDIA encourages You to (a) ensure that the product or service You develop, use, offer as a service or distributes meets the legal and ethical requirements of the relevant industry or use case, (b) take reasonable measures to address unintended bias and to mitigate harm to others, including underrepresented or vulnerable groups, and (c) inform users of the nature and limitations of the product or service. NVIDIA expressly prohibits the use of its products or services for any purpose in violation of applicable law or regulation, including but not limited to (a) illegal surveillance, (b) illegal collection or processing of biometric information without the consent of the subject where required under applicable law, or (c) illegal harassment, abuse, threatening or bullying of individuals or groups of individuals or intentionally misleading or deceiving others
https://developer.download.nvidia.com/licenses/nvidia-open-m...
Besides limiting the freedom of use (making it less "open" in my eyes), it's interesting that they tell you to meet "ethical requirements of the relevant industry or use case". Seems like that'd be super hard to pin down in a precise way.
I read that as "NVIDIA encourages you to be ethical and prohibits breaking the law." That doesn't seem so bad to me. What is bad, however, is section 2.1.
> 2.1 ... If You institute ... litigation against any entity ... alleging that the Model or a Derivative Model constitutes direct or contributory copyright or patent infringement, then any licenses granted to You under this Agreement for that Model or Derivative Model will terminate...
If you sue or file a copyright claim that the model violates copyright, you lose your license to use the model. That's a really weird restriction, I'm not sure what the point is.
The point is: if you sue claiming this model breaks the law, you lose your license to use it.
Apache 2.0 has a similar restriction: “If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.”
True, although it's unusual to see it for copyright not patents.
That said, the far bigger issue is the end of the same clause 2.1:
> NVIDIA may update this Agreement to comply with legal and regulatory requirements at any time and You agree to either comply with any updated license or cease Your copying, use, and distribution of the Model and any Derivative Model
Oh, I didn't realize that it was a standard term. I'm sure there's a good motivation then, it doesn't seem so bad.
Sounds reasonable to me. If you are going to claim in court that the model is illegal, then why exactly are you using it?
It says "NVIDIA encourages You to..."
Which, in terms of a contract, means absolutely nothing at all.
I'm not sure what business GP is in, but being encouraged not to be unethical and explicitly forbidden from illegal activity doesn't seem like much more of an infringement on one's freedom than the applicable laws themselves. I guess being arrested for crimes is one thing, but having a license revoked on top of that is just one step too far?
Google famously removed "don't be evil" because lawyers pushed back on who gets to define evil. I can imagine the same logic applies here: Nvidia isn't about to define objective morality, so the best alternative is to ask people to try their best.
Very weaselly worded. Some things that appear to be allowed:
* intended bias
* legal surveillance
* legal collection of biometrics without consent
* legal harassment
Ie, state-sanctioned killbots are just fine!
No copyright license is going to stop a state from using the model for the military use that they really need. First of all, I'm pretty sure most countries have laws allowing the state to ignore copyright in the case of national defense. More importantly, power does what it wants and what it can get away with.
It's good they have included this clause, despite it being difficult to legally pin down. Hopefully, there will be a lawsuit at some point which will create some ethical boundaries that AI developers and users must not cross.
It's 5x the price of llama3/qwen2 70b, and the performance on the benchmarks is similar. But with 70b you can break a task into steps and do 5+ steps, so it doesn't seem worth it in general cases for the price. Is 340B better for synthetic data generation (which is my primary use case)? Are there tests for that? It seems like synthetic data would benefit from multi-step reasoning and reduction of hallucination, and in those tests the difference is small.
3 models are included: base, instruct, and reward. All under a license permitting synthetic data generation and commercial use.
Has anyone run evaluations to compare the instruct version with gpt-4o or llama3-70b etc.? It's so much larger than the leading open source models, so one would hope it would perform significantly better?
Or is this in one of the chat arenas or whatever? Very curious to see some numbers related to the performance.
But if it's at least somewhat better than the existing open source models then that is a big boost for open source training and other use cases.
This is the "june-chatbot" model currently running on Chatbot Arena from LMSYS.
https://d1qx31qr3h6wln.cloudfront.net/publications/Nemotron_...
"...Nemotron-4-340B-Base was trained using 768 DGX H100 nodes"
That is 350 million dollars for you... Poor startups, better have a rich sponsor.
I'm so confused.
Isn't "training LLMs on LLM output" the very definition of "model collapse" or "model poisoning"?
The claim (which is not uncontested, I should add) is that doing so repeatedly inevitably produces model collapse. Even if that is true, however, you can still derive benefit from using larger models to generate large amounts of synthetic training data for smaller models. Most LLaMA finetunes out there are trained on GPT-4 output, for example.
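For what it's worth, the synthetic-data workflow being discussed here is basically: prompt the big instruct model, score/filter the outputs (e.g. with the released reward model), and use the survivors as training data for a smaller model. A minimal sketch of the generation step, assuming the model is already being served behind an OpenAI-compatible endpoint (e.g. via vLLM or NVIDIA NIM); the endpoint URL, served model name, seed prompts, and output filename are all placeholders:

```python
# Minimal sketch: generate synthetic instruction-following data from a served
# instruct model and dump it as JSONL for downstream fine-tuning of a smaller model.
# Assumes an OpenAI-compatible serving endpoint is already running; all names below
# are placeholders, not official values.
import json
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder
MODEL = "nemotron-4-340b-instruct"                       # placeholder served-model name

seed_prompts = [
    "Explain the difference between a mutex and a semaphore to a junior developer.",
    "Write a Python function that merges two sorted lists.",
]

with open("synthetic_data.jsonl", "w") as out:
    for prompt in seed_prompts:
        resp = requests.post(
            ENDPOINT,
            json={
                "model": MODEL,
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0.7,
                "max_tokens": 512,
            },
            timeout=600,
        )
        resp.raise_for_status()
        answer = resp.json()["choices"][0]["message"]["content"]
        # In a real pipeline you'd score/filter these (e.g. with the reward model)
        # before using them as fine-tuning examples for the smaller model.
        out.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")
```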
"...and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision"
OK, I see the goal is to sell more H100s: they made it big enough that it won't fit on a cheaper GPU.
"Nemotron-4-340B-Instruct is a chat model intended for use for the English language" - frustrating
What is it? Is it an LLM or what?
Oh NVIDIA released an open weights 340 billion parameter LLM!
It should be the biggest open-weights model to date, I think (Grok-1 is 314B).
It's trained on 8 trillion tokens, and some benchmarks show it does better than or equal to GPT-4o!
They released 3 checkpoints: the base, the instruct, and the reward model.
See https://huggingface.co/collections/nvidia/nemotron-4-340b-66... for all the checkpoints
Why does Nvidia release models that compete with its customers' businesses but don't make any money for Nvidia?
Are they commodotising their complements?
> [commoditizing] their complements
That's exactly what this would be.
> compete with its customers businesses
I suspect most of their business comes from a few massive corporate spenders, not a "long tail" of smaller businesses, so it seems like a questionable goal to disrupt those customers without a clear path to new customers. Then again, few have the resources to run this model, so I guess this just ensures that their big customers are all working with some floor in model size? Probably won't impact anything realistically.
Nvidia offers the AI Enterprise suite with NeMo, NIMs and many other services and consultancy to enterprise customers. These customers can then use either third-party AI models or Nvidia's models.
Nvidia has no intention of earning money on the models themselves; the intent is to offer foundation models and extend their SW products, which require their HW platform.
Basically, just like CUDA costs you nothing, it costs you nothing to use Nvidia's models. And once you're on them you might want to use Nvidia HW for better performance, and then you might want security and get interested in Nvidia's enterprise SW.
They target this model at generating synthetic data. Data is the lifeblood of LLM training; quality synthetic data means more training can occur which means more demand for GPUs.
The model is big enough that you need expensive Nvidia GPUs to run it effectively.