Paragraph Pollution: AI is (probably) greener than you typing on a laptop
blog.plinth.org.uk
A random text generator is even greener than AI and me typing. That doesn't mean the result is understandable or even coherent.
Pretty shallow analysis.
How much energy was used to train GPT-2, GPT-3, and GPT-4?
Also
> We’ll assume that AI models have roughly the same split of operating costs as a typical data centre
I don’t think this is a good assumption. GPT is not a typical application; it requires a massive number of power-hungry GPUs.
It’d be better to compare the power cost to non-ASIC crypto mining farms.
That's fair about ignoring the training cost. I did write a bit more going into that in a follow up piece here: https://notfunatparties.substack.com/p/ai-is-good-for-the-pl...
Do you have any better sources for the power usage stats? It would be good to get a bit closer on that front. Having said that, even if the cost share is closer to 80%, that still puts it on par with a laptop for an average person.
Well, OpenAI has about 30k A100s: https://www.tomshardware.com/news/chatgpt-nvidia-30000-gpus
What’s the power consumption on that, assuming full load at all times?
Also, I would expect OpenAI to be taking a loss on each individual inference request, as they also have a monthly fee, DALL·E, and loads of VC capital.
No source for that, though; I just wouldn’t assume that they’re breaking even.
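A rough sketch of what that fleet would draw, assuming full load at all times and ~400 W per GPU (the A100 SXM's TDP is an assumption here; actual sustained draw varies, and this ignores cooling and other data-centre overhead):

```python
# Back-of-envelope power draw for ~30,000 A100s at full load.
# WATTS_PER_GPU is an assumption (A100 SXM TDP); real draw varies.
NUM_GPUS = 30_000
WATTS_PER_GPU = 400

total_megawatts = NUM_GPUS * WATTS_PER_GPU / 1e6
kwh_per_day = NUM_GPUS * WATTS_PER_GPU * 24 / 1000

print(f"{total_megawatts:.0f} MW continuous")  # 12 MW
print(f"{kwh_per_day:,.0f} kWh per day")       # 288,000 kWh per day
```

So on the order of 12 MW continuous for the GPUs alone, before any data-centre overhead.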
I can definitely imagine they're not covering the amortised cost of the training with the cost per individual inference request. It seems less likely to me that they're making a significant loss on each subsequent request, but again no source from me on that either.
Looking a bit more into this, I found this paper: https://arxiv.org/pdf/2311.16863.pdf. It references a table saying that text generation uses 0.047 kWh per 1,000 inferences, which is 1-2 orders of magnitude lower than my estimate. Though that is for GPT-2, so it possibly tracks to something roughly around ~0.001 kWh per inference for GPT-3.5.
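To make the arithmetic explicit (the 1-2 order-of-magnitude scale-up for GPT-3.5 is my guess, not a figure from the paper):

```python
# Per-inference energy from the paper's figure of 0.047 kWh
# per 1,000 text-generation inferences (GPT-2-class model).
kwh_per_1000 = 0.047
kwh_per_inference = kwh_per_1000 / 1000  # 4.7e-5 kWh

# Assumption: a larger model like GPT-3.5 costs 10-100x more
# energy per inference than a GPT-2-class model.
low = kwh_per_inference * 10    # 4.7e-4 kWh
high = kwh_per_inference * 100  # 4.7e-3 kWh

print(f"GPT-2-class: {kwh_per_inference:.1e} kWh/inference")
print(f"GPT-3.5 guess: {low:.1e} to {high:.1e} kWh/inference")
```

The ~0.001 kWh per inference figure sits comfortably inside that guessed band.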
Well, doesn’t the compute time for transformers scale roughly quadratically with model size?
Would it make sense for power consumption to also scale roughly quadratically?
I'm not sure. The figures I've seen suggest that GPT-3 required 10x more energy to train than GPT-2 (e.g. https://www.nnlabs.org/power-requirements-of-large-language-....), so I think a roughly 1-2 order-of-magnitude increase in energy usage from GPT-2 to GPT-3.5 makes sense.
Do people actually use gpt-3.5-turbo? Full-fat gpt-4 is 40 times more expensive...
It's the default for the free version of ChatGPT, no? That's what the majority of people use.
So, I don't use this stuff, but every time I see someone complaining about it doing something stupid, the response they get tends to be "that's because it's GPT-3, everyone uses GPT-4 now"; I took this at face value.
I think it's a case of tech bubble vs the rest of the world. Most people are not subscribing to the paid version of ChatGPT, but a lot of people who spend a lot of time with these things are.