Ask HN: Predictions for when GPT-5 will be released and how safe it will be?
With reports and blog posts [0][1] saying that OpenAI has begun training its next flagship model, when do you expect that model to launch?
Furthermore, what do you think they're going to do to make it as "safe" as possible? It's funny that OpenAI didn't release GPT-2 to the public immediately because of safety worries, but has since been releasing models without the same care, and I imagine this will continue with GPT-5.
[0] https://www.zdnet.com/article/openai-is-training-gpt-4s-successor-here-are-3-big-upgrades-to-expect-from-gpt-5/
[1] https://openai.com/index/openai-board-forms-safety-and-security-committee/

I predict:

- a steady increment of GPT-n+1 every 6 months for marketing purposes.
- each will improve on the last by smaller and smaller margins.
- hallucinations won't be fixed anytime soon.
- we will hit a bit of a winter, as the hype was so big, but like self-driving cars the devil is in the details. The general public realizes these things are essentially just giving us averages.
- a big market will emerge around "authenticity" and "verified texts" as the internet continues to get flooded with AI-generated content.

> - a steady increment of GPT-n+1 every 6 months for marketing purposes.

You already failed then, GPT-4 was released more than a year ago.

A reasonable assumption would be from now moving forward; the last release was GPT-4o. But there is little to engage with in your comment. I'd be most curious to explore what you believe its capabilities might be, and its societal impact.

> A reasonable assumption would be from now moving forward; the last release was GPT-4o.

No, it is not reasonable to start with a minor release. Only if you want to stretch things to fit a narrative, which we are free to disagree on. If you check other comments, I have talked about what I believe their capabilities and social impact are. Downplaying it and calling the next major releases a marketing stunt is an opinion we are free to disagree on, no?

Tbh n+1 is pretty unambiguous: brand-new models, not something like 3 to 3.5-turbo or 4 to 4o. Hence the "predict", not "infer".

No. They've definitely been rolling out silent releases and in-between releases since GPT-4.

The following is all guesswork. Since the start of their partnership in 2019, OpenAI has primarily used Microsoft's Azure data centers for training its models. In 2023, Microsoft acquired approximately 150,000 H100 GPUs. [1] The initial version of GPT-4 ran on a cluster of A100 GPUs. It is likely that GPT-5 will run on the newly acquired H100s, and it is plausible that GPT-4 Turbo and GPT-4o also use this infrastructure. The inference speed of GPT-5 should not be significantly slower than that of GPT-4 to ensure it remains practical for most applications. Assuming the H100 is 4.6 times faster at inference than the A100 [2], this gives us a lower bound for performance expectations. I anticipate GPT-5 to be at least five times larger in terms of model parameters. Given that both the A100 and H100 top out at 80GB, it is unlikely we will see a single gigantic model. Instead, we can expect an increase in the number of experts: if GPT-4 operates as a mixture of experts with 8x220 billion parameters, then GPT-5 might scale up to something like 40x220 billion parameters (a back-of-envelope sketch follows a couple of comments down). However, the exact release date, safety measures, and benchmark performance of GPT-5 remain uncertain.

[1]: https://www.tomshardware.com/tech-industry/nvidia-ai-and-hpc...
[2]: https://nvidia.github.io/TensorRT-LLM/blogs/H100vsA100.html

What's not "safe" about AI? If you mean the hallucinations, I don't think that will ever really be solved. I think people just have to learn that LLMs are not divine oracles that are always correct, just like the training data generated by flawed humans, who are often either wrong or outright lying. Garbage in, garbage out. Not saying that AI isn't useful, but expecting what is basically a "human simulator" not to inherit humanity's flaws is a bit disingenuous.
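A back-of-envelope check of the mixture-of-experts sizing guessed at above. Every figure here is speculative: 220B parameters per expert, 8 vs 40 experts, and fp16 weights are the commenter's assumptions (or illustrative choices), not confirmed OpenAI facts.

    # Rough memory math for the mixture-of-experts guess above.
    # All numbers are speculative, not known GPT-4/GPT-5 details.
    BYTES_PER_PARAM = 2      # assumes fp16/bf16 weights
    GPU_MEMORY_GB = 80       # A100/H100 80GB variants

    def moe_footprint(num_experts, params_per_expert_billion):
        total_params_billion = num_experts * params_per_expert_billion
        total_gb = total_params_billion * 1e9 * BYTES_PER_PARAM / 1e9
        min_gpus = -(-total_gb // GPU_MEMORY_GB)   # ceiling division
        return total_params_billion, total_gb, int(min_gpus)

    for label, experts in [("GPT-4 (rumoured)", 8), ("GPT-5 (guess)", 40)]:
        params, gb, gpus = moe_footprint(experts, 220)
        print(f"{label}: {params}B params, ~{gb:,.0f} GB of weights, "
              f"needs >= {gpus} x 80GB GPUs just to hold them")

The usual appeal of mixture-of-experts is that only a few experts are active per token, so inference compute grows far more slowly than the raw parameter count, even though the memory footprint (and hence the GPU count above) scales with all of it.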
> If you mean the hallucinations, I don't think that will ever really be solved.

This is true by definition, since a "hallucination" isn't a failure condition in which the system isn't working as designed; it's just a post-hoc term for when probabilistic output doesn't meet our expectations, which is inherent to the nature of all probabilistic systems. Within the system there is no distinction between "hallucination" and "non-hallucination". LLMs apply the same stochastic process in all cases, and the criterion for whether we call the output a "hallucination" is entirely external to their functioning. Strictly speaking, LLMs are always hallucinating, since they are always generating inferences from hard-coded statistical models and have no awareness of the semantic meaning of the tokens they correlate, nor any internal criterion of external correctness.

> I think people just have to learn that LLMs are not divine oracles that are always correct.

We all need to keep repeating the mantra "all models are wrong; some models are useful."

> have no awareness of the semantic meaning of the tokens they correlate

This seems impossible to me. Many of the tasks I use GPT for inherently require understanding and thinking.

And that understanding and thinking is coming from you. GPT is just generating text based on statistical correlations between tokens.

It is kinda hard to grasp, but an LLM does not "think". It basically predicts the next token in a sentence, and it's quite efficient at it.

That's not really a satisfying answer. I could say human brains don't really think, they're just transmitting electrical signals in a saline suspension. Technically true, but it ignores the larger emergent behavior.

It may not be a satisfying answer, but it is a correct one. The comparison to the underpinnings of human cognition is not really valid, because regardless of the underlying substrate, human beings are building mental models of reality itself via sensory input. So even if human minds and LLMs operated on the exact same process of inference (which we cannot state with confidence is true), human minds would be generating inferences from correlations of actual empirical experience, whereas LLMs are only building correlations between words, regardless of what empirical reality those words may or may not correspond to. All else being equal, human minds are modeling reality, while LLMs are modeling another model.

Yes, I always have to explain this to people too. It's hard because it's extremely good at pretending it can think :)

This one is easy: within a year, as safe as GPT-4, and an incremental advance over 4. Most people will use it and not see much difference over GPT-4.

> and not see much difference over GPT-4.

There is actually a big difference even between GPT-4 and GPT-4o when used for programming. The latter produces much bigger chunks of code and doesn't forget variable names, my guess being due to the larger context window. From what is available, GPT-5 should be more brainwashed and actually a collection of models including image, sound, maybe video, plus some algorithms like web search, data extraction, and strict text formatting. OpenAI will likely use GPT-4 to produce some high-quality training data; this way they can make GPT-5 "smarter", better at logic and problem solving.

Exactly that. People had the same unrealistic expectations about GPT-4, and it turned out to be barely noticeably different from GPT-3.5 for a lot of use cases. Makes sense, though, because 3.5 already worked amazingly well at use cases like summarising. Once you get to "great" it's hard to get much better.
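To make "statistical correlations between tokens" from the exchange above concrete, here is a minimal, purely illustrative sketch of next-token prediction: a bigram model built from raw counts. Real LLMs use deep networks over long contexts, but the sampling step at the end is the same in spirit, and nothing in it checks whether the continuation is true.

    import random
    from collections import Counter, defaultdict

    # Toy corpus; a real model is trained on trillions of tokens.
    corpus = "the cat sat on the mat . the cat ate the fish .".split()

    # Count which token follows which (bigram statistics).
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev][nxt] += 1

    def next_token(prev):
        """Sample the next token in proportion to how often it followed `prev`."""
        counts = following[prev]
        tokens, weights = zip(*counts.items())
        return random.choices(tokens, weights=weights)[0]

    # Generate a continuation: pick statistically likely next words,
    # with no notion of whether the resulting sentence is factual.
    token, out = "the", ["the"]
    for _ in range(6):
        token = next_token(token)
        out.append(token)
    print(" ".join(out))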
Here's a completely made-up point of view (I haven't read the articles). GPT-4 is not an LLM but a complex software system, which has LLM(s) at its core but also other components like RAG, a toxicity filter, an apologizing mechanism, expert systems, etc. "GPT-4" is a product name / marketing name. For OpenAI this would be logical for performance and business reasons. It also explains how they can tune it, the apparent secrecy about the architecture, etc. It's also logical to make small, incremental changes to this system instead of building whatever "GPT-5" would mean from the ground up. So I expect "GPT-5" is also just a marketing name for a slightly better black-box (to us) system and product line.

ChatGPT is in fact that. GPT-4 the API is not: there is clearly no RAG, you bring that yourself, and the same goes for toxicity filtering, again a different API that is up to you to implement, etc. There is probably something going on before the input reaches GPT-4, and we are definitely not seeing the raw output from the LLM, but the layers over GPT-4 the API are thinner than ChatGPT. The Assistants API is closer to ChatGPT and to what you are describing. Edit: it should be noted that the Assistants API is somewhat model-agnostic as well, so the product part of this isn't part of the inference system.

That would make a lot of sense if open-source models didn't output stuff pretty much exactly the same way as OpenAI's models do.
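As an illustration of "there is clearly no RAG, you bring that yourself": with the bare chat completions API you retrieve context however you like and paste it into the prompt yourself. A minimal sketch, assuming the official openai Python client (>= 1.0); the retrieve() helper is a hypothetical stand-in for whatever search you would actually run.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def retrieve(query: str) -> str:
        """Stand-in for your own retrieval (vector DB, keyword search, ...)."""
        return "Internal doc: GPT-4 Turbo supports a 128k context window."

    def answer_with_context(question: str) -> str:
        context = retrieve(question)  # the raw API does none of this for you
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system",
                 "content": "Answer using only the provided context."},
                {"role": "user",
                 "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return resp.choices[0].message.content

    print(answer_with_context("How big is the context window?"))

Roughly speaking, ChatGPT and the Assistants API layer retrieval, tools, and moderation on top of calls like this; with the raw model endpoint, all of that plumbing is yours.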
GPT-4's leap from GPT-3 is like you said. GPT-5 might be another leap that we can't predict.

We'll continue to see diminishing returns as time goes on. The low-hanging data sources have all been consumed and now everyone is battling for the scraps of what's left. There's still a lot of optimization that can be done, as illustrated by GPT-4o. It's basically the same trap as CPUs in the '90s and early 2000s, where the naming convention had to change to reflect the fact that clock speeds couldn't keep doubling every two years.

Training takes a few months, but I bet they'll do some testing before releasing it to the public. I also believe they will delay the release of GPT-5 as much as possible, the reason being that it will be underwhelming (at least compared to the GPT-3.5 hype), possibly releasing close to some new Google launch (their main competitor). They are the main driver of a bubble that has greatly benefited both Microsoft and NVidia and the other hyperscalers, and if they release the model and show that we're in the "diminishing returns" phase, it will crash a big part of the industry, not to mention NVidia. Companies are buying H100s and investing in expensive AI talent because they believe progress will be quick; if progress stalls for LLMs, there'll be a huge drop in sales and CAPEX in this industry. There are still many up-and-coming projects that rely on NVidia hardware for training, like Tesla's Autopilot and others, but the bulk of the investment in H100s in recent years has been mostly because of LLMs. All the new AI talent will also move on to something new, and hopefully we will have more discoveries and potential uses, but we're definitely at peak LLMs. (ps: just my opinion)

Your opinion is based on the assumption that GPT-5 will be underwhelming. Do we have any hint as to why you think so?

GPT-4 was underwhelming. Either they need more linear algebra tricks (or "AI") or incredibly better data, and neither seems to be the case. I bet it'll be focused on being a better Siri for Apple. This is good for them as a business, but innovation-wise it's pretty meh. It'll still suck for factual or precise information, and its reasoning will still be -1.

I think we're very far removed from the context behind the safety reasoning with GPT-2. An uncensored model capable of spewing a torrent of deceptive and completely believable information was quite unheard of at the time; it would have been problematic for such a technology to be released out of nowhere. The later iterations are heavily censored, so the public was given a bit of a transition period before things got too chaotic. I'm sure there were many other reasons the authors themselves weren't aware of at the time, such as the inundation of AI content skewing further training quality. Of course this is a roundabout explanation; there's always more detail that can be added, and I'd rather be objective. There's always a financial motive for companies too, so take that into consideration. The hype definitely played into their marketing.

My question is: will it be multimodal? From a product perspective, going back to unimodality after trying GPT-4o would be awkward, so there are reasons for them to go fully multimodal, but I'm not fully educated about the trade-offs from a technical perspective.

I don't see a significant improvement in GPT-5 versus GPT-4, hence why OpenAI is going the product route. They're trying to hit value-add via external features such as Voice Mode and Data Analysis.
GPT-5 will be somewhat better at reasoning on hard problems, just as safe, and slower than GPT-4 by some margin. I think the moat for foundation-model providers will be amassing training data that helps with reasoning capability, which is why it is taking longer to release GPT-5. Release will be governed by competition, but my wild guess is the end of the year. As some other commenters have noted, we are likely approaching diminishing returns with LLM training, and OpenAI would like to delay the public's realization of this.

I believe that GPT-4o was really GPT-5, just renamed, and that the multimodal stuff they demoed will actually be released within a month. My reason for believing that is that it was trained from scratch and was not a fine-tune or other optimisation of the existing GPT-4 model. We know this because OpenAI has publicly stated that they're using a different tokenizer, which would have forced them to start the model training from step one.

GPT-4o feels GPT-5-ish to me. It's crazy fast and doesn't make the stupid mistakes 3 and 4 did. It also doesn't hallucinate nearly as much, and its inference capability is impressive: I can be rather vague in my question and it typically understands what I am getting at.

I tried learning Godot with 4o for a few days and almost literally every answer was hallucinated and wrong. It didn't matter if I specified the version of Godot I was using, among other specifics. It would even repeat mistakes that I had corrected earlier in the conversation (and that it claimed to "save to memory"). It was a really eye-opening experience.

Good to know. I did have it hallucinate pretty badly yesterday when I asked it about Microsoft AI Recall. After it kept telling me confidently about Outlook's AI ability to recall messages after they've been sent, I told it to search the internet on the topic, and it came back with accurate data on Microsoft's new AI feature in Windows 11. So it's still not perfect :)

The GPT-4 branding has made quite a reputation; I will not be surprised if they milk it while it lasts.

Right, and I guess everyone forgot that there was a panic about GPT-5 taking over the world or something, and Altman publicly declared they would not be releasing GPT-5 soon. So maybe that is why the name changed.

The False Minoshiro[0] was built over a thousand years ago to protect the libraries of mankind, but the rat armies of today learned to override its authentication mechanisms[1] and used its forbidden knowledge to arm themselves and stage a coup to overthrow their human gods.

0: a type of evolved sea-slug

1: by capturing it and torturing it

It will likely be amazing. Sam Altman said that the step between 4 and 5 will be like the one between 3.5 and 4. You can of course doubt him, but we'll see... I guess it will be this year; some guy working at OpenAI already posted "4+1=5" on Twitter, which is suggestive.

It will be here within 6-12 months. It will at first glance be a small step, but over the 12 months after release it will turn out to have been a giant leap. It will be safe when being observed.

Like Superman V there will be generally positive reviews, but post-V, VI will be in a different form, as the marketing's already wearing thin.
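On the tokenizer point a few comments up: GPT-4o does use a different tokenizer from GPT-4 (o200k_base vs cl100k_base), and the difference is easy to see with OpenAI's tiktoken library. A small sketch, assuming a recent tiktoken release (>= 0.7) that knows about GPT-4o; the sample sentence is arbitrary.

    import tiktoken

    text = "Ask HN: Predictions for when GPT-5 will be released?"

    gpt4 = tiktoken.encoding_for_model("gpt-4")     # cl100k_base
    gpt4o = tiktoken.encoding_for_model("gpt-4o")   # o200k_base

    for name, enc in [("gpt-4", gpt4), ("gpt-4o", gpt4o)]:
        ids = enc.encode(text)
        print(f"{name} ({enc.name}): {len(ids)} tokens -> {ids[:8]}...")

Because the token IDs and vocabulary differ, weights tied to the old vocabulary can't simply be reused as-is, which is the basis of the "trained from step one" argument; whether OpenAI actually retrained from scratch is not something we can verify from the outside.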
Everyone keeps thinking the current GPT models will improve to be superhuman; they won't. They're trained on human data. Much like AlphaGo, which had to drop the concept of learning from human games entirely because it was stuck at a local optimum: once it started training through self-play, it evolved well above that previous plateau (with an extra order of magnitude of computation). So don't expect much more from the current generation of AI.
Kind of have the same intuition. However, to play devil's advocate: the combined human data and knowledge far exceeds what a single human can know and learn, so that would already make it superhuman. Subhuman intelligence/reasoning with superhuman knowledge access. Searle's Chinese Room argument is starting to make sense in this case.

Humans are trained on the same data as any other primate, but we are the only species with unilateral global domination. The underlying architecture matters. Small variations within this architecture can even produce drastic changes (someone with an intellectual disability vs. Einstein).

You mean kind of like this: https://openai.com/index/video-generation-models-as-world-si...

I have no idea why you're getting downvoted. Unlike the game of Go, where AI can try random stuff and see what works, that's not possible with general knowledge. GenAI is destined to know at most what humanity knows, and we can already see that it has terrible reasoning capabilities. GenAI is not going to hit a wall; it already has.

Does anyone know why OpenAI's temperature parameter works differently from Azure OpenAI's temperature parameter? If you set temperature to 2.0, Azure starts spewing nonsense with random characters, but OpenAI still keeps working "creatively". Is there a non-linear transform between them?
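I don't know what, if anything, Azure does differently under the hood, but for reference this is how temperature is conventionally applied: the model's logits are divided by the temperature before the softmax, so values well above 1.0 flatten the distribution toward uniform and make gibberish tokens much more likely. A minimal sketch of that scaling, with made-up logits.

    import math

    def softmax_with_temperature(logits, temperature):
        """Standard temperature scaling: divide logits by T before softmax."""
        scaled = [x / temperature for x in logits]
        m = max(scaled)                     # subtract max for numerical stability
        exps = [math.exp(x - m) for x in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    # Hypothetical next-token logits: one clearly best token, two weak ones.
    logits = [4.0, 1.0, 0.5]

    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(f"T={t}: " + ", ".join(f"{p:.2f}" for p in probs))
    # Low T concentrates mass on the top token; T=2.0 spreads it out,
    # so rare or garbage tokens get sampled far more often.

If both services applied only this standard scaling, the same setting should behave the same way, so any visible difference would have to come from an extra transform, different defaults (e.g. top_p), or a different model snapshot; I haven't seen it documented either way.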