GPT-4.1 in the API

openai.com

680 points by maheshrijal 12 days ago


lxgr - 11 days ago

As a ChatGPT user, I'm weirdly happy that it's not available there yet. I already have to make a conscious choice between

- 4o (can search the web, use Canvas, evaluate Python server-side, generate images, but has no chain of thought)

- o3-mini (web search, CoT, canvas, but no image generation)

- o1 (CoT, maybe better than o3, but no canvas or web search and also no images)

- Deep Research (very powerful, but I have only 10 attempts per month, so I end up using roughly zero)

- 4.5 (better in creative writing, and probably warmer sound thanks to being vinyl based and using analog tube amplifiers, but slower and request limited, and I don't even know which of the other features it supports)

- 4o "with scheduled tasks" (why on earth is that a model and not a tool that the other models can use!?)

Why do I have to figure all of this out myself?

modeless - 12 days ago

Numbers for SWE-bench Verified, Aider Polyglot, cost per million output tokens, output tokens per second, and knowledge cutoff month/year:

             SWE  Aider Cost Fast Fresh
 Claude 3.7  70%  65%   $15  77   8/24
 Gemini 2.5  64%  69%   $10  200  1/25
 GPT-4.1     55%  53%   $8   169  6/24
 DeepSeek R1 49%  57%   $2.2 22   7/24
 Grok 3 Beta ?    53%   $15  ?    11/24
I'm not sure this is really an apples-to-apples comparison, as it may involve different test scaffolding and levels of "thinking". Tokens-per-second numbers are from here: https://artificialanalysis.ai/models/gpt-4o-chatgpt-03-25/pr... and I'm assuming 4.1 runs at the speed of 4o, given that the "latency" graph in the article puts them at the same latency.

Is it available in Cursor yet?

swyx - 12 days ago

don't miss that OAI also published a prompting guide WITH RECEIPTS for GPT 4.1, specifically for those building agents... with new recommendations (rough sketch of the prompt layout below):

- telling the model to be persistent (+20%)

- don't self-inject/parse tool calls (+2%)

- prompted planning (+4%)

- JSON BAD - use XML or arxiv 2406.13121 (GDM format)

- put instructions + user query at TOP -and- BOTTOM - bottom-only is VERY BAD

- no evidence that ALL CAPS or Bribes or Tips or threats to grandma work

source: https://cookbook.openai.com/examples/gpt4-1_prompting_guide#...
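
For illustration, here's a minimal sketch of what that layout might look like with the Chat Completions API. The system text and the run_tests tool schema are made-up placeholders; only the repeated top-and-bottom instructions and native tool calling come from the guide.

    from openai import OpenAI

    client = OpenAI()

    INSTRUCTIONS = (
        "You are a coding agent. Keep working until the task is fully solved; "
        "do not hand control back to the user early. Plan before each tool call."
    )

    # Long context (repo files etc.) goes in the middle; the instructions are
    # repeated at the top AND the bottom, since bottom-only placement scored
    # far worse in the guide.
    messages = [
        {"role": "system", "content": INSTRUCTIONS},
        {"role": "user", "content": "<repository context here>\n\nTask: fix the failing test.\n\n" + INSTRUCTIONS},
    ]

    # Tool calls go through the API's native `tools` parameter instead of being
    # parsed out of the model's free-form text by hand.
    tools = [{
        "type": "function",
        "function": {
            "name": "run_tests",  # hypothetical tool, for illustration only
            "description": "Run the project's test suite and return the output.",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    }]

    response = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=tools)
    print(response.choices[0].message)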

omneity - 11 days ago

I have been trying GPT-4.1 for a few hours by now through Cursor on a fairly complicated code base. For reference, my gold standard for a coding agent is Claude Sonnet 3.7 despite its tendency to diverge and lose focus.

My takeaways:

- This is the first model from OpenAI that feels relatively agentic to me (o3-mini sucks at tool use, 4o just sucks). It seems to be able to piece together several tools to reach the desired goal and follows a roughly coherent plan.

- There is still more work to do here. Despite OpenAI's cookbook[0] and some prompt engineering on my side, GPT-4.1 stops quickly to ask questions, getting into a quite useless "convo mode". Its tool calls fail way too often as well, in my opinion.

- It's also able to handle significantly less complexity than Claude, resulting in some comical failures. Where Claude would create server endpoints, frontend components and routes and connect the two, GPT-4.1 creates simplistic UI that calls a mock API despite explicit instructions. When prompted to fix it, it went haywire and couldn't handle the multiple scopes involved in that test app.

- With that said, within all these parameters, it's much less unnerving than Claude and it sticks to the request, as long as the request is not too complex.

My conclusion: I like it, and I totally see where it shines: narrow, targeted work. It slots in alongside Claude 3.7 for creative work and Gemini 2.5 Pro for deep, complex tasks. GPT-4.1 does feel like a smaller model compared to those last two, but maybe I just need to use it for longer.

0: https://cookbook.openai.com/examples/gpt4-1_prompting_guide

marsh_mellow - 12 days ago

From OpenAI's announcement:

> Qodo tested GPT‑4.1 head-to-head against Claude Sonnet 3.7 on generating high-quality code reviews from GitHub pull requests. Across 200 real-world pull requests with the same prompts and conditions, they found that GPT‑4.1 produced the better suggestion in 55% of cases. Notably, they found that GPT‑4.1 excels at both precision (knowing when not to make suggestions) and comprehensiveness (providing thorough analysis when warranted).

https://www.qodo.ai/blog/benchmarked-gpt-4-1/

pbmango - 11 days ago

I think an underappreciated reality is that all of the large AI labs, and OpenAI in particular, are fighting multiple market battles at once. This comes across in both the number of products and the packaging.

1. To win consumer growth, they have continued to benefit from hyper-viral moments; lately that was image generation in 4o, which was likely technically possible long before it launched.

2. For enterprise workloads and large API use, they seem to have focused less lately, but the pricing of 4.1 is clearly an answer to Gemini, which has been winning on ultra-high volume and consistency.

3. For full frontier benchmarks, they pushed out 4.5 to stay SOTA and attract the best researchers.

4. On top of all that, they had to, and did, quickly answer the reasoning promise and the DeepSeek threat with faster and cheaper o-series models.

They are still winning many of these battles, but history highlights how hard multi-front warfare is, at least for teams of humans.

simonw - 12 days ago

Here's a summary of this Hacker News thread created by GPT-4.1 (the full sized model) when the conversation hit 164 comments: https://gist.github.com/simonw/93b2a67a54667ac46a247e7c5a2fe...

I think it did very well - it's clearly good at instruction following.

Total token cost: 11,758 input, 2,743 output = 4.546 cents.

Same experiment run with GPT-4.1 mini: https://gist.github.com/simonw/325e6e5e63d449cc5394e92b8f2a3... (0.8802 cents)

And GPT-4.1 nano: https://gist.github.com/simonw/1d19f034edf285a788245b7b08734... (0.2018 cents)
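
For what it's worth, those cents figures line up with the published prices ($2.00/1M input, $8.00/1M output for the full-size model); a quick sanity check:

    # Cost check for the full-size GPT-4.1 run above.
    input_tokens, output_tokens = 11_758, 2_743
    cost_usd = input_tokens * 2.00 / 1e6 + output_tokens * 8.00 / 1e6
    print(f"{cost_usd * 100:.3f} cents")  # -> 4.546 cents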

elashri - 12 days ago

Are there any benchmarks, or has anyone tested how these long-max-token models perform in scenarios that actually use most of the token limit?

I found from my experience with Gemini models that after ~200k tokens the quality drops and the model basically stops keeping track of things. But I don't have any numbers or a systematic study of this behavior.

I think all providers who announce increased max token limits should address this, because it isn't useful to just say that the max allowed tokens are 1M when you basically cannot use anything near that in practice.
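
In the absence of a published study, one crude way to probe this yourself is a needle-in-a-haystack style check: bury a fact at different depths in a long filler document and see at which context length retrieval breaks down. A minimal sketch (the filler text and question are made up, and this only measures retrieval, not reasoning over long context):

    from openai import OpenAI

    client = OpenAI()

    FILLER = "The sky was clear and the meeting ran long. " * 40_000  # roughly 400k tokens
    NEEDLE = "The secret launch code is 7391."

    for depth in (0.1, 0.5, 0.9):  # how far into the context the needle is buried
        pos = int(len(FILLER) * depth)
        doc = FILLER[:pos] + NEEDLE + FILLER[pos:]
        reply = client.chat.completions.create(
            model="gpt-4.1",
            messages=[{"role": "user", "content": doc + "\n\nWhat is the secret launch code?"}],
        )
        print(depth, reply.choices[0].message.content)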

minimaxir - 12 days ago

It's not the point of the announcement, but I do like the use of the (abs) subscript to demonstrate the improvement in LLM performance since in these types of benchmark descriptions I never can tell if the percentage increase is absolute or relative.

999900000999 - 12 days ago

Have they implemented "I don't know" yet?

I probably spend $100 a month on AI coding, and it's great at small, straightforward tasks.

Drop it into a larger codebase and it'll get confused because of context limits, even if the same tool built that codebase in the first place.

Then again, the way things are rapidly improving I suspect I can wait 6 months and they'll have a model that can do what I want.

taikahessu - 12 days ago

> They feature a refreshed knowledge cutoff of June 2024.

As opposed to Gemini 2.5 Pro having a cutoff of Jan 2025.

Honestly this feels underwhelming and surprising. Especially if you're coding with frameworks with breaking changes, this can hurt you.

runako - 12 days ago

ChatGPT currently recommends I use o3-mini-high ("great at coding and logic") when I start a code conversation with 4o.

I don't understand why the comparison in the announcement talks so much about comparing 4o's coding abilities to 4.1's. Wouldn't the relevant comparison be to o3-mini-high?

4.1 costs a lot more than o3-mini-high, so this seems like a pertinent thing for them to have addressed here. Maybe I am misunderstanding the relationship between the models?

comex - 11 days ago

Sam Altman wrote in February that GPT-4.5 would be "our last non-chain-of-thought model" [1], but GPT-4.1 also does not have internal chain-of-thought [2].

It seems like OpenAI keeps changing its plans. Deprecating GPT-4.5 less than 2 months after introducing it also seems unlikely to have been the original plan. Changing plans isn't necessarily a bad thing, but I wonder why.

Did they not expect this model to turn out as well as it did?

[1] https://x.com/sama/status/1889755723078443244

[2] https://github.com/openai/openai-cookbook/blob/6a47d53c967a0...

vinhnx - 12 days ago

• Flagship GPT-4.1: top‑tier intelligence, full endpoints & premium features

• GPT-4.1-mini: balances performance, speed & cost

• GPT-4.1-nano: prioritizes throughput & low cost with streamlined capabilities

All share a 1 million-token context window (vs roughly 128k–200k on 4o, o1, and o3-mini), excelling in instruction following, tool calls & coding.

Benchmarks vs prior models:

• AIME ’24: 48.1% vs 13.1% (~3.7× gain)

• MMLU: 90.2% vs 85.7% (+4.5 pp)

• Video‑MME: 72.0% vs 65.3% (+6.7 pp)

• SWE‑bench Verified: 54.6% vs 33.2% (+21.4 pp)

ZeroCool2u - 12 days ago

No benchmark comparisons to other models, especially Gemini 2.5 Pro, is telling.

kristianp - 11 days ago

Looks like the Quasar and Optimus stealth models on Openrouter were in fact GPT-4.1. This is what I get when I try to access the openrouter/optimus-alpha model now:

    {"error":
        {"message":"Quasar and Optimus were stealth models, and 
        revealed on April 14th as early testing versions of GPT 4.1. 
        Check it out: https://openrouter.ai/openai/gpt-4.1","code":404}
osigurdson - 11 days ago

Sam made a strange statement imo in a recent Ted Talk. He said (something like) models come and go but they want to be the best platform.

For me, it was jaw dropping. Perhaps he didn't mean it the way it sounded, but seemed like a major shift to me.

clbrmbr - 11 days ago

The deprecation of GPT-4.5 makes me sad. It's an amazing model with great world-knowledge and subtlety. It KNOWS THINGS that, in a quick experiment, 4.1 just does not. 4.5 could tell me what I would see from a random street corner in New Jersey, or how to use minor features of my niche API (well, almost), and it could write remarkably. 4.1 doesn't hold a candle to it. Please, continue to charge me $150/1M tokens; sometimes you need a Big Model. The deprecation tells me it was costing them more than $150/1M to serve (!).

miki123211 - 11 days ago

Most of the improvements in this model, basically everything except the longer context, image understanding, and better pricing, are things that reinforcement learning (without human feedback) should be good at.

Getting better at code is something you can verify automatically, same for diff formats and custom response formats. Instruction following is also either automatically verifiable, or can be verified via LLM as a judge.

I strongly suspect that this model is a GPT-4.5 (or GPT-5???) distill, with the traditional pretrain -> SFT -> RLHF pipeline augmented with an RLVR stage, as described in Lambert et al[1], and a bunch of boring technical infrastructure improvements sprinkled on top.

[1] https://arxiv.org/abs/2411.15124

muzani - 11 days ago

The real news for me is GPT-4.5 being deprecated, with the creativity being promised for "future models" rather than 4.1. 4.5 was okay in many ways, but it was an absolute genius in production for creative writing. 4o writes like a skilled human, but 4.5 can actually write a 10-minute scene that gives me goosebumps. I think it's the context window that allows it to actually build up scenes and pay them off much later.

Tiberium - 12 days ago

Very important note:

>Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version

If anyone here doesn't know, OpenAI does offer the ChatGPT model version in the API as chatgpt-4o-latest, but it's a poor fit for businesses because they continuously update it, so you can't rely on it being stable; that's why OpenAI made GPT-4.1.
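
In practice that means API users pin a fixed model name instead of the continuously updated alias; a minimal sketch of the difference (same call, different model string):

    from openai import OpenAI

    client = OpenAI()

    # Continuously updated alias: tracks whatever ChatGPT currently serves,
    # so behavior can shift under you without notice.
    moving = client.chat.completions.create(
        model="chatgpt-4o-latest",
        messages=[{"role": "user", "content": "Summarize this thread."}],
    )

    # Pinned model: a stable target you can test against and rely on.
    pinned = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Summarize this thread."}],
    )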

sharkjacobs - 11 days ago

    > You're eligible for free daily usage on traffic shared with OpenAI through April 30, 2025.
    > Up to 1 million tokens per day across gpt-4.5-preview, gpt-4.1, gpt-4o and o1
    > Up to 10 million tokens per day across gpt-4.1-mini, gpt-4.1-nano, gpt-4o-mini, o1-mini and o3-mini
    > Usage beyond these limits, as well as usage for other models, will be billed at standard rates. Some limitations apply. 
I just found this option in https://platform.openai.com/settings/organization/data-contr...

Is this just something I haven't noticed before? Or is it new?

NewUser76312 - 11 days ago

As a user I'm getting so confused as to what's "best" for various categories. I don't have the time or desire to dig into benchmarks for different categories and look into the example data to see which best maps onto my current problems.

The graphs presented don't even show a clear winner across all categories. The one with the biggest "number", GPT-4.5, isn't even the best in most categories; actually it's like 3rd in a lot of them.

This is quite confusing as a user.

Otherwise big fan of OAI products thus far. I keep paying $20/mo, they keep improving across the board.

nikcub - 12 days ago

Easy to miss in the announcement that 4.5 is being shut down

> GPT‑4.5 Preview will be turned off in three months, on July 14, 2025

frognumber - 12 days ago

Marginally on-topic: I'd love if the charts included prior models, including GPT 4 and 3.5.

Not all systems upgrade every few months. A major question is when we reach step-improvements in performance warranting a re-eval, redesign of prompts, etc.

There's a small bleeding edge, and a much larger number of followers.

theturtletalks - 12 days ago

With these being 1M context size, does that all but confirm that Quasar Alpha and Optimus Alpha were cloaked OpenAI models on OpenRouter?

pcwelder - 12 days ago

Did some quick tests. I believe it's the same model as Quasar. It struggles with the agentic loop [1]. You'd have to force it to do tool calls.

Its tool-use ability feels better than gemini-2.5-pro-exp [2], which sometimes struggles with JSON schema understanding.

Llama 4 has surprising agentic capabilities, better than both of them [3], but isn't as intelligent as the others.

[1] https://github.com/rusiaaman/chat.md/blob/main/samples/4.1/t...

[2] https://github.com/rusiaaman/chat.md/blob/main/samples/gemin...

[3] https://github.com/rusiaaman/chat.md/blob/main/samples/llama...

impure - 12 days ago

I like how Nano matches Gemini 2.0 Flash's price. That will help drive down prices which will be good for my app. However I don't like how Nano behaves worse than 4o Mini in some benchmarks. Maybe it will be good enough, we'll see.

exizt88 - 12 days ago

For conversational AI, the most significant part is GPT-4.1 mini being 2x faster than GPT-4o with basically the same reasoning capabilities.

porphyra - 12 days ago

pretty wild versioning that GPT 4.1 is newer and better in many regards than GPT 4.5.

oofbaroomf - 12 days ago

I'm not really bullish on OpenAI. Why would they only compare with their own models? The only explanation could be that they aren't as competitive with other labs as they were before.

jmkni - 12 days ago

The increased context length is interesting.

It would be incredible to be able to feed an entire codebase into a model and say "add this feature" or "we're having a bug where X is happening, tell me why", but then you are limited by the output token length

As others have pointed out, the more tokens you use, the less accuracy you get and the more confused it gets; I've noticed this too.

We are a ways away yet from being able to input an entire codebase, and have it give you back an updated version of that codebase.

starchild3001 - 11 days ago

I feel like some "benchmark-hacking" is going on with the GPT-4.1 model, as its metrics on livebench.com aren't all that exciting.

- It's basically GPT4o level on average.

- More optimized for coding, but slightly inferior in other areas.

It seems to be a better model than 4o for coding tasks, but I'm not sure if it will replace the current leaders -- Gemini 2.5 Pro, o3-mini / o1, Claude 3.7/3.5.

elAhmo - 11 days ago

A company worth hundreds of billions of dollars, on paper at least, has one of the worst naming schemes for its products in recent history.

Sam acknowledged this a few months ago, but with another release not really bringing any clarity, this is getting ridiculous now.

ComputerGuru - 12 days ago

The benchmarks and charts they have up are frustrating because they don't include o3-mini(-high), which they've been pushing as the low-latency, low-cost smart model to use for coding challenges instead of 4o and 4o-mini. Why won't they include that in the charts?

bartkappenburg - 12 days ago

By leaving out the scale or prior models, they are effectively exaggerating the improvement. If going from 3 to 4 meant going from 10 to 80, and from 4 to 4o meant 80 to 82, leaving out 3 shows a steep line instead of a steep drop-off in growth.

Lies, damn lies and statistics ;-)

lsaferite - 11 days ago

Is there an API endpoint at OpenAI that gives the information on this page as structured data?

https://platform.openai.com/docs/models/gpt-4.1

As far as I can tell there's no way to discover the details of a model via the API right now.

Given the announced adoption of MCP and MCP's ability to perform model selection for Sampling based on a ranking for speed and intelligence, it would be great to have a model discovery endpoint that came with all the details on that page.
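
For reference, the existing models endpoints only return a sparse listing (id, creation date, owner), with none of the context-window, pricing, or capability details from that docs page, which seems to be the gap being described; a quick sketch:

    from openai import OpenAI

    client = OpenAI()

    # GET /v1/models -- enumerates model ids but carries no capability metadata.
    for m in client.models.list():
        print(m.id, m.created, m.owned_by)

    # GET /v1/models/{id} -- the same sparse fields for a single model.
    print(client.models.retrieve("gpt-4.1"))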

asdev - 12 days ago

> We will also begin deprecating GPT‑4.5 Preview in the API, as GPT‑4.1 offers improved or similar performance on many key capabilities at much lower cost and latency.

why would they deprecate when it's the better model? too expensive?

XCSme - 11 days ago

I tried 4.1-mini and 4.1-nano. The responses are a lot faster, but for my use case they seem to be a lot worse than 4o-mini (they fail to complete tasks that 4o-mini could do). Maybe I have to update my prompts...

Ninjinka - 11 days ago

I've been using it in Cursor for the past few hours and prefer it to Sonnet 3.7. It's much faster and doesn't seem to make the sort of stupid mistakes Sonnet has been making recently.

wongarsu - 11 days ago

Is the version number a retcon of 4.5? On OpenAI's models page the names appear completely reasonable [1]: the o1 and o3 reasoning models, and for non-reasoning there are 3.5, 4, 4o and 4.1 (let's pretend 4o makes sense). But that is only reasonable as long as we pretend 4.5 never happened, which the models page apparently does.

1: https://platform.openai.com/docs/models

thund - 11 days ago

Hey OpenAI if you ever need a Version Engineer, I’m available.

- 12 days ago
[deleted]
nsoonhui - 11 days ago

  We will also begin deprecating GPT‑4.5 Preview in the API, as GPT‑4.1 offers improved or similar performance on many key capabilities at much lower cost and latency. GPT‑4.5 Preview will be turned off in three months
Here's something I just don't understand: how can GPT-4.5 be worse than 4.1? Or is the only bad thing OpenAI's naming ability?

neal_ - 12 days ago

The better the benchmarks, the worse the model is. Subjectively, for me the more advanced models don't follow instructions and are less capable of implementing features or building stuff. I could not tell a difference in blind testing of SOTA models from Gemini, Claude, OpenAI, and DeepSeek. There have been no major improvements in the LLM space since the original models gained popularity. Each release claims to be much better than the last, and every time I have been disappointed and think this is worse.

First the models stopped putting in effort and felt lazy: tell one to do something and it would tell you to do it yourself. Now it's the opposite, and the models go ham changing everything they see; instead of changing one line, SOTA models would rather rewrite the whole project and still not fix the issue.

Two years back I totally thought these models were amazing. I would always test out the newest models and get hyped up about them. With every problem I had, I thought that if I just prompted it differently I could get it to solve it. Oftentimes I have spent hours prompting, starting new chats, adding more context. Now I realize it's kinda useless, and it's better to just accept the models where they are rather than try to make them a one-stop shop or stretch their capabilities.

I think this release I won't even test it out; I'm not interested anymore. I'll probably just continue using DeepSeek free and Gemini free. I canceled my OpenAI subscription like 6 months ago, and canceled Claude after the 3.7 disappointment.

composableaide - 11 days ago

Excited to see 4.1 in the API. The Nano model pricing is comparable to Gemini Flash but not where we would like it to be: https://composableai.de/openai-veroeffentlicht-4-1-nano-als-...

forbiddenvoid - 12 days ago

Lots of improvements here (hopefully), but still no image generation updates, which is what I'm most eager for right now.

flakiness - 12 days ago

Big focus on coding. It feels like a defensive move against Claude (and more recently, Gemini Pro), which became very popular in that space. I guess they recently figured out some way to train the model for this "agentic" coding through RL or something, and the finding was too new to apply to 4.5 in time.

sc077y - 11 days ago

I'm wondering if one of the big reasons OpenAI is deprecating GPT-4.5 is not only that it's not cost-effective to host, but also that they don't want their parent model being used to train competitors' models (like DeepSeek).

asdev - 12 days ago

it's worse than 4.5 on nearly every benchmark. just an incremental improvement. AI is slowing down

esafak - 11 days ago

More information here:

  https://platform.openai.com/docs/models/gpt-4.1
  https://platform.openai.com/docs/models/gpt-4.1-mini
  https://platform.openai.com/docs/models/gpt-4.1-nano
rvz - 12 days ago

The big change about this announcement is the 1M context window on all models.

But the price is what matters.

growt - 12 days ago

My theory: they need to move off the 4o version number before releasing o4-mini next week or so.

intended - 11 days ago

If reasoning models are any good, then can they figure out overpowered builds for poe2?

Wait, wouldn't this be a decent test for reasoning?

Every patch changes things, and there’s massive complexity with the various interactions between items, uniques, runes, and more.

yberreby - 12 days ago

> Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version (opens in a new window) of GPT‑4o, and we will continue to incorporate more with future releases.

The lack of availability in ChatGPT is disappointing, and they're playing on ambiguity here. They are framing this as if it were unnecessary to release 4.1 on ChatGPT, since 4o is apparently great, while simultaneously showing how much better 4.1 is relative to GPT-4o.

One wager is that the inference cost is significantly higher for 4.1 than for 4o, and that they expect most ChatGPT users not to notice a marginal difference in output quality. API users, however, will notice. Alternatively, 4o might have been aggressively tuned to be conversational while 4.1 is more "neutral"? I wonder.

tdehnke - 11 days ago

I just wish they would start using human friendly names for them, and use a YY.rev version number so it's easier to know how new/old something is.

- Broad Knowledge 25.1

- Coder: Larger Problems 25.1

- Coder: Line Focused 25.1

gcy - 11 days ago

4.10 > 4.5 — @stevenheidel

@sama: underrated tweet

Source: https://x.com/stevenheidel/status/1911833398588719274

aitchnyu - 11 days ago

I'm using models that scored at least 50% on the Aider leaderboard, but I'm micromanaging 50-line changes instead of being more vibe. Is it worth experimenting with a model that didn't crack 10%?

archeantus - 11 days ago

“GPT‑4.1 scores 54.6% on SWE-bench Verified, improving by 21.4%abs over GPT‑4o and 26.6%abs over GPT‑4.5—making it a leading model for coding.”

4.1 is 26.6% better at coding than 4.5. Got it. Also…see the em dash

meetpateltech - 12 days ago

GPT-4.1 Pricing (per 1M tokens):

gpt-4.1

- Input: $2.00

- Cached Input: $0.50

- Output: $8.00

gpt-4.1-mini

- Input: $0.40

- Cached Input: $0.10

- Output: $1.60

gpt-4.1-nano

- Input: $0.10

- Cached Input: $0.025

- Output: $0.40
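
A small helper for turning those numbers into per-request cost estimates (prices copied from the list above, all per 1M tokens; the function name is just for illustration):

    # USD per 1M tokens, from the list above.
    PRICES = {
        "gpt-4.1":      {"input": 2.00, "cached_input": 0.50,  "output": 8.00},
        "gpt-4.1-mini": {"input": 0.40, "cached_input": 0.10,  "output": 1.60},
        "gpt-4.1-nano": {"input": 0.10, "cached_input": 0.025, "output": 0.40},
    }

    def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
        """Rough request cost in USD; cached_tokens is the cached part of the input."""
        p = PRICES[model]
        uncached = input_tokens - cached_tokens
        return (uncached * p["input"]
                + cached_tokens * p["cached_input"]
                + output_tokens * p["output"]) / 1e6

    # e.g. a 100k-token prompt with a 2k-token answer on gpt-4.1-mini:
    print(estimate_cost("gpt-4.1-mini", 100_000, 2_000))  # ~0.0432 USD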

codingwagie - 12 days ago

GPT-4.1 probably is a distilled version of GPT-4.5

I don't understand the constant complaining about naming conventions. The number system differentiates the models based on capability; any other method would not do that. After ten models with random names like "gemini" and "nebula" you would have no idea which is which. It's a low-IQ take. You don't name new versions of software as if they were completely different software.

Also, yesterday, using v0, I replicated a full Next.js UI copying a major SaaS player. No backend integration, but the design and UX were stunning, and better than I could do if I tried. I have 15 years of backend experience at FAANG. Software will get automated, and it already is; people just haven't figured it out yet.

msp26 - 12 days ago

I was hoping for native image gen in the API but better pricing is always appreciated.

Gemini was drastically cheaper for image/video analysis, I'll have to see how 4.1 mini and nano compare.

pcwelder - 12 days ago

Can someone explain to me why we should take Aider's polyglot benchmark seriously?

All the solutions are already available on the internet on which various models are trained, albeit in various ratios.

Any variance could likely be due to the mix of the data.

Aeroi - 11 days ago

The user shouldn't have to research which model is best for them. OpenAI needs to do a better job with UX and put the best model forward in ChatGPT.

vzaliva - 11 days ago

They continue to baffle users with their version numbering. Intuitively 4.5 is newer/better than 4.1, and perhaps 4o, but of course this is not the case.

sandspar - 11 days ago

Is this correct: OpenAI will sequester 4.1 in the API permanently? And, since November 2024, they've already wrapped many of 4.1's features into ChatGPT 4o?

user14159265 - 11 days ago

And it is available at https://t3.chat/ (as well as Claude, Grok, Gemini, etc.) for $8/month

elias_t - 12 days ago

Does someone have the benchmarks compared to other models?

htrp - 11 days ago

anyone want to guess parameter sizes here for

GPT‑4.1, GPT‑4.1 mini, and GPT‑4.1 nano?

I'll start with

800 bn MoE (probably 120 bn activated), 200 bn MoE (33 bn activated), and 7 bn parameters for nano

furyofantares - 11 days ago

It's another Daft Punk day. Change a string in your program* and it's better, faster, cheaper: pick 3.

*Then fix all your prompts over the next two weeks.

lich-001 - 11 days ago

I wish they would deprecate all existing ones when they bake a new model instead of aiming for pointless model diversity.

croemer - 12 days ago

Testing against unspecified other "leading" models allows for shenanigans:

> Qodo tested GPT‑4.1 head-to-head against other leading models [...] they found that GPT‑4.1 produced the better suggestion in 55% of cases

The linked blog post returns a 404: https://www.qodo.ai/blog/benchmarked-gpt-4-1/

__mharrison__ - 11 days ago

I know this is somewhat off topic, but can someone explain the naming convention used by OpenAI? Number vs "mini" vs "o" vs "turbo" vs "chat"?

simianwords - 12 days ago

Could anyone guess why they didn't ship this in the chat UI?

bli940505 - 11 days ago

Does this mean that the o1 and o3-mini models are also using 4.1 as the base now?

soheil - 12 days ago

Main takeaways:

- Coding accuracy improved dramatically

- Handles 1M-token context reliably

- Much stronger instruction following

- 12 days ago
[deleted]
p1dda - 11 days ago

LLMs are not intelligent

LeicaLatte - 11 days ago

i've recently set claude 3.7 as the default option for customers when they start new chats in my app. this was a recent change, and i'm feeling good about it. supporting multiple providers can be a nightmare for customer service, especially when it comes to billing and handling response quality queries. with so many choices from just one provider, it simplifies things significantly. curious about how openai manages customer service internally.

yieldcrv - 11 days ago

More season 4’s than attack on titan

i_love_retros - 11 days ago

I feel overwhelmed

bbstats - 11 days ago

ok.

polytely - 12 days ago

It seems that OpenAI is really differentiating itself in the AI market by developing the most incomprehensible product names in the history of software.

oidar - 12 days ago

I need an AI to understand the naming conventions that OpenAI is using.

bakugo - 12 days ago

> We will also begin deprecating GPT‑4.5 Preview in the API, as GPT‑4.1 offers improved or similar performance on many key capabilities at much lower cost and latency. GPT‑4.5 Preview will be turned off in three months, on July 14, 2025, to allow time for developers to transition.

Well, that didn't last long.

T3uZr5Fg - 12 days ago

[dead]

j_maffe - 12 days ago

OAI are so ahead of the competition, they don't need to compare with the competition anymore /s

curtisszmania - 12 days ago

[dead]

Yoplaid - 12 days ago

[dead]

- 12 days ago
[deleted]
pastureofplenty - 11 days ago

The plagiarism machine got an update! Yay!