GPT-4.1 mini Model | OpenAI API


Models

gpt-4.1-mini

Smaller, faster version of GPT-4.1

GPT-4.1 mini excels at instruction following and tool calling. It features a 1M token context window and delivers low latency, as it has no reasoning step.

Note that we recommend starting with GPT-5 mini for more complex tasks.

Knowledge cutoff: Jun 01, 2024
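As a minimal sketch, a Chat Completions request to this model can be built as a JSON body; the endpoint path and model name come from this page, while the prompt content is illustrative.

```python
import json

# Minimal Chat Completions request body for gpt-4.1-mini.
# The endpoint and model name are from this page; the prompt is illustrative.
payload = {
    "model": "gpt-4.1-mini",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize this release note in one sentence."},
    ],
}

body = json.dumps(payload)
# POST `body` to https://api.openai.com/v1/chat/completions with an
# Authorization: Bearer <API key> header to receive a completion.
print(body)
```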

Pricing

Pricing is based on the number of tokens used, or on other metrics depending on the model type. For tool-specific models, such as search and computer use, there is a fee per tool call. See details on the pricing page.

Endpoints

Chat Completions

v1/chat/completions

Fine-tuning

v1/fine-tuning

Image generation

v1/images/generations

Image edit

v1/images/edits

Speech generation

v1/audio/speech

Transcription

v1/audio/transcriptions

Translation

v1/audio/translations

Completions (legacy)

v1/completions

Features

Function calling

Supported

Structured outputs

Supported

Distillation

Not supported

Predicted outputs

Supported

Snapshots

Snapshots let you lock in a specific version of the model so that performance and behavior remain consistent. Below is a list of all available snapshots and aliases for GPT-4.1 mini.

gpt-4.1-mini

Rate limits

Rate limits ensure fair and reliable access to the API by placing specific caps on requests or tokens used within a given time period. Your usage tier determines how high these limits are set and automatically increases as you send more requests and spend more on the API.

| Tier   | RPM    | RPD    | TPM         | Batch queue limit |
|--------|--------|--------|-------------|-------------------|
| Free   | 3      | 200    | 40,000      | -                 |
| Tier 1 | 500    | 10,000 | 200,000     | 2,000,000         |
| Tier 2 | 5,000  | -      | 2,000,000   | 20,000,000        |
| Tier 3 | 5,000  | -      | 4,000,000   | 40,000,000        |
| Tier 4 | 10,000 | -      | 10,000,000  | 1,000,000,000     |
| Tier 5 | 30,000 | -      | 150,000,000 | 15,000,000,000    |
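One way to stay under a requests-per-minute cap on the client side is a sliding-window pacer, sketched below. The pacing logic is an assumption for illustration, not an official client; the 500 RPM figure is Tier 1's limit from the table above.

```python
import time
from collections import deque


class RpmLimiter:
    """Client-side pacing: block until a request slot is free under an RPM cap."""

    def __init__(self, rpm, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock   # injectable for testing
        self.sent = deque()  # timestamps of requests in the last 60 seconds

    def acquire(self):
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.rpm:
            # Sleep until the oldest request leaves the window.
            time.sleep(60 - (now - self.sent[0]))
            now = self.clock()
            while self.sent and now - self.sent[0] >= 60:
                self.sent.popleft()
        self.sent.append(now)


# Tier 1 allows 500 RPM for this model (see the table above).
limiter = RpmLimiter(rpm=500)
limiter.acquire()  # call once before each API request
```

In production you would also honor the rate-limit headers the API returns and back off on 429 responses rather than relying on client-side pacing alone.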