Smaller, faster version of GPT-4.1

GPT-4.1 mini excels at instruction following and tool calling. It features a 1M token context window and low latency without a reasoning step. Note that we recommend starting with GPT-5 mini for more complex tasks.

Knowledge cutoff: Jun 01, 2024

Pricing

Pricing is based on the number of tokens used, or other metrics based on the model type. For tool-specific models, like search and computer use, there’s a fee per tool call.

Endpoints

- Chat Completions: v1/chat/completions
- Fine-tuning: v1/fine-tuning
- Image generation: v1/images/generations
- Image edit: v1/images/edits
- Speech generation: v1/audio/speech
- Transcription: v1/audio/transcriptions
- Translation: v1/audio/translations
- Completions (legacy): v1/completions

Features

- Function calling: Supported
- Structured outputs: Supported
- Distillation: Not supported
- Predicted outputs: Supported

Snapshots

Snapshots let you lock in a specific version of the model so that performance and behavior remain consistent. Below is a list of all available snapshots and aliases for GPT-4.1 mini.

Rate limits

Rate limits ensure fair and reliable access to the API by placing specific caps on requests or tokens used within a given time period. Your usage tier determines how high these limits are set and automatically increases as you send more requests and spend more on the API.
| Tier | RPM | RPD | TPM | Batch queue limit |
| --- | --- | --- | --- | --- |
| Free | 3 | 200 | 40,000 | - |
| Tier 1 | 500 | 10,000 | 200,000 | 2,000,000 |
| Tier 2 | 5,000 | - | 2,000,000 | 20,000,000 |
| Tier 3 | 5,000 | - | 4,000,000 | 40,000,000 |
| Tier 4 | 10,000 | - | 10,000,000 | 1,000,000,000 |
| Tier 5 | 30,000 | - | 150,000,000 | 15,000,000,000 |
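
To make the interaction between the request cap (RPM) and the token cap (TPM) concrete, here is a minimal Python sketch that encodes the rate-limit figures listed above and estimates how far apart evenly spaced requests would need to be. The names `TierLimits`, `LIMITS`, `max_requests_per_minute`, and `min_seconds_between_requests` are hypothetical helpers, not part of any OpenAI SDK. The sketch also assumes that a "-" entry means no value is listed, and that pacing requests evenly is a reasonable client-side approximation; how the API actually enforces its time windows is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierLimits:
    rpm: int                    # requests per minute
    rpd: Optional[int]          # requests per day; None where the table shows "-"
    tpm: int                    # tokens per minute
    batch_queue: Optional[int]  # batch queue limit; None where the table shows "-"


# Values copied from the rate-limit figures above (assumed current only as of that listing).
LIMITS = {
    "Free": TierLimits(3, 200, 40_000, None),
    "Tier 1": TierLimits(500, 10_000, 200_000, 2_000_000),
    "Tier 2": TierLimits(5_000, None, 2_000_000, 20_000_000),
    "Tier 3": TierLimits(5_000, None, 4_000_000, 40_000_000),
    "Tier 4": TierLimits(10_000, None, 10_000_000, 1_000_000_000),
    "Tier 5": TierLimits(30_000, None, 150_000_000, 15_000_000_000),
}


def max_requests_per_minute(tier: str, tokens_per_request: int) -> int:
    """Requests per minute that fit under both the RPM and TPM caps."""
    limits = LIMITS[tier]
    return min(limits.rpm, limits.tpm // tokens_per_request)


def min_seconds_between_requests(tier: str, tokens_per_request: int) -> float:
    """Smallest even spacing (in seconds) that respects both RPM and TPM."""
    limits = LIMITS[tier]
    spacing_for_rpm = 60.0 / limits.rpm
    spacing_for_tpm = 60.0 * tokens_per_request / limits.tpm
    return max(spacing_for_rpm, spacing_for_tpm)


if __name__ == "__main__":
    # Free tier, small requests: the 3 RPM cap is the bottleneck.
    print(min_seconds_between_requests("Free", 1_000))
    # Tier 1, larger requests: the 200,000 TPM cap is the bottleneck.
    print(min_seconds_between_requests("Tier 1", 4_000))
```

Whichever cap yields the larger spacing is the binding one, so small requests tend to be limited by RPM and large-context requests by TPM. A production client would typically combine pacing like this with retry and backoff when a request is rejected for exceeding a limit.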