Introducing vision to the fine-tuning API


Today, we’re introducing vision fine-tuning on GPT‑4o, making it possible to fine-tune with images in addition to text. Developers can customize the model for stronger image understanding, enabling applications such as enhanced visual search, improved object detection for autonomous vehicles or smart cities, and more accurate medical image analysis.
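As a rough illustration, a vision fine-tuning dataset uses the same chat-format JSONL as text fine-tuning, with user messages that can include image parts alongside text. The sketch below is an assumption about the shape of one training example, not an official schema; the URL and task are placeholders.

```python
import json

# Hypothetical single training example for vision fine-tuning:
# a chat transcript whose user turn mixes text and an image URL.
example = {
    "messages": [
        {"role": "system", "content": "You identify street signs."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this sign say?"},
                {
                    "type": "image_url",
                    # Placeholder image; real datasets would use accessible URLs
                    # or base64-encoded images.
                    "image_url": {"url": "https://example.com/sign.jpg"},
                },
            ],
        },
        {"role": "assistant", "content": "Stop"},
    ]
}

# Fine-tuning datasets are uploaded as JSONL: one JSON object per line.
line = json.dumps(example)
print(line)
```

A full dataset would contain many such lines, one per labeled example, uploaded as a training file before creating the fine-tuning job.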

Since we first introduced fine-tuning on GPT‑4o, hundreds of thousands of developers have customized our models using text-only datasets to improve performance on specific tasks. In many cases, however, fine-tuning on text alone doesn’t deliver the expected performance boost.