Why OpenAI Models Struggle with PDFs (And Why Gemini Fares Much Better)

Ali Basiri

If you’ve ever tried using GPT-4o to extract data from anything beyond simple PDFs, you’ve likely noticed something frustrating: the output is riddled with errors. Incorrect characters, garbled words, and misplaced numbers are scattered throughout the document. But why does this happen?

At first glance, it might seem like OpenAI’s models simply struggle with understanding text in images. Are their OCR capabilities weak? Is their training data insufficient? Could it be how they encode/tokenize images? While these factors might play a role, we found what is likely the largest contributing factor: resolution degradation.

GPT-4o Loses The Bid

When converting PDF to Markdown, Doctly.ai’s ‘Precision Ultra’ service routes every page to different LLMs and has them compete for accuracy. The most accurate and consistent extraction wins a tournament-style competition and is selected as the output. Looking at the win stats for different LLMs, we noticed that GPT-4o was underperforming.

Naturally, we wanted to know why.

OpenAI’s Image Input Resizing

When you upload an image of a PDF page, ChatGPT drastically reduces the resolution of the image before using it. According to their documentation, they resize the image so that the shortest side is no bigger than 768 pixels. That means a letter-size page (8.5 x 11 inches) ends up at 994x768 pixels, or roughly 90 DPI. With resizing this aggressive, a PDF that contains small print or detailed tables and data is far more prone to errors in text extraction.
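The arithmetic behind those numbers can be checked with a few lines of Python. This is a minimal sketch that assumes only the rule described above (the shortest side is scaled down to 768 pixels); the function name `downscaled_page` is ours for illustration, and this is not OpenAI's actual resizing code.

```python
def downscaled_page(width_in: float, height_in: float, short_side_cap: int = 768):
    """Return (long_px, short_px, dpi) for a page whose shortest side is
    scaled to `short_side_cap` pixels. Assumes the original render is larger
    than the cap, so the image is always scaled down."""
    short_in, long_in = sorted((width_in, height_in))
    dpi = short_side_cap / short_in          # pixels per inch after resizing
    return round(long_in * dpi), short_side_cap, dpi


# US letter page: 8.5 x 11 inches
long_px, short_px, dpi = downscaled_page(8.5, 11.0)
print(long_px, short_px, round(dpi))  # 994 768 90
```

At roughly 90 pixels per inch, an 8-point font is only about 10 pixels tall, which leaves very little detail for distinguishing similar characters.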

In contrast, Gemini seems to resize to a much higher resolution, around 210 DPI. In our testing, the input token count didn’t increase once the page image was larger than 2300x1778 pixels, which suggests some internal resizing, but to a far larger size than ChatGPT’s.

A Simple Test

Here is a simple test. We take an image of a PDF page at 210 DPI resolution (the Gemini size) and cut out a section of this page that is under 768 pixels. This ensures that it is not further scaled down by ChatGPT. We also create a copy of this cutout and scale it down to 90 DPI. Now we have two versions: one at 210 DPI, similar to what Gemini 2.0 Flash would normally see, and one at 90 DPI, similar to what ChatGPT would normally see.

Figure 1. This is the page we will be using for this example. The page is resized to 2300x1778 pixels. The red section is the area we cut out, measuring 655x445 pixels at 210 DPI. The next image is the 90 DPI version, measuring 283x192 pixels. (This figure is not actual size; it is shown to demonstrate the process and relative sizes.)
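The cutout sizes in Figure 1 follow directly from the ratio between the two page sizes. The sketch below reproduces that calculation; the helper names `scale_crop` and `fits_without_resize` are hypothetical, and the actual resampling would be done with an image library of your choice.

```python
def scale_crop(crop_size, page_short_side=1778, target_short_side=768):
    """Scale a (width, height) crop by the same factor that shrinks the
    full page from `page_short_side` to `target_short_side` pixels."""
    factor = target_short_side / page_short_side
    width, height = crop_size
    return round(width * factor), round(height * factor)


def fits_without_resize(crop_size, cap=768):
    """Conservative check: every side of the crop is within the cap, so the
    768-pixel shortest-side rule should leave the crop untouched."""
    return max(crop_size) <= cap


crop_210 = (655, 445)                  # cutout from the 2300x1778 page
print(fits_without_resize(crop_210))   # True
print(scale_crop(crop_210))            # (283, 192)
```

Both versions show exactly the same region of the page; only the pixel density differs, which isolates resolution as the variable being tested.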

We then run both images through GPT-4o with the exact same prompt. GPT-4o successfully extracts the content of the 210 DPI image without any errors, while the output for the 90 DPI image is full of errors:

Figure 2. This image shows the diff of the markdown output between 90 DPI (left/red) vs 210 DPI (right/green). All errors are highlighted in dark red.
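A comparison like the one in Figure 2 can be approximated with Python's standard `difflib` module. This is only a sketch of one way to diff two extractions word by word, not the tool used to produce the figure, and the sample strings are made up for illustration rather than taken from our test output.

```python
import difflib


def word_diff(reference: str, candidate: str) -> list[tuple[str, str]]:
    """Return (reference_words, candidate_words) pairs for every span
    where the two texts disagree, compared word by word."""
    ref_words = reference.split()
    cand_words = candidate.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2])))
    return changes


# Hypothetical extractions of the same line at two resolutions
high_res = "Total revenue 1,234.56 USD"
low_res = "Total revenue 1,284.56 USD"
print(word_diff(high_res, low_res))  # [('1,234.56', '1,284.56')]
```

Counting the returned pairs gives a rough per-page error figure that can be compared across models or resolutions.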

What about 4o-mini?
We tried this same experiment across multiple files for 4o-mini. The hope was to see if 4o-mini could be a cheap alternative if OpenAI ever relaxed its low-resolution input restrictions. Alas, the high-resolution image conversion was still full of errors, though it had far fewer than the low-resolution version.

Final Thoughts

This result suggests that GPT-4o, the model, might not be fundamentally bad at handling PDFs, and that its problems mainly stem from poor input quality caused by the aggressive downscaling. Don’t expect future models from OpenAI, such as GPT-4.5 or 5, to get any better at extraction without changes to the input downscaling.

Regardless, even if OpenAI does fix the input size, GPT-4o still cannot compete with Gemini 2.0 Flash on price.

If you need the highest level of accuracy in extraction, Doctly.ai is the way to go. Our Precision Ultra service performs even better than Gemini alone and it’s designed to be integrated into your existing workflows. Try it today. To get 250 extra free credits, email support@doctly.ai and reference this article.