ChatGPT’s bug with scanned PDFs

While extracting structured data from PDFs, I noticed something strange: a small fraction of scanned PDFs were dropping content at the top of the page. After narrowing it down, I found out that ChatGPT was only seeing the bottom section of the page.

This bug affects every OpenAI model I tested, both through the API and on the website: gpt-5.4, gpt-5.3-codex, and gpt-4.1-mini.

Two PDFs. One ChatGPT reads. One it doesn’t.

The two PDFs below display identically in Chrome, Acrobat, macOS Preview, and any other compliant viewer.

Each is a single A4 page with the same big “Number: 942194” header, the same dimensions, and the same content. You cannot tell them apart by looking at them.

The difference between them is one small thing:

- OK.pdf: bitmap stored upright, no “/Rotate” entry.
- NOK.pdf: bitmap stored rotated 90° clockwise, with a “/Rotate 270” flag telling the viewer to rotate it back at display time.

Now let’s ask the model the question: “What number is shown in this PDF?”

ChatGPT: I can’t determine it.

Three different generations of OpenAI models (4.1, 5.3, 5.4) all fail to read the number. They either say “I can’t determine it” or hallucinate a number.

Gemini reads both files correctly.

What’s broken

OpenAI runs PDFs through an internal PDF-to-image rasterizer before handing the result to the vision model. That rasterizer ignores /Rotate. It reads the bitmap stream as-stored and feeds it to the model directly.

When a scanner produces a landscape bitmap with /Rotate 270 so it displays portrait, OpenAI’s pipeline hands the model a sideways landscape image. The model sees text running 90° to its expected orientation and either gives up or — worse — latches onto whatever fragment it can parse and confidently invents the rest.

I tested at five different bitmap resolutions, from 8.7 million pixels down to 87,000 pixels. The bug reproduces 100% at every resolution, including resolutions small enough that no tiling is involved. So this isn’t a tiling edge case or a downsampling artifact. It’s just /Rotate being ignored.

Under the hood

If you peek inside NOK.pdf with any PDF library, here’s what you find:

In the “broken” PDF, both the MediaBox and the embedded bitmap stream are landscape. The /Rotate 270 flag is the only thing that tells a viewer to rotate the page 270° clockwise at display time so it appears as a portrait page.

A PDF viewer reads the landscape MediaBox, applies /Rotate 270, and shows you a portrait page. OpenAI’s pipeline reads the landscape MediaBox, ignores /Rotate, and hands the model a sideways landscape image. That is bad enough on its own, but judging by what the model can read on real-world failing PDFs, it looks like only a square region from one side of the raw bitmap is actually visible to the model.
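To make the mismatch concrete, here is a minimal sketch of how /Rotate changes the size a viewer displays. `displayed_size` is a hypothetical helper, not part of any PDF library, and the numbers are the pixel dimensions of the sideways bitmap from the repro rather than a real MediaBox in points.

```python
def displayed_size(width: float, height: float, rotate: int = 0) -> tuple[float, float]:
    """Return the (width, height) a compliant viewer shows after applying /Rotate."""
    if rotate % 90 != 0:
        raise ValueError("/Rotate must be a multiple of 90")
    # 90 and 270 swap the axes; 0 and 180 keep them.
    if rotate % 180 == 90:
        return (height, width)
    return (width, height)


stored = (1754, 1240)  # landscape, as stored in NOK.pdf

# What a viewer shows: /Rotate 270 turns the landscape page into portrait.
print(displayed_size(*stored, rotate=270))  # (1240, 1754)

# What a rasterizer that ignores /Rotate produces: still landscape.
print(displayed_size(*stored, rotate=0))    # (1754, 1240)
```

The second call is the shape of image the model appears to receive for NOK.pdf, with the text running sideways.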

ChatGPT can only read the bottom part of the page

“But /Rotate is exotic, right?”

/Rotate is defined in ISO 32000-1 §7.7.3.3, the official PDF specification. It has been part of the PDF standard since PDF 1.0, in 1993.

It is the standard way that office scanners encode page orientation.

Scanners and apps write the bitmap in the sensor’s native orientation and add /Rotate to avoid the CPU cost of re-encoding. As a result, a large fraction of PDFs that come from a scanner have /Rotate set.

If you’re using scanned PDFs with OpenAI’s API, you are probably getting extraction errors due to this issue.

The repro, in 50 lines of Python

from PIL import Image, ImageDraw, ImageFont
import pypdf
import random
import io

PAGE_W = 1240
PAGE_H = 1754
NO_ROTATE_PDF = "OK.pdf"
WITH_ROTATE_PDF = "NOK.pdf"


def make_page(number: str) -> Image.Image:
    # White portrait page with "Number: <number>" centered near the top.
    img = Image.new("RGB", (PAGE_W, PAGE_H), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default(100)
    text = f"Number: {number}"
    bbox = draw.textbbox((0, 0), text, font=font)
    text_width = bbox[2] - bbox[0]
    x = (PAGE_W - text_width) / 2
    y = 100 - bbox[1]
    draw.text((x, y), text, fill="black", font=font)
    return img


def save_pdf(image: Image.Image, path: str, rotate: int) -> None:
    # Let Pillow write a one-page PDF in memory, then re-save it with pypdf.
    buf = io.BytesIO()
    image.save(buf, format="PDF", resolution=200.0)
    buf.seek(0)
    reader = pypdf.PdfReader(buf)
    writer = pypdf.PdfWriter()
    page = reader.pages[0]
    if rotate:
        # page.rotate() only updates the /Rotate entry in the page
        # dictionary; the bitmap stream itself is left untouched.
        page.rotate(rotate)
    writer.add_page(page)
    with open(path, "wb") as f:
        writer.write(f)


def generate_pdfs(number: str) -> None:
    upright = make_page(number)
    # PDF A: bitmap stored upright (1240 x 1754), no /Rotate flag.
    save_pdf(upright, NO_ROTATE_PDF, rotate=0)
    # PDF B: bitmap stored rotated 90 degrees clockwise (1754 x 1240).
    # /Rotate 270 tells viewers to rotate the page back upright at display time.
    sideways = upright.rotate(-90, expand=True)
    save_pdf(sideways, WITH_ROTATE_PDF, rotate=270)


number = f"{random.randint(100000, 999999)}"
generate_pdfs(number)

Workaround

Until OpenAI fixes their pipeline, consider rendering PDF pages to PNG client-side with a library that honors /Rotate (`pypdfium2`, `pdf2image`) and sending the result as `input_image` instead of `input_file`.
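A minimal sketch of that workaround, assuming `pypdfium2` and Pillow are installed. The helper names `render_pages_to_png` and `as_input_image` are mine, the `scale=2.0` default is an arbitrary choice, and the dictionary shape reflects my understanding of the `input_image` content part with a base64 data URL, so check it against the current API reference before relying on it.

```python
import base64
import io


def render_pages_to_png(pdf_path: str, scale: float = 2.0) -> list[bytes]:
    """Render every page of a PDF to PNG bytes with pypdfium2."""
    import pypdfium2 as pdfium  # third-party: pip install pypdfium2 pillow

    pdf = pdfium.PdfDocument(pdf_path)
    pngs = []
    for index in range(len(pdf)):
        # PDFium applies the page's /Rotate entry when rendering.
        image = pdf[index].render(scale=scale).to_pil()
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        pngs.append(buf.getvalue())
    return pngs


def as_input_image(png: bytes) -> dict:
    """Wrap PNG bytes as an `input_image` content part (base64 data URL)."""
    encoded = base64.b64encode(png).decode("ascii")
    return {"type": "input_image", "image_url": f"data:image/png;base64,{encoded}"}
```

You would then put one `as_input_image(...)` part per page into the user message content, next to the `input_text` part holding your question, in place of the `input_file` part.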

You can search your PDFs to see if you have this issue using this code:

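import pypdf

# your_pdfs: any iterable of PDF file paths
for path in your_pdfs:
    # Check every page, not just the first: a scan can mix orientations.
    if any(page.get("/Rotate") for page in pypdf.PdfReader(path).pages):
        print(path, "is affected")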

I reported this to OpenAI on April 10, 2026. The bug is still present.