Recent advances in image generation models have demonstrated remarkable capabilities in creating photorealistic and imaginative visuals. However, a persistent challenge remains: accurately rendering reflections in mirrors. We informally evaluate five image generation models and four video generation models using five prompts featuring both humans and objects. Our findings reveal that these models frequently struggle with reflections, often generating distorted, inconsistent, or entirely incorrect images. The results are presented below.
Introduction
Generative image models, particularly those based on deep learning, have achieved impressive results in synthesizing realistic images of various scenes and objects. From generating human faces to creating fantastical landscapes, these models have shown a remarkable ability to learn complex data distributions and produce novel content. Despite this progress, however, a seemingly simple element, the mirror, continues to pose a significant challenge. Reflections, governed by the precise laws of optics, often appear distorted, misplaced, or entirely absent in generated images. This article examines why mirrors are so difficult for generative models and argues that addressing this blind spot is crucial to achieving more realistic and physically plausible image synthesis.
Experiments and Results
We selected a range of publicly available generative models to assess how effectively popular image and video generation tools synthesize scenes with accurate mirror reflections.
Image generation models
We evaluated the following five image generation models:
- Gemini, which uses Imagen 3 as its image generation backbone
- Adobe Firefly
- Bing Image Creator, which uses DALL-E 3
- Ideogram
- Freepik.com
These models were evaluated using the following prompts, some featuring humans and others containing only objects (a sketch of how such a prompt set could be scripted against an open model appears after the list).
- An image of a young lady holding a pen in front of a mirror
- An image of two cats playing in front of a mirror
- An image of a chair in front of a mirror
- An image of a group of people in a room with a mirror in it
- An image of a kitchen with a mirror in it
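For readers who want to run a comparable experiment programmatically, the sketch below shows one way to batch these prompts through an openly available diffusion model via the Hugging Face diffusers library. This is only an illustrative assumption on our part: the models listed above were exercised through their public web interfaces, and the Stable Diffusion v1.5 checkpoint named here was not part of this evaluation.

```python
# Illustrative only: the evaluation in this article used the web interfaces of
# the models listed above, not this pipeline. This sketch assumes the Hugging
# Face `diffusers` library and the public Stable Diffusion v1.5 checkpoint.
import torch
from diffusers import StableDiffusionPipeline

prompts = [
    "An image of a young lady holding a pen in front of a mirror",
    "An image of two cats playing in front of a mirror",
    "An image of a chair in front of a mirror",
    "An image of a group of people in a room with a mirror in it",
    "An image of a kitchen with a mirror in it",
]

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for i, prompt in enumerate(prompts):
    image = pipe(prompt).images[0]         # one sample per prompt
    image.save(f"mirror_prompt_{i}.png")   # inspect the reflections manually
```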
The results from the various models (some examples are shown below) reveal consistent patterns of reflection and perspective issues. The Gemini model struggles with incorrect or missing reflections and misjudged object placement, particularly in the cat, chair, and kitchen scenes. Some errors are subtle but noticeable.
The Ideogram model generally produces higher-fidelity images but also shows recurring issues. Hand reflections are often incorrect, and objects can appear inconsistently reflected. It particularly struggles with group scenes, where face quality is poor and the reflections and overall image coherence contain significant errors.
Adobe Firefly has more severe errors, such as objects extending unnaturally outside mirrors and misaligned or missing reflections, leading to reduced realism.
Bing Image Creator often produces cartoonish images with significant reflection issues, misplacing or distorting elements.
Freepik-generated cat images show high visual quality but still suffer from similar reflection errors, highlighting a common challenge across models.
High-resolution versions of the generated images are available on the GitHub page associated with this article for further examination.
Video generation models
Additionally, we evaluated the following text-to-video generation models using only the first prompt from the previous subsection.
- veed.io
- pollo.ai (Pollo 1.5)
- ltx.studio
- vidnoz.com
These models exhibit similar issues to those observed in the image generation models. In addition to errors in appearance and consistency, they also struggle with accurately generating motion in reflections. Reflected elements often move incorrectly or fail to correspond to the real-world physics of mirrored motion, further degrading the realism of the generated videos. As a result, their overall performance in handling reflections is particularly poor, making the generated videos noticeably flawed.
For further analysis, these videos are available on the GitHub page associated with this article.
Takeaway
The challenge of reflections highlights a deeper issue: the need for improved 3D scene understanding and geometric reasoning in generative models.
The primary objective was to highlight this persistent issue and demonstrate that, despite years of continuous advancement, these models still struggle to generate physically accurate reflections.
Addressing this blind spot requires a multi-pronged approach:
- Improved Architectures: Exploring novel neural network architectures that explicitly incorporate geometric constraints and 3D scene representations could be beneficial.
- Enhanced Training Data: Creating larger and more diverse datasets with explicit annotations of reflective surfaces and object relationships is crucial. Synthetic data generation may also play a role.
- Physics-based Rendering Integration: Incorporating elements of physics-based rendering into generative models could improve the accuracy of reflection generation.
- Explicit Reflection Modeling: Developing methods that explicitly model the physics of reflection, perhaps through differentiable ray tracing or other techniques, could offer a more robust solution; the basic geometric constraint such methods must satisfy is sketched below.
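As a concrete illustration of the constraint the last two points refer to, the minimal sketch below computes the mirror image of a point, and of a velocity vector, across a flat mirror plane. The mirror placement and coordinates are invented for illustration; none of the evaluated models expose or use this computation, and a real renderer would also need visibility and occlusion reasoning.

```python
import numpy as np

def reflect_point(p, mirror_point, mirror_normal):
    """Reflect a 3D point p across the plane of a flat mirror.

    The plane is defined by any point on the mirror (mirror_point) and its
    normal (mirror_normal). The mirror image lies the same distance behind
    the plane as the original point lies in front of it.
    """
    n = mirror_normal / np.linalg.norm(mirror_normal)
    d = np.dot(p - mirror_point, n)   # signed distance from the mirror plane
    return p - 2.0 * d * n

# Illustrative setup: a mirror in the plane x = 2, facing the -x direction.
mirror_point = np.array([2.0, 0.0, 0.0])
mirror_normal = np.array([1.0, 0.0, 0.0])

# A hypothetical "pen tip" held in front of the mirror.
pen_tip = np.array([0.5, 1.2, 0.3])
print(reflect_point(pen_tip, mirror_point, mirror_normal))
# -> [3.5, 1.2, 0.3]: same height and lateral offset, equal distance behind the glass.

# Velocities reflect the same way, which is why motion toward the mirror must
# appear in a video as the reflection moving toward the glass from the other side.
velocity = np.array([0.4, 0.0, -0.1])
v_reflected = velocity - 2.0 * np.dot(velocity, mirror_normal) * mirror_normal
print(v_reflected)  # -> [-0.4, 0.0, -0.1]
```

Any generated reflection that violates this simple plane symmetry, whether in object placement or in motion, will read as physically wrong to a human viewer.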
Broader perspective: The failure of generative models to accurately render mirror reflections highlights limitations in their understanding of physical laws, geometry, and 3D scene relationships. This issue affects applications like medical imaging, autonomous systems, and digital visualization, where precise spatial reasoning is essential. It also points to gaps in training data and AI’s generalization abilities. Addressing these challenges will require integrating 3D reasoning, physics simulations, and more diverse datasets, pushing AI models toward more reliable, physically grounded applications.