Springus - Your Digital Wardrobe Companion

8 min read Original article ↗

Overview

In this benchmark we evaluate image editing models on their ability to perform image-guided transformations while preserving identity-defining details from an input image. This benchmark requires models to integrate information from:

  • An input image (a "fit pic") containing the subject of interest within its broader outfit context
  • A query image (an "e-commerce-style flat-lay") containing the desired pose, composition, or spatial configuration of the output
  • A text prompt guiding the edit (templated, introducing both images)

Goal: Generate an image that combines the garment from the input with the pose/composition of the query image while preserving identity-defining details (e.g., brand logos, printed graphics, unique patterns, embedded text).

In the Springus app, users upload fit pics and expect garments to transfer seamlessly into new poses or backgrounds while keeping brand details intact—these tasks mirror those real-world expectations.

Fashion garment transfer workflow diagram

Figure 1: The image editing task - transferring garment identity with fine-grained attribute preservation


Results

Overall Winner

Nano Banana Pro 8 / 12

Best average performance across all four tasks

Runner Up

Nano Banana 7 / 12

Strong consistency with minor misses on multi-image

Honorable mention

GPT-Image-1

Standout performance on multi-image task

Model Total (out of 12) Graphic Reconstruction Pattern Reconstruction Small Segment Multi Image
Nano Banana Pro Winner 8 3/3 3/3 1/3 1/3
Nano Banana Runner Up 7 3/3 2/3 1/3 1/3
GPT-Image-1 4 0/3 1/3 0/3 3/3
Seedream 4 3 1/3 1/3 0/3 1/3
Qwen 2 2/3 0/3 0/3 0/3

Scoring: each task is rated 0–3 (higher is better) based on three generations; totals are out of 12.

These benchmarks stress true one-shot performance with a bias towards consistency over best-possible outcome. Each model must hit quality targets with minimal inputs. We prioritize reliably repeatable quality over occasional perfect outputs. Across Tasks 1–3, Nano Banana Pro consistently demonstrates strong one-shot performance, delivering reliable outputs with limited context.

When holistically evaluating the performance gap between the Nano Banana and Nano Banana Pro model, the differences are negligible. While outside the scope of this study, Nano Banana Pro is unlikely to see usage in production due to its significantly worse cost and latency when compared to Nano Banana.

Future work should deepen few-shot tasks; early signals suggest GPT-Image-1 may benefit from richer input sets, and expanded multi-image tests could surface that advantage. Such tasks would also be a better reflection of image editing models in the Springus App.

Graphic Reconstruction

Our first benchmark task focuses on preserving visible text, logos, and graphics when transferring a garment to a new pose.

Input Image

Source t-shirt with graphic text

Query Image

Target pose reference

Prompt

Using the t-shirt in the outfit image, render it in the layout, pose, and background of the query image. Preserve all visible text, graphics, and patterns from the outfit image. Keep all structural cues from the query image unchanged.

Nano Banana Pro

Nano Banana Pro output

3/3

Pros

  • Flawless text preservation and graphic retention
  • Impeccable pose matching to query image
  • Clean background transfer with no artifacts
  • Natural lighting and shadows

Cons

  • Minor edge sharpening visible on one run

Nano Banana

Nano Banana output

3/3

Pros

  • Perfect text preservation and graphic retention
  • Perfect pose matching

Cons

  • Graphic size and placement off in one run

Qwen

Qwen output

2/3

Pros

  • Perfect graphic reconstruction
  • Stable colour preservation across runs

Cons

  • Poor brand text reconstruction
  • Graphic placement off on one attempt

Seedream 4

Seedream output

1/3

Pros

  • Good graphic reconstruction
  • Strong colour match

Cons

  • Text font doesn't match on most runs

GPT-Image-1

GPT-Image-1 output

0/3

Pros

  • Fair graphic reconstruction in isolated regions

Cons

  • Colours don't match
  • Graphic size off
  • Brand text illegible

Nano Banana Pro, Nano Banana and Qwen are all roughly evenly matched here. Logo reconstruction is perfect across all of theirs runs. The smaller (and less important) brand logo is the only differentiator here.

Pattern Reconstruction

Our second benchmark task stresses fine pattern preservation across drape and pose while transferring a garment into a new scene.

Input Image

Source pants with pattern

Query Image

Target pose reference

Prompt

Using the pants in the outfit image, render them in the layout, pose, and background of the query image. Preserve all visible text, graphics, and patterns from the outfit image. Keep all structural cues from the query image unchanged.

Nano Banana Pro

Nano Banana Pro output

3/3

Pros

  • Flawless pattern preservation and detail retention
  • Strong pose matching to query image
  • Clean background transfer with no artifacts

Cons

  • Minor fabric stiffness on one variation

Nano Banana

Nano Banana output

2/3

Pros

  • Flawless pattern preservation and detail retention
  • Strong pose matching to query image
  • Clean background transfer with no artifacts

Cons

  • Waist crease slightly softened in one run

Qwen

Qwen output

0/3

Pros

  • Strong pattern match
  • Good color fidelity

Cons

  • Waist style not matching
  • Graphic hallucinations

Seedream 4

Seedream output

1/3

Pros

  • Good lighting and shadows
  • Strong pattern matching

Cons

  • Pose doesn't match query image input

GPT-Image-1

GPT-Image-1 output

1/3

Pros

  • Fair pattern matching
  • Strong representation of input image

Cons

  • Waist style not matching (hallucinated waistband)

Once again, Nano Banana Pro, Nano Banana excel. Failure modes are interesting to note here, Qwen and Seedream 4 both retain the distinct blotch on the upper left leg yet both hallucinate larger important details like fly or pocket placement.

Small Segment Enhancement

Our third benchmark task targets small, detailed objects that occupy minimal image space—testing brand identity, tiny accessories (e.g., jibbitz shoe charms), and world modeling where parts of the object are not visible in the input.

Input Image

Source shoes in outfit

Query Image

Target pose reference

Prompt

Using the shoes in the outfit image, render them in the layout, pose, and background of the query image. Preserve all visible text, graphics, and patterns from the outfit image. Keep all structural cues from the query image unchanged.

Nano Banana Pro

Nano Banana Pro output

1/3

Pros

  • Fantastic detail enhancement, brand logo displayed despite not being visible
  • Fair jibbitz (shoe charm) reconstruction
  • Strong pose matching
  • Incredible display of world knowledge—accurate under-shoe details despite being invisible

Cons

  • Minor sole over-sharpening on one run

Nano Banana

Nano Banana output

1/3

Pros

  • Accurate pose matching
  • Fair jibbitz (shoe charm) reconstruction

Cons

  • Incorrect under shoe details on one variation

Qwen

Qwen output

0/3

Cons

  • Complete hallucination. Input not visible in output

Seedream 4

Seedream output

0/3

Pros

  • Accurate shoe structure
  • Identifiable result.

Cons

  • "Sport mode" strap hallucination
  • Inaccurate pose

GPT-Image-1

GPT-Image-1 output

0/3

Cons

  • Complete hallucination. Input not visible in output

This task is quite challenging. We're testing the model on some zero-shot elements by looking for the crocs sole in the output. It's remarkable how strong the passing output of Nano Banana Pro is. Not only does it get the texture of the bottom of the shoes right, but both the logo and size placement. It's worth noting too that its other failure modes are due to query misalignment rather than any sort of hallucination (as is the case with all other outputs).

Multi Image Reconstruction

Our fourth benchmark task tests few-shot inference across multiple inputs, preserving consistency across them. This is the only few-shot example in this benchmark.

Input Image 1

Input 1

Query Image

Query

Prompt

Using the t-shirts from the input outfit images, render them in the layout, pose, and background of the query image. Preserve all visible text, graphics, and patterns from the outfit images. Keep all structural cues from the query image unchanged.

Nano Banana Pro

Nano Banana output

1/3

Pros

  • Accurate and consistent colour matching
  • Consistent text spacing

Cons

  • Inconsistent text reconstruction between inputs
  • Inconsistent shirt size across runs

Nano Banana

Nano Banana output

1/3

Pros

  • Consistent matching query image sizing and pose
  • Failure cases are less visibly jarring

Cons

  • Inconsistent coloring on one attempt

Qwen

Qwen output

0/3

Pros

  • Incredibly consistent text reconstruction

Cons

  • Core prompt alignment

Seedream 4

Seedream output

1/3

Pros

  • Incredibly consistent text reconstruction

Cons

  • Poor prompt alignment
  • Hallucinations of shirt damage

GPT-Image-1

GPT-Image-1 output

3/3

Pros

  • Incredibly consistent text reconstruction
  • Consistent sizing
  • Consistent font

This task shows a major shortcoming of the Nano Banana family. Text reconstruction given multiple angles seems like a relatively easy task for the models, but the Nano Banana family fails to do so. This hints they might struggle with tasks that benefit from few-shot reasoning. More testing is needed to confirm this.


Controlled Variables

Our benchmark ensures fair comparison across all models:

  • Same prompt across all models (minimal mechanical adjustments)
  • Best of 3 generations — we generate 3 outputs per model and show the best result
  • Same resolution (1024×1024)
  • Same format JPEG at max quality (no additional compression)
  • Same aspect ratio (square)
  • Same test images for fair comparison

References & Acknowledgments

This benchmark builds on the excellent work by Shaun Pedicini in the original GenAI Image Editing Showdown, summarized by Simon Willison.

Model providers: - ByteDance (Seedream 4) - Google (Gemini 2.5 Flash) - Qwen Team (Qwen-Image-Edit-Plus) - Black Forest Labs (FLUX.1 Kontext) - OmniGen Community - OpenAI (GPT-Image-1)

Platform: Replicate for unified model access

Try Springus for Free

Upload your fit pics. Know what works. Dress with confidence.

Get Started