
Azure OpenAI - GPT-Image-1 Labs


Welcome to the GPT-Image-1 Lab: A hands-on exploration of Azure OpenAI’s GPT-Image-1 for image generation, edits, and semantic validation workflows.


Repository Overview

In this repository, you’ll find:

  • app.py: a Gradio-based interactive UI showcasing various image generation features and techniques.
  • image_generation.py: a Python class wrapping calls to the GPT-Image-1 endpoints (Generate, Edit, etc.) and GPT-4o Vision (for semantic validation).
  • notebook.ipynb: an illustrative notebook covering step-by-step usage of GPT-Image-1 on Azure.

This setup is designed as a lab: follow each step to learn best practices and patterns around GPT-Image-1, from reliably transparent backgrounds to multi-frame sprite animations!


Features

1. Basic Generation

Generate a single image from a textual prompt.

  • Size: standard options such as 1024x1024, 1536x1024, etc.
  • Quality: high, medium, or low.
  • Returns a single PNG, or None if generation fails.
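
For orientation, here is a minimal sketch of such a call using the openai Python SDK against an Azure deployment. The repo wraps this logic in image_generation.py; the API version, prompt, and filename below are illustrative assumptions:

import base64
import os
from openai import AzureOpenAI

# Client for the image deployment; the env variable names match the
# .env keys listed under Getting Started below. The api_version is an
# assumption - use whichever version your resource supports.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_IMAGE_API_ENDPOINT"],
    api_key=os.environ["AZURE_IMAGE_API_KEY"],
    api_version="2025-04-01-preview",
)

result = client.images.generate(
    model=os.environ["AZURE_IMAGE_DEPLOYMENT_NAME"],
    prompt="A red vintage car parked by the sea",
    size="1024x1024",
    quality="high",
)

# GPT-Image-1 returns base64-encoded image data rather than a URL.
with open("car.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))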

2. Transparent Generation

Request backgrounds to be transparent.

  • If not enough transparency is detected, the script can attempt an edit call to remove backgrounds.
  • Returns (Image, transparency_percentage) upon success.
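
A transparency check can be as simple as counting non-opaque pixels with Pillow. This sketch is illustrative; the repo's actual threshold logic may differ:

from PIL import Image

def transparency_percentage(img: Image.Image) -> float:
    # Share of pixels that are not fully opaque, as a 0-100 percentage.
    alpha = img.convert("RGBA").getchannel("A")
    non_opaque = sum(1 for a in alpha.getdata() if a < 255)
    return 100.0 * non_opaque / (img.width * img.height)

img = Image.open("generated.png")
if transparency_percentage(img) < 10.0:  # illustrative threshold
    print("Background looks opaque - fall back to an edit call")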

3. Layered Scenes

Construct a layered “background + foreground” scene, each generated independently.

  • The background is opaque.
  • The foreground is generated (or forced) to have transparency, then composited on top.
  • A collision mask is also returned (black silhouette where the foreground is non-empty).
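
The compositing and mask steps map directly onto Pillow operations. A sketch, assuming both layers are generated at the same size:

from PIL import Image

background = Image.open("background.png").convert("RGBA")
foreground = Image.open("foreground.png").convert("RGBA")

# alpha_composite requires two RGBA images of identical size.
scene = Image.alpha_composite(background, foreground)

# Collision mask: black wherever the foreground has any opacity,
# white everywhere else.
alpha = foreground.getchannel("A")
mask = alpha.point(lambda a: 0 if a > 0 else 255)

scene.save("scene.png")
mask.save("collision_mask.png")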

4. Sprite Generation

Generate complete character sprite sets for 2D games:

  • Front view (facing camera)
  • Back view (facing away)
  • Left side view (automatically mirrored for right view)
  • Consistent style across all angles
  • Perfect for top-down RPGs and side-scrollers
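
Mirroring the left view to obtain the right view is a single Pillow call (file names here are illustrative):

from PIL import Image

left = Image.open("sprite_left.png")
right = left.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
right.save("sprite_right.png")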

5. Semantic Image Validation with GPT-4o Vision

  • After generating an image, we can feed the result (as a smaller JPEG) into GPT-4o.
  • GPT-4o can confirm or deny a list of statements, e.g., “The car is red,” “The dog is wearing a hat,” etc.
  • Retry generation if the statements fail.
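
A sketch of the validation call, assuming the openai SDK and the GPT-4o env variables from the Getting Started section; the prompt wording, API version, and filenames are illustrative:

import base64
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_TEXT_API_ENDPOINT"],
    api_key=os.environ["AZURE_TEXT_API_KEY"],
    api_version="2024-06-01",  # assumption: any vision-capable version works
)

# Send the downsized JPEG as a base64 data URL alongside the statements.
with open("result_small.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

statements = ["The car is red", "The dog is wearing a hat"]
response = client.chat.completions.create(
    model=os.environ["AZURE_TEXT_DEPLOYMENT_NAME"],
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Answer TRUE or FALSE, one per line:\n" + "\n".join(statements)},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
# If any line is FALSE, regenerate the image and validate again.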

These patterns empower you to trust the final result and automate iterative workflows without manual inspection.


Getting Started

1. Clone or Download this Repository:

git clone https://github.com/aymenfurter/gpt-image-gen-labs.git
cd gpt-image-gen-labs

2. Install Dependencies:

pip install -r requirements.txt

3. Create a .env File:

Add your Azure OpenAI credentials:

AZURE_IMAGE_API_ENDPOINT=<your-azure-image-endpoint>
AZURE_IMAGE_API_KEY=<your-azure-image-api-key>
AZURE_IMAGE_DEPLOYMENT_NAME=<your-azure-image-deployment-name>

# For GPT-4o Vision if using the validation feature:
AZURE_TEXT_API_ENDPOINT=<your-azure-text-endpoint>
AZURE_TEXT_API_KEY=<your-azure-text-api-key>
AZURE_TEXT_DEPLOYMENT_NAME=<your-azure-text-deployment-name>
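
In Python, these values are typically loaded with python-dotenv (an assumption here; check requirements.txt):

import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the working directory
endpoint = os.environ["AZURE_IMAGE_API_ENDPOINT"]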

4. Run the Gradio App:

python app.py

  • The console will display a local URL. Open it in your browser to use the interactive UI.

5. Explore the Notebook (notebook.ipynb):

  • Jupyter or VS Code:

    jupyter notebook notebook.ipynb
  • Walk through each step to see direct usage of the endpoints, transparency checks, semantic validation logic, etc.

Contributing

Feel free to open issues or pull requests if you have improvements or ideas. Contributions are always welcome!