# Azure OpenAI - GPT-Image-1 Labs
Welcome to the GPT-Image-1 Lab: A hands-on exploration of Azure OpenAI’s GPT-Image-1 for image generation, edits, and semantic validation workflows.
## Repository Overview
In this repository, you’ll find:
| File/Folder | Description |
|---|---|
| `app.py` | A Gradio-based interactive UI showcasing various image generation features and techniques. |
| `image_generation.py` | A Python class wrapping calls to the GPT-Image-1 endpoints (Generate, Edit, etc.) and GPT-4o Vision (for semantic validation). |
| `notebook.ipynb` | An illustrative notebook covering step-by-step usage of GPT-Image-1 on Azure. |
This setup is designed as a lab: follow each step to learn the best practices and patterns around GPT-Image-1, from reliable transparency to multi-frame sprite animations!
## Features

### 1. Basic Generation
Generate a single image from a textual prompt.
- Size: standard combinations such as `1024x1024`, `1536x1024`, etc.
- Quality: `high`, `medium`, or `low`.
- Returns a single PNG, or `None` if generation failed.
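Before wiring up the endpoint, it helps to see the request shape. The helper below is a minimal sketch, assuming field names that mirror Azure's images/generations API (`prompt`, `size`, `quality`, `n`); it is not the lab's exact implementation.

```python
def build_generation_request(prompt, size="1024x1024", quality="medium", n=1):
    """Build a JSON body for an image-generation call (illustrative).

    Field names are assumptions based on Azure's images/generations API.
    """
    assert size in {"1024x1024", "1536x1024", "1024x1536"}, "unsupported size"
    assert quality in {"high", "medium", "low"}, "unsupported quality"
    return {"prompt": prompt, "size": size, "quality": quality, "n": n}
```

The assertions catch unsupported size/quality combinations locally, before an API round trip fails.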
### 2. Transparent Generation
Request backgrounds to be transparent.
- If not enough transparency is detected, the script can attempt an edit call to remove backgrounds.
- Returns `(Image, transparency_percentage)` on success.
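The transparency check itself is a pure image operation. A minimal sketch using Pillow (the function name is illustrative, not the lab's exact code):

```python
from PIL import Image


def transparency_percentage(img: Image.Image) -> float:
    """Percentage of fully transparent pixels in an image's alpha channel."""
    alpha = img.convert("RGBA").getchannel("A")
    data = list(alpha.getdata())
    transparent = sum(1 for a in data if a == 0)
    return 100.0 * transparent / len(data)
```

A threshold on this value is what decides whether a follow-up edit call is needed to strip the background.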
### 3. Layered Scenes
Construct a layered “background + foreground” scene, each generated independently.
- The background is opaque.
- The foreground is generated (or forced) to have transparency, then composited on top.
- A collision mask is also returned (black silhouette where the foreground is non-empty).
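The compositing and mask steps can be sketched with Pillow as follows (function name and mask convention as described above; black where the foreground is non-empty):

```python
from PIL import Image


def layer_scene(background: Image.Image, foreground: Image.Image):
    """Composite a transparent foreground over an opaque background.

    Returns (scene, collision_mask): the mask is black (0) wherever the
    foreground has any opacity, white (255) elsewhere. Illustrative sketch.
    """
    bg = background.convert("RGBA")
    fg = foreground.convert("RGBA")
    scene = Image.alpha_composite(bg, fg)
    mask = fg.getchannel("A").point(lambda a: 0 if a > 0 else 255)
    return scene, mask
```

The collision mask makes the generated foreground directly usable for hit-testing in a game loop.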
### 4. Sprite Generation
Generate complete character sprite sets for 2D games:
- Front view (facing camera)
- Back view (facing away)
- Left side view (automatically mirrored for right view)
- Consistent style across all angles
- Perfect for top-down RPGs and side-scrollers
### 5. Semantic Image Validation with GPT-4o Vision
- After generating an image, we can feed the result (as a smaller JPEG) into GPT-4o.
- GPT-4o can confirm or deny a list of statements, e.g., “The car is red,” “The dog is wearing a hat,” etc.
- Retry generation if the statements fail.
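Building the validation request can be sketched as follows: downscale the image to a small JPEG, base64-encode it, and ask GPT-4o to judge each statement. The message shape follows the OpenAI vision chat format; the prompt wording and function name are illustrative.

```python
import base64
import io

from PIL import Image


def build_validation_messages(img: Image.Image, statements, max_side=512):
    """Build a GPT-4o chat message that asks it to verify each statement
    against a downscaled JPEG of the image. Illustrative sketch."""
    small = img.convert("RGB")
    small.thumbnail((max_side, max_side))  # shrink in place, keep aspect ratio
    buf = io.BytesIO()
    small.save(buf, format="JPEG")
    data_url = ("data:image/jpeg;base64,"
                + base64.b64encode(buf.getvalue()).decode())
    prompt = ("Answer true or false for each statement about the image:\n"
              + "\n".join(f"- {s}" for s in statements))
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }]
```

Shrinking to a small JPEG keeps token costs low without hurting the coarse-grained checks ("the car is red") this pattern targets.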
These patterns empower you to trust the final result and automate iterative workflows without manual inspection.
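The generate-validate-retry loop itself is simple. In this sketch, `generate` and `validate` are hypothetical callables standing in for the lab's GPT-Image-1 and GPT-4o calls:

```python
def generate_until_valid(generate, validate, statements, max_attempts=3):
    """Retry image generation until validation passes.

    `generate() -> image` and `validate(image, statements) -> bool` are
    hypothetical stand-ins for the lab's API calls. Returns the first
    accepted image, or None after max_attempts failures.
    """
    for _ in range(max_attempts):
        image = generate()
        if validate(image, statements):
            return image
    return None
```

Capping attempts matters: each retry is a billed generation plus a billed vision call, so an unbounded loop on an unsatisfiable statement would burn quota indefinitely.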
## Getting Started
1. Clone or download this repository:

   ```bash
   git clone https://github.com/your-username/gpt-image-lab.git
   cd gpt-image-lab
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
3. Create a `.env` file with your Azure OpenAI credentials:

   ```
   AZURE_IMAGE_API_ENDPOINT=<your-azure-image-endpoint>
   AZURE_IMAGE_API_KEY=<your-azure-image-api-key>
   AZURE_IMAGE_DEPLOYMENT_NAME=<your-azure-image-deployment-name>

   # For GPT-4o Vision, if using the validation feature:
   AZURE_TEXT_API_ENDPOINT=<your-azure-text-endpoint>
   AZURE_TEXT_API_KEY=<your-azure-text-api-key>
   AZURE_TEXT_DEPLOYMENT_NAME=<your-azure-text-deployment-name>
   ```
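A quick way to fail fast on a misconfigured environment is to check the required variables up front. This helper is a sketch (its name is not from the lab's code); it covers only the image settings, with the text settings added the same way if validation is used:

```python
import os

REQUIRED_SETTINGS = [
    "AZURE_IMAGE_API_ENDPOINT",
    "AZURE_IMAGE_API_KEY",
    "AZURE_IMAGE_DEPLOYMENT_NAME",
]


def missing_settings(env=os.environ):
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]
```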
4. Run the Gradio app:

   ```bash
   python app.py
   ```

   The console will display a local URL; open it in your browser to use the interactive UI.
5. Explore the notebook (`notebook.ipynb`) in Jupyter or VS Code:

   ```bash
   jupyter notebook notebook.ipynb
   ```

   Walk through each step to see direct usage of the endpoints, transparency checks, semantic validation logic, and more.
## Contributing
Feel free to open issues or pull requests if you have improvements or ideas. Contributions are always welcome!
