DreamLite



A Lightweight On-Device Unified Model for Image Generation and Editing

On-Device Demo

Real-time generation & editing on iPhone 17 Pro — no cloud, fully on-device.

Human Portrait & Style Transfer

Nature Landscape & Background Change

About DreamLite

In this paper, we propose DreamLite, a compact unified on-device diffusion model (0.39B parameters) that supports both text-to-image (T2I) generation and text-guided image editing within a single network. DreamLite is built on a pruned mobile U-Net backbone and unifies conditioning through in-context spatial concatenation in the latent space. To stabilize training of this compact model, we introduce a Task-Progressive Joint pretraining strategy that sequentially targets T2I, editing, and joint tasks. After SFT and RL, DreamLite outperforms existing on-device models and remains competitive with several server-side models on both generation and editing tasks. By employing step distillation, we further achieve 4-step inference, enabling DreamLite to generate or edit a 1024 × 1024 image in ~3 s (using 4-bit Qwen VL and fp16 VAE+UNet) on an iPhone 17 Pro.
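To make the in-context conditioning idea concrete, here is a minimal NumPy sketch of spatial concatenation in the latent space. All shapes, the 8× VAE downsampling factor, and the zero-filled placeholder for the T2I condition slot are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical latent shapes for a 1024x1024 image with an assumed
# 8x VAE downsampling factor: batch, channels, height, width.
B, C, H, W = 1, 4, 128, 128

rng = np.random.default_rng(0)
noisy_latent = rng.standard_normal((B, C, H, W)).astype(np.float32)  # target being denoised
cond_latent = rng.standard_normal((B, C, H, W)).astype(np.float32)   # VAE latent of the source image (editing)

# In-context spatial concatenation: condition and target are placed side
# by side on one latent canvas, so a single U-Net attends to both.
unet_input = np.concatenate([cond_latent, noisy_latent], axis=-1)
assert unet_input.shape == (B, C, H, 2 * W)

# For pure T2I generation, the condition slot can be filled with a neutral
# placeholder (zeros here), keeping one interface for both tasks.
t2i_input = np.concatenate([np.zeros_like(cond_latent), noisy_latent], axis=-1)
assert t2i_input.shape == unet_input.shape

# After denoising, only the target half of the canvas is kept for decoding.
target_half = unet_input[..., W:]
assert target_half.shape == noisy_latent.shape
```

Concatenating along a spatial axis (rather than the channel axis, as in Pix2Pix-style conditioning) lets the same backbone handle both a conditioned and an unconditioned layout without changing its input channel count.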

Our contributions are summarized as follows:

  • We propose, to the best of our knowledge, the first unified on-device model that supports both text-to-image generation and text-based image editing, eliminating the need to deploy two separate models.
  • We introduce an in-context conditioning mechanism for the U-Net that unifies generation and editing, and propose a task-progressive joint pretraining scheme (i.e., T2I → Edit → Unified Joint Training) that stabilizes training of the compact model.
  • DreamLite achieves competitive performance on standard benchmarks and consistently outperforms prior mobile models. Once deployed on a mobile device, DreamLite can generate or edit a 1024 × 1024 image in under 5 s.
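The task-progressive schedule named in the second contribution can be sketched as a staged curriculum over batch sampling. The stage names follow the paper's T2I → Edit → Unified ordering, but the mixing ratios and step counts below are purely illustrative assumptions.

```python
import random

# Illustrative three-stage curriculum: each stage defines the probability
# of drawing a T2I batch vs. an editing batch (ratios are assumptions).
STAGES = [
    ("t2i", {"t2i": 1.0, "edit": 0.0}),      # stage 1: text-to-image only
    ("edit", {"t2i": 0.0, "edit": 1.0}),     # stage 2: editing only
    ("unified", {"t2i": 0.5, "edit": 0.5}),  # stage 3: joint training
]

def sample_task(mix, rng):
    """Pick the task for the next batch according to the stage's mixing ratio."""
    return "t2i" if rng.random() < mix["t2i"] else "edit"

rng = random.Random(0)
schedule = []
for name, mix in STAGES:
    # Real stages run for many optimization steps; 4 draws just illustrate.
    schedule.append((name, [sample_task(mix, rng) for _ in range(4)]))

assert schedule[0][1] == ["t2i"] * 4    # stage 1 sees only T2I batches
assert schedule[1][1] == ["edit"] * 4   # stage 2 sees only editing batches
```

The point of the progression is that each stage starts from weights already stable on the previous task, so the final joint stage mixes tasks without the early-training instability of learning both at once.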

Model Architecture

Overview of the proposed framework and its key components.

Figure 1. Overall architecture of DreamLite.

Visual Results

Figure 2. Generation and Editing Results on Mobile Device.

Main Results

Table 1. Comparison with existing methods on GenEval, DPG, ImgEdit and GEdit-EN Benchmarks.

| Method | Params | GenEval ↑ | DPG ↑ | ImgEdit ↑ | GEdit-EN-Q ↑ |
|---|---|---|---|---|---|
| FLUX.1-Dev / Kontext | 12B | 0.67 | 84.0 | 3.76 | 6.79 |
| BAGEL | 7B | 0.82 | 85.1 | 3.42 | 7.20 |
| OmniGen2 | 4B | 0.80 | 83.6 | 3.44 | 6.79 |
| LongCat-Image / Edit | 6B | 0.87 | 86.6 | 4.49 | 7.55 |
| DeepGen | 1.02B | 0.83 | 84.6 | 4.03 | 7.54 |
| SANA-1.6B | 1.6B | 0.67 | 84.8 | - | - |
| MEISSONIC | 1B | 0.54 | 65.3 | - | - |
| VIBE | 1.6B | - | - | 3.85 | 7.28 |
| SANA-0.6B | 0.6B | 0.64 | 83.6 | - | - |
| SnapGen++ (small) | 0.4B | 0.66 | 85.2 | - | - |
| EditMGT | 0.96B | - | - | 2.89 | 6.33 |
| DreamLite (Ours) | 0.39B | 0.72 | 85.8 | 4.11 | 6.88 |

Table 2. Ablation study on GenEval and ImgEdit benchmarks. "TPJ" denotes "Task-progressive Joint".

| Experiment | Mechanism | Training Stage | GenEval ↑ | ImgEdit ↑ |
|---|---|---|---|---|
| Text-to-image | - | Pretraining | 0.70 | - |
| Condition Mechanism | Pix2Pix | T2I → Edit | 0.56 | 3.67 |
| | Pix2Pix | T2I → Edit → Unified | 0.61 | 3.65 |
| Training Strategy | In-context | T2I → T2I | 0.65 | - |
| | In-context | T2I → Edit | 0.64 | 3.88 |
| | In-context | T2I → Unified | 0.65 | 3.14 |
| | In-context | T2I → Edit → Unified | 0.71 | 3.94 |
| Reinforcement Learning | In-context | TPJ Pretrain → RLHF | 0.72 | 4.11 |
| Step Distillation | In-context | TPJ Pretrain → RLHF → DMD | 0.70 | 3.8 |

Roadmap & Contact

Our release plan and how to reach us.

BibTeX

@article{feng2026dreamlite,
  title={DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing},
  author={Kailai Feng and Yuxiang Wei and Bo Chen and Yang Pan and Hu Ye and Songwei Liu and Chenqian Yan and Yuan Gao},
  journal={arXiv preprint arXiv:2603.28713},
  year={2026}
}