What Is ERNIE Image?
The straight answer — model architecture, what it's built for, and where it fits.
ERNIE Image is an open-source text-to-image generation model developed by the ERNIE team at Baidu. It uses an 8-billion-parameter single-stream Diffusion Transformer (DiT) and ships with a lightweight Prompt Enhancer that expands short user inputs into richer, structured descriptions before generation.
The model is designed for practical deployment: it runs on a single consumer GPU with 24 GB of VRAM rather than a cluster. Despite its compact parameter count, it reaches state-of-the-art performance among open-weight text-to-image models across several benchmarks.
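To see why an 8-billion-parameter model plausibly fits on a 24 GB card, a rough back-of-the-envelope estimate of weight memory helps. The sketch below is illustrative only: the storage precisions are assumptions (the text does not say which one the released weights use), the helper names are hypothetical, and the figures cover the DiT weights alone, not the Prompt Enhancer, any text encoder or VAE, or activation memory.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


DIT_PARAMS = 8e9   # from the text: 8-billion-parameter DiT
VRAM_GIB = 24      # from the text: 24 GB card (treated as 24 GiB here)

# Assumed storage precisions; the source does not state which one is used.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def report() -> dict:
    """Map each assumed precision to (weight size in GiB, fits in VRAM?)."""
    rows = {}
    for name, nbytes in BYTES_PER_PARAM.items():
        gib = weight_memory_gib(DIT_PARAMS, nbytes)
        rows[name] = (round(gib, 1), gib < VRAM_GIB)
    return rows


if __name__ == "__main__":
    for name, (gib, fits) in report().items():
        print(f"{name}: {gib} GiB of weights, fits in {VRAM_GIB} GiB: {fits}")
```

Under these assumptions, 16-bit weights take roughly 14.9 GiB, which leaves headroom on a 24 GB card for the other components and for activations, whereas 32-bit weights (about 29.8 GiB) would not fit at all.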
It's released under Apache 2.0. That means the weights are free to download, use commercially, fine-tune, and redistribute — with no API dependency and no usage quota.