ERNIE Image is an open-source text-to-image AI model developed by Baidu, built on an 8B-parameter Diffusion Transformer (DiT). It is designed to generate images with accurate in-image text, structured layouts, and complex multi-object compositions.
ERNIE Image outperforms most open-weight models on text-heavy and layout-sensitive tasks. Its built-in Prompt Enhancer expands short inputs into richer, structured prompts, which improves output quality without manual prompt engineering.
The model runs on a single consumer GPU with 24 GB of VRAM, making it suitable for local deployment. Released under Apache 2.0, it can be freely used, modified, and deployed commercially, without the usage limits of a hosted API.
- Apache 2.0 License
- 8B DiT Backbone
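
As a rough sanity check on the 24 GB figure, the sketch below estimates how much memory the 8B DiT weights alone would occupy at a few common precisions. The precisions listed are assumptions for illustration, since the description does not state which one the release uses, and the helper name `weight_memory_gib` is hypothetical. The estimate covers only the backbone weights and ignores the text encoder, VAE, Prompt Enhancer, and activations, so real usage will be higher.

```python
# Back-of-the-envelope estimate of DiT weight memory at different precisions.
# Assumption: parameter count is 8e9 (from the description); the precisions
# below are illustrative and not confirmed as what the release ships with.

GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


NUM_PARAMS = 8e9   # 8B-parameter backbone
VRAM_GIB = 24      # single consumer GPU budget from the description

# Bytes per parameter for each assumed precision.
PRECISIONS = {"fp32": 4, "bf16/fp16": 2, "int8": 1}

estimates = {
    name: weight_memory_gib(NUM_PARAMS, nbytes)
    for name, nbytes in PRECISIONS.items()
}

for name, gib in estimates.items():
    verdict = "fits" if gib < VRAM_GIB else "does not fit"
    print(f"{name:>9}: {gib:5.2f} GiB of weights ({verdict} in {VRAM_GIB} GiB)")
```

Under these assumptions, half-precision weights come to roughly 15 GiB, which leaves headroom on a 24 GB card for the remaining components, while full fp32 weights alone would exceed it.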