ERNIE Image — My Go-To Model for Local Generations

ERNIE Image is the best-in-class open model for generating images with text right now. You can produce moderately detailed infographics, posters, and tables with few or no artifacts, all on a local GPU. It's not as good as OpenAI's GPT Image 2, but it's more than enough for local iteration, A/B testing, and even publishing.

I've wanted a reliable image generation model that I can use with my coding agents for a while. It's part of a bigger project of mine to automate parts of my content creation process using agents. AI agents can get very costly if you rely on cloud APIs, because they tend to fire off many requests and generate multiple images per task. I didn't want to worry about API costs while running these experiments, so I wanted a good local image generation model.

Flux models and the Z-image models were pretty good for generating stock-type and photorealistic images, but couldn't render text cleanly. ERNIE Image solves that, and it's more than enough for ideation.

The problem? There are too many variants: BF16 (full model), NVFP4, OTF INT8, Turbo NVFP4, and Turbo GGUF, each with different quantization, step counts, and speed profiles. I was stuck trying to balance model size, generation speed, and final quality, so I ran a controlled comparison to find the real-world tradeoffs.

For more info on why other quant variants weren't used, read on.

I've included the ComfyUI workflows I used so you can reproduce these results; they're listed under References at the end.

At a Glance — Which Variant Should You Use?

Variant	Best For	Recommended Steps	Speed (s/it)
BF16	Best quality — infographics, posters, publishing	40	~5.6
OTF INT8	Fast iteration, close to BF16 quality	32–40	~2.5
Turbo INT8	Low-resource quantized drafts	8	~4.0
Turbo NVFP4	Quick drafts when text isn't critical	default	fast
NVFP4	Low VRAM / storage only	40	~4.6

TL;DR: I've shared detailed performance numbers, outputs, and my thoughts on each model below, but in short, I'd use the ERNIE Image Turbo (INT8) variant for quick iterations and for finalizing the design. Once I like the prompt and the design it gives me, I'll use ERNIE Image BF16 (full model) at 40 steps to render the final image for the video. The Turbo model renders layouts and text imperfectly, but the design is very similar to what the full model will give me.
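To turn the table's per-iteration speeds into rough per-image numbers, multiply steps by seconds per iteration. The sketch below does only that, using the step counts and s/it values from the table above (the upper end of the range for OTF INT8). It is an estimate of sampling time only, assuming the s/it figure holds for every step; model loading, text encoding, and VAE decode are not included, and Turbo NVFP4 is left out because the table gives no number for it. The names `VARIANTS`, `denoise_seconds`, and `estimate_all` are illustrative.

```python
# Step counts and seconds-per-iteration copied from the "At a Glance" table.
# Turbo NVFP4 is omitted because the table lists no numeric speed for it.
VARIANTS = {
    "BF16": {"steps": 40, "sec_per_it": 5.6},
    "OTF INT8": {"steps": 40, "sec_per_it": 2.5},
    "Turbo INT8": {"steps": 8, "sec_per_it": 4.0},
    "NVFP4": {"steps": 40, "sec_per_it": 4.6},
}


def denoise_seconds(steps: int, sec_per_it: float) -> float:
    """Estimated sampling time only (no model load, text encode, or VAE decode)."""
    return steps * sec_per_it


def estimate_all(variants: dict) -> dict:
    """Map each variant name to its estimated sampling time in seconds."""
    return {
        name: denoise_seconds(v["steps"], v["sec_per_it"])
        for name, v in variants.items()
    }


if __name__ == "__main__":
    for name, seconds in estimate_all(VARIANTS).items():
        print(f"{name:<12} ~{seconds:.0f} s of sampling per image")
```

By this estimate a 40-step BF16 render spends roughly 224 s sampling, against roughly 100 s for OTF INT8 at 40 steps and roughly 32 s for Turbo INT8 at 8 steps.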

The Test Setup

All images use the same coastal poster prompt so every variant starts from the same creative direction. The composite grids below let you compare across models at a glance; individual panels follow so you can inspect details.

Cross-Model Comparison — Max Steps Per Variant

Here's every variant running at its highest available step count, stitched side-by-side:

The first thing you notice: BF16 @ 40 steps is the reference. It has clean text alignment, proper table layout, and no artifacts. OTF INT8 @ 40 is the closest runner-up — same layout, slightly softer text. NVFP4 @ 40 shifts the overall design (bolder headings, different spacing), and both turbo variants trade layout quality for speed.

Individual panels

Turbo NVFP4 — fast but the layout diverges from the reference. Best for rapid iteration where text precision isn't the goal.

Turbo GGUF INT8 (8 steps) — the most compressed variant. Usable for drafts but not production.

BF16 @ 40 steps — reference quality. Clean tables, correct text alignment, no artifacts. This is what you judge everything else against.

NVFP4 @ 40 steps — layout is different from BF16 (bolder headings, shifted spacing). Fewer artifacts than its 20-step counterpart, but the design drift is real.

OTF INT8 @ 40 steps — closest to BF16 of all the quantized variants. Slight softening on fine text but the table structure and layout are preserved.

BF16 — Step Sweep

The full-precision model at 20, 30, and 40 steps:

20 steps produces duplicate table columns — the model can render clear text but hasn't converged on the correct layout logic. Both 30-step runs introduce text artifacts in the table body. 40 steps is the sweet spot: clean layout, correct table structure, no smearing.
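A step sweep like this can be scripted instead of edited by hand. The sketch below assumes the workflow has been exported from ComfyUI in API format, where each node is a dict with `class_type` and `inputs`. The toy workflow, its node ids, and the helper names `with_steps` and `build_prompt_payload` are illustrative and not taken from the workflow files referenced in this post. It only builds the JSON bodies; posting each one to a running ComfyUI server's `/prompt` endpoint is left out.

```python
import copy
import json


def with_steps(workflow: dict, steps: int) -> dict:
    """Return a copy of an API-format workflow with literal `steps` inputs replaced.

    Inputs that are links to other nodes (lists like ["4", 0]) are left alone.
    """
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        inputs = node.get("inputs", {})
        if isinstance(inputs.get("steps"), int):
            inputs["steps"] = steps
    return patched


def build_prompt_payload(workflow: dict) -> bytes:
    """JSON body in the shape ComfyUI's POST /prompt endpoint expects."""
    return json.dumps({"prompt": workflow}).encode("utf-8")


# Toy stand-in for a real exported workflow; ids and values are hypothetical.
toy_workflow = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 42, "steps": 20, "cfg": 4.0}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "coastal poster"}},
}

# One patched workflow per step count in the sweep.
sweep = {steps: with_steps(toy_workflow, steps) for steps in (20, 30, 40)}
payloads = {steps: build_prompt_payload(wf) for steps, wf in sweep.items()}
```

Patching every literal `steps` input is a deliberately blunt choice: it works whether the step count lives on a `KSampler` or on a separate scheduler node, but it would also overwrite unrelated nodes that happen to have a `steps` input.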

Individual panels

20 steps — text is readable but the table has duplicate columns. The model hasn't figured out the right structure yet.

30 steps (A) — text artifacts appear in the table. Layout is better than 20 but not clean.

30 steps (B): same artifacts with a different seed, which suggests 30 steps is unreliable for this prompt and not just an unlucky seed.

40 steps — the sweet spot. Clean tables, correct text alignment, no artifacts.

NVFP4 — Step Sweep

NVFP4 at 20 and 40 steps:

At 20 steps there are clear text artifacts and the overall design drifts toward bolder headings. 40 steps cleans up most text issues, but the layout still differs from the BF16 reference.

Individual panels

20 steps — text artifacts, bolder redesign of headings, still coherent but not the same layout as BF16.

40 steps — fewer artifacts but the design still diverges from BF16. Not meaningfully faster per iteration (~4.6 s/it vs ~5.6 for BF16), so the tradeoff is hard to justify unless you need to save VRAM.

OTF INT8 — Step Sweep

On-the-fly INT8 quantization at 32 and 40 steps:

OTF INT8 is the surprise winner among the quantized variants. At ~2.5 s/it it's roughly 2× faster than BF16, and the quality is much closer to the reference than NVFP4. At 32 steps the zebra-stripe table pattern breaks in the second-last row. At 40 steps the pattern is fixed but a few misspellings and minor artifacts appear.

Individual panels

32 steps — broken alternating row pattern in the table. Otherwise solid. Great for quick checks during iteration.

40 steps — fixes the table pattern but introduces a few text artifacts and misspellings. Still the best quality-per-second tradeoff.

Turbo Variants

Two turbo quant formats on the coastal poster prompt:

The Turbo NVFP4 has a distinct layout compared to the full model. Turbo GGUF INT8 at 8 steps is noticeably rougher but completes in seconds. Useful for compositional drafts before committing to a full model render.

Turbo NVFP4

Turbo GGUF INT8 (8 steps)

Loading BF16 as INT8 — Custom Node Setup

One of my favorite optimizations is on-the-fly INT8 quantization via the ComfyUI-INT8-Fast custom node. You load the full BF16 model and the node quantizes it to INT8 during inference, so no separate model file is needed.
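For intuition about what quantizing weights to INT8 involves, here is a minimal pure-Python sketch of symmetric absmax quantization of a single weight row. This is a generic textbook scheme, not the ComfyUI-INT8-Fast implementation; how that node actually chooses scales, or whether it also quantizes activations, is not covered here. The function names and sample weights are made up for illustration.

```python
def quantize_row_int8(row):
    """Symmetric absmax quantization: map floats to integers in [-127, 127]."""
    absmax = max(abs(w) for w in row)
    # One scale per row; fall back to 1.0 for an all-zero row.
    scale = absmax / 127.0 if absmax > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


# Hypothetical weights standing in for one row of a BF16 weight matrix.
weights = [0.82, -0.31, 0.004, -1.27, 0.55]
q, scale = quantize_row_int8(weights)
restored = dequantize_row(q, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.4f}  max abs error={max_err:.4f}")
```

The largest-magnitude weight maps to ±127 and everything else is rounded onto the same grid, so small weights lose the most relative precision. That is one plausible reason fine text softens slightly in the INT8 outputs.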

Here's what the node setup looks like:

This is the setup I used for all OTF INT8 runs. The workflow JSON for this configuration is ernie_image_full_int8.json.

Product Ad — Bonus Comparison

A different prompt (a product advertisement), comparing BF16 @ 50 steps with Turbo @ 8 steps:

Same story: BF16 delivers clean, publishable output. Turbo gets you a composition preview 8× faster.

Individual panels

BF16 @ 50 steps — clean text, proper layout, publishable.

Turbo @ 8 steps — rough but usable for layout iteration. 1.02 s/it.

Takeaways

BF16 @ 40 steps is your default for anything that matters. It's the reference quality point.
OTF INT8 @ 32–40 steps is the best speed-quality compromise — 2× faster, visually close, and the artifacts are minor enough that you can iterate quickly and only do a final BF16 render.
Skip NVFP4 unless you're VRAM-constrained. It's not meaningfully faster than BF16 and the output diverges in layout and design.
Turbo variants are for drafts and compositional previews. Use them to block out ideas before committing a full model run.
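To put a rough number on the iterate-then-finalize approach, the sketch below compares a session of drafts on OTF INT8 followed by one BF16 render against doing every render on BF16. The speeds are the approximate s/it figures from this post; the draft count of 10 is a hypothetical, and the estimate covers sampling time only. `session_seconds` is an illustrative helper name.

```python
def session_seconds(drafts, draft_steps, draft_sec_per_it, final_steps, final_sec_per_it):
    """Sampling time for `drafts` draft renders plus one final render."""
    draft_time = drafts * draft_steps * draft_sec_per_it
    final_time = final_steps * final_sec_per_it
    return draft_time + final_time


DRAFTS = 10  # hypothetical number of prompt iterations

# Drafts on OTF INT8 @ 32 steps (~2.5 s/it), final on BF16 @ 40 steps (~5.6 s/it).
mixed = session_seconds(DRAFTS, 32, 2.5, 40, 5.6)

# Every render, drafts included, on BF16 @ 40 steps.
all_bf16 = session_seconds(DRAFTS, 40, 5.6, 40, 5.6)

print(f"mixed: ~{mixed / 60:.0f} min, all BF16: ~{all_bf16 / 60:.0f} min")
```

With these assumptions the mixed session spends about 17 minutes sampling against about 41 minutes for BF16 throughout, and the gap widens with every extra draft.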

For anything that needs realistic people or scenes I still reach for Flux / Krea / Z-image Turbo. But for text — posters, tables, infographics, product shots with copy — ERNIE Image is the best local option right now.

References

ERNIE-Image-Quantized on HuggingFace
ComfyUI-INT8-Fast — on-the-fly INT8 quantization approach
BF16 workflow
BF16-as-INT8 (OTF) workflow
Turbo INT8 workflow