
Ideogram 4.0: Open-Weight Image Generation — How to Run, API & JSON Prompts (2026)

Ideogram 4.0 launched June 3, 2026 with open weights, native 2K output, and JSON-first prompting. A step-by-step guide to the API, local inference on a 24GB GPU, magic-prompt, and bounding-box layout, plus how it compares to FLUX and GPT Image.

11 min read · Yash Thakker
Ideogram · Image Generation · Open Source AI · Design AI · Diffusion Models

On June 3, 2026, Ideogram released 4.0 — its first open-weight frontier text-to-image model. The weights are on GitHub and Hugging Face. The hosted API is live at developer.ideogram.ai.

The headline is not just "another open diffusion model." Ideogram 4.0 closes the quality gap between proprietary frontier image models and the open ecosystem on the axes that matter for production design work: typography in scene, deterministic layout, and 2K photoreal output. CEO Mohammad Norouzi put it directly: "The hardest problems at the forefront of design generation — headline-grade typography, deterministic layout, branded layered output — need a foundation engineered for them."

This guide covers what shipped, how the architecture differs from unified multimodal stacks, and how to run Ideogram 4.0 — via API, CLI, and self-hosted inference.

Quick reference

| Detail | Value |
|---|---|
| Release date | June 3, 2026 |
| Parameters | 9.3B |
| Architecture | Flow-matching DiT, single-stream, Qwen3-VL-8B text encoder |
| Max resolution | 2048×2048 (multiples of 16, aspect ratios up to 6:1) |
| Open weights | ideogram-oss/ideogram4 |
| Checkpoints | ideogram-4-nf4 (24GB GPU) · ideogram-4-fp8 |
| API endpoint | POST https://api.ideogram.ai/v1/ideogram-v4/generate |
| API pricing | Turbo $0.03 · Default $0.06 · Quality $0.10 per image |
| Prompt format | JSON-first (plain text via magic-prompt expansion) |
| GitHub stars | 2,100+ (as of June 2026) |



What Ideogram 4.0 ships today

Three capabilities anchor the release, per Ideogram's press release:

1. Text rendering at production fidelity

Ideogram has led on in-scene typography since its 2023 launch. Version 4.0 extends that with multilingual support, denser type at smaller scales, and reliable rendering of headlines, packaging copy, and signage. In a ContraLabs blind evaluation judged by ten professional designers, Ideogram 4.0 was picked as best 47.9% of the time — ahead of Gemini 3.1 Flash Image Preview (30.0%), FLUX.2 [max] (15.5%), and Grok Imagine 1.0 (15.0%).

2. Bounding-box layout control

You specify where a logo, headline, callout, or subject belongs on the canvas using normalized [y_min, x_min, y_max, x_max] coordinates on a 0–1000 grid. Layout is directed by the brief, not sampled and corrected afterward.

3. Photoreal output at 2K

Native support for resolutions from 256 to 2048 (multiples of 16), with aspect ratios up to 6:1. For highest quality locally, the README recommends --height 2048 --width 2048 --sampler-preset V4_QUALITY_48.
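These limits are easy to check before you spend an API call or a local run. The sketch below encodes only the limits stated here: 256 to 2048 per side, multiples of 16, and an aspect ratio of at most 6:1. `check_resolution` and `snap_to_grid` are hypothetical helpers and are not part of the Ideogram repo or API.

```python
MIN_SIDE = 256
MAX_SIDE = 2048
STEP = 16
MAX_ASPECT = 6.0  # longest side divided by shortest side


def check_resolution(width: int, height: int) -> list[str]:
    """Return the problems with (width, height); an empty list means it fits the stated limits."""
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]
    problems = []
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name}={side} is outside {MIN_SIDE}-{MAX_SIDE}")
        if side % STEP:
            problems.append(f"{name}={side} is not a multiple of {STEP}")
    if max(width, height) / min(width, height) > MAX_ASPECT:
        problems.append(f"aspect ratio {width}:{height} exceeds {MAX_ASPECT:g}:1")
    return problems


def snap_to_grid(side: int) -> int:
    """Round a side length to a nearby multiple of 16, clamped to 256-2048."""
    snapped = round(side / STEP) * STEP
    return max(MIN_SIDE, min(MAX_SIDE, snapped))
```

For example, `check_resolution(1920, 1080)` flags the height because 1080 is not a multiple of 16. `check_resolution(1536, 256)` passes because it sits exactly at the 6:1 limit.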

Layer-based roadmap

Most professional design work is not a single pixel layer. Ideogram 4.0 is the first step toward a layered generation stack:

| Capability | Status |
|---|---|
| Transparent background cutouts | Available via Background Remover API |
| Editable text + movable image layers | Follow-up 4.0 release |
| Branded assets (typography, palette, logo fidelity) | Scheduled |


Architecture: a specialized foundation, not a unified multimodal model

Ideogram 4.0 is a foundation model trained entirely from scratch — not a fine-tune or distillation of any existing checkpoint. Key architectural choices from the GitHub README:

| Component | Detail |
|---|---|
| Backbone | 34-layer single-stream Diffusion Transformer (DiT); text and image tokens share one unified sequence |
| Text encoder | Qwen3-VL-8B-Instruct; hidden states from 13 intermediate layers, concatenated |
| Training objective | Flow matching |
| Guidance | Dual-branch classifier-free guidance (independent positive/negative refinement) |
| Training data format | Structured JSON captions exclusively |
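If flow matching and classifier-free guidance are new to you, the sketch below shows the generic rectified-flow version of both in NumPy. It covers the training target for a linear noise path and one Euler sampling step that combines a positive and a negative branch. This is not Ideogram's code. The linear path, the velocity target, and the `v_neg + scale * (v_pos - v_neg)` combination are common textbook choices, and the README does not specify Ideogram's exact "dual-branch" formula. The velocity function stands in for the DiT and is hypothetical.

```python
from typing import Callable

import numpy as np

# velocity_fn(x_t, t, positive) -> predicted velocity; a stand-in for the DiT
VelocityFn = Callable[[np.ndarray, float, bool], np.ndarray]


def flow_matching_pair(x0: np.ndarray, noise: np.ndarray, t: float):
    """Build one training example on a linear path from data (t=0) to noise (t=1).

    The model is trained to predict the constant velocity d(x_t)/dt = noise - x0.
    """
    x_t = (1.0 - t) * x0 + t * noise
    v_target = noise - x0
    return x_t, v_target


def cfg_velocity(v_pos: np.ndarray, v_neg: np.ndarray, scale: float) -> np.ndarray:
    """Standard classifier-free guidance: move away from the negative branch."""
    return v_neg + scale * (v_pos - v_neg)


def euler_step(x_t: np.ndarray, t: float, dt: float,
               velocity_fn: VelocityFn, scale: float) -> np.ndarray:
    """One Euler step from t to t - dt (from noise toward data)."""
    v = cfg_velocity(velocity_fn(x_t, t, True), velocity_fn(x_t, t, False), scale)
    return x_t - dt * v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=4)     # stand-in for a clean latent
    noise = rng.normal(size=4)  # stand-in for Gaussian noise

    def oracle_velocity(x_t, t, positive):
        # Hypothetical perfect model: both branches return the true velocity.
        return noise - x0

    x, steps = noise.copy(), 8
    for i in range(steps):
        x = euler_step(x, 1.0 - i / steps, 1.0 / steps, oracle_velocity, scale=4.0)
    print(np.allclose(x, x0))  # True: a constant velocity makes Euler integration exact
```

The straight data-to-noise path is what makes few-step sampling plausible in this family of models: if the predicted velocity were perfect, even a handful of Euler steps would land exactly on the data.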

The bet is explicit: unified multimodal models (GPT Image, Gemini) are strong generalists, but headline-grade typography, deterministic layout, and brand fidelity require a foundation engineered for design specifically. At 9.3B parameters, Ideogram 4.0 delivers the best text rendering of any open-weight release Ideogram benchmarked — ahead of Qwen-Image (20B), FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE).

For a general primer on how diffusion image models work under the hood, see our diffusion explainer.


Benchmarks: where Ideogram 4.0 ranks

| Benchmark | Result |
|---|---|
| Design Arena (overall) | Top open-weight model; trails only proprietary GPT and Gemini |
| Design Arena (open-weight only) | #1 by a commanding margin |
| ContraLabs typography (1st-place win rate) | 47.9% |
| ContraLabs "would use in client work" | 3.55 / 5 |
| LMArena text-to-image | Top open-weight lab, top-5 overall |
| 7Bench (layout control) | Better than all closed-source models tested |
| Internal human preference (design + photography) | #2 overall, behind only GPT Image 2 medium |

The pattern is consistent: Ideogram 4.0 is the best open-weight image model by far, and sits at the frontier of design-oriented generation.


How to run Ideogram 4.0 via the API

The fastest path for production pipelines. No GPU required.

Step 1: Get an API key

  1. Sign up at developer.ideogram.ai
  2. Add a payment method in the API Dashboard (billing is separate from your Ideogram app subscription)
  3. Copy your API key (it is sent in the Api-Key header)

Step 2: Generate your first image

Python:

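```python
import os

import requests

# Read the key from the environment instead of hard-coding it in source.
API_KEY = os.environ["IDEOGRAM_API_KEY"]

response = requests.post(
    "https://api.ideogram.ai/v1/ideogram-v4/generate",
    headers={"Api-Key": API_KEY},
    json={
        "text_prompt": "A poster for a summer design conference with bold sans-serif typography",
        "rendering_speed": "DEFAULT",
        "aspect_ratio": "ASPECT_16_9",
    },
    timeout=120,
)
response.raise_for_status()  # surface 4xx/5xx errors before parsing the body

image = response.json()["data"][0]
print(image["url"])  # ephemeral URL: download it right away
```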

cURL:

```bash
curl -X POST https://api.ideogram.ai/v1/ideogram-v4/generate \
  -H "Api-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "text_prompt": "A poster for a summer design conference",
    "rendering_speed": "TURBO"
  }'
```

TypeScript:

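```typescript
const res = await fetch("https://api.ideogram.ai/v1/ideogram-v4/generate", {
  method: "POST",
  headers: {
    "Api-Key": "<your-api-key>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text_prompt: "A poster for a summer design conference",
    rendering_speed: "DEFAULT",
  }),
});

// Fail loudly on HTTP errors instead of destructuring an error body.
if (!res.ok) {
  throw new Error(`Ideogram API error ${res.status}: ${await res.text()}`);
}

const { data } = await res.json();
console.log(data[0].url);
```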

API pricing and speed tiers

| Rendering speed | Price per image | Use case |
|---|---|---|
| TURBO | $0.03 | Rapid prototyping, A/B testing |
| DEFAULT | $0.06 | Daily production work |
| QUALITY | $0.10 | Final delivery assets |
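Because pricing is per image, a batch budget is simple arithmetic. Here is a small sketch that assumes the prices in the table (they may change). `estimate_cost` is a hypothetical helper, and it uses `Decimal` so that cent amounts add up exactly.

```python
from decimal import Decimal

# Per-image prices (USD) as listed above; verify current rates before budgeting.
PRICE_PER_IMAGE = {
    "TURBO": Decimal("0.03"),
    "DEFAULT": Decimal("0.06"),
    "QUALITY": Decimal("0.10"),
}


def estimate_cost(counts: dict[str, int]) -> Decimal:
    """Total USD for a mix of renders, e.g. {"TURBO": 200, "QUALITY": 10}."""
    total = Decimal("0")
    for speed, n in counts.items():
        total += PRICE_PER_IMAGE[speed] * n  # KeyError for unknown tiers is intentional
    return total
```

For example, 200 Turbo drafts plus 10 Quality finals come to 200 × $0.03 + 10 × $0.10 = $7.00.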

No subscription required. Default rate limit: 10 in-flight requests. For higher throughput, contact Ideogram's API support team.
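With a default cap of 10 in-flight requests, a batch job should limit its own concurrency. Below is a minimal sketch that uses a thread pool sized to that limit. `generate_many` is hypothetical, and `generate_one` can be any function that sends one request, such as a wrapper around the Python example above. The sketch does not handle retries or rate-limit error responses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

MAX_IN_FLIGHT = 10  # default limit stated above; raise it only if Ideogram grants more


def generate_many(prompts: Iterable[str], generate_one: Callable[[str], str]) -> list[str]:
    """Call generate_one for every prompt with at most MAX_IN_FLIGHT calls running at once.

    Results come back in the same order as the prompts. If any call raises,
    the exception propagates when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
        return list(pool.map(generate_one, prompts))
```

Threads are a good fit here because each call spends nearly all of its time waiting on the network.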

Important: Image URLs are ephemeral — download and store results in your own system immediately after generation.
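Since the URLs expire, save each image as soon as the response arrives. This is a minimal standard-library sketch. `save_image` is a hypothetical helper, and where the files go (local disk here, object storage in many production setups) depends on your pipeline.

```python
import shutil
import urllib.request
from pathlib import Path


def save_image(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream the image at url into dest (creating parent folders) and return dest."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks, so large 2K images stay off the heap
    return dest
```

Usage with the Python example above: `save_image(image["url"], Path("outputs/poster.png"))`.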


How to run Ideogram 4.0 locally (CLI)

Self-host when you need gradients, fine-tuning, or air-gapped inference.

Prerequisites

  • CUDA GPU with 24GB VRAM (NF4 checkpoint) or broader hardware (FP8)
  • Python 3.10+
  • Hugging Face account with accepted license gate

Step 1: Clone and install

```bash
git clone https://github.com/ideogram-oss/ideogram4.git
cd ideogram4
pip install .
```

For development, use editable mode: pip install -e .

Step 2: Accept the license gate and authenticate

  1. Open ideogram-ai/ideogram-4-nf4 on Hugging Face
  2. Click Agree and access repository
  3. Authenticate:
```bash
hf auth login
# or: export HF_TOKEN="hf_..."
```

Without this step, downloads fail with 404 / GatedRepoError.

Step 3: Generate with plain-text prompt

A plain-text --prompt is expanded into structured JSON by magic-prompt, Ideogram's hosted LLM expansion service. It is free and requires only an API key:

```bash
export IDEOGRAM_API_KEY="your_key_from_developer.ideogram.ai"

python run_inference.py \
  --prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY"
```

Step 4: Max quality settings

For 2K output with the quality sampler preset:

```bash
python run_inference.py \
  --prompt "a campaign poster with clean sans-serif typography" \
  --output poster.png \
  --quantization "nf4" \
  --height 2048 \
  --width 2048 \
  --sampler-preset V4_QUALITY_48 \
  --magic-prompt-key "$IDEOGRAM_API_KEY"
```
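To render a handful of prompts without retyping flags, you can script the CLI from Python. This sketch uses only the flags shown in this guide. `build_command` and `run_batch` are hypothetical helpers, and the sketch assumes `run_inference.py` runs from the root of the cloned repo. Each call starts a new Python process, so the checkpoint reloads on every run. That is fine for small batches but wasteful for serving.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def build_command(prompt: str, output: Path, api_key: str,
                  quantization: str = "nf4",
                  height: Optional[int] = None,
                  width: Optional[int] = None,
                  sampler_preset: Optional[str] = None) -> list[str]:
    """Assemble a run_inference.py invocation from the flags shown in this guide."""
    cmd = [sys.executable, "run_inference.py",
           "--prompt", prompt,
           "--output", str(output),
           "--quantization", quantization,
           "--magic-prompt-key", api_key]
    if height is not None:
        cmd += ["--height", str(height)]
    if width is not None:
        cmd += ["--width", str(width)]
    if sampler_preset is not None:
        cmd += ["--sampler-preset", sampler_preset]
    return cmd


def run_batch(prompts: list[str], repo_dir: Path, out_dir: Path,
              api_key: str, **options) -> list[Path]:
    """Render prompts one after another; raises CalledProcessError if a run fails."""
    out_dir = out_dir.resolve()  # absolute, because the child process runs in repo_dir
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for i, prompt in enumerate(prompts):
        output = out_dir / f"{i:03d}.png"
        subprocess.run(build_command(prompt, output, api_key, **options),
                       cwd=repo_dir, check=True)
        outputs.append(output)
    return outputs
```

For the max-quality settings above, pass `height=2048, width=2048, sampler_preset="V4_QUALITY_48"` as options.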

Optional: safety screening with Hive

For production deployments, enable prompt and output moderation via Hive:

```bash
export HIVE_TEXT_MODERATION_KEY="..."
export HIVE_VISUAL_MODERATION_KEY="..."

python run_inference.py \
  --prompt "an isometric illustration of a tiny city floating in the clouds" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY" \
  --hive-text-key "$HIVE_TEXT_MODERATION_KEY" \
  --hive-visual-key "$HIVE_VISUAL_MODERATION_KEY"
```

Model checkpoints

| Checkpoint | Quantization | Hardware | Diffusers support |
|---|---|---|---|
| ideogram-4-nf4 | NF4 | CUDA (24GB) | Yes |
| ideogram-4-fp8 | FP8 | All | No |

See docs/inference.md for sampler presets, parameter reference, and optimization tips.


JSON prompting: the format that matters

Ideogram 4.0 was trained exclusively on structured JSON captions. Plain text works — but JSON is the native language.

Why JSON-only training?

From the prompting guide:

We train exclusively on JSON so that training and inference share a single, common prompt format. The training captions themselves are deliberately extremely descriptive: each JSON exhaustively describes everything in the image.

Plain-text prompts therefore create a train/inference mismatch. JSON mirrors the training distribution and unlocks full model quality.

The caption schema (three top-level fields)

```json
{
  "high_level_description": "A clean business card layout for a tech startup.",
  "style_description": {
    "aesthetics": "minimal, professional, geometric",
    "lighting": "even, diffuse studio lighting",
    "medium": "graphic_design",
    "art_style": "flat vector design, generous whitespace, sans-serif typography",
    "color_palette": ["#FFFFFF", "#F0F0F0", "#333333", "#0066FF", "#00CC88"]
  },
  "compositional_deconstruction": {
    "background": "A solid off-white card surface with subtle paper texture.",
    "elements": [
      {
        "type": "text",
        "text": "ACME TECH",
        "desc": "Bold dark grey sans-serif company name across the upper third."
      },
      {
        "type": "text",
        "text": "hello@example.com",
        "desc": "Small blue sans-serif contact email near the bottom."
      }
    ]
  }
}
```
| Field | Required | Purpose |
|---|---|---|
| high_level_description | Strongly recommended | One- or two-sentence summary |
| style_description | Optional | Aesthetics, lighting, medium, color palette |
| compositional_deconstruction | Required | Background + spatial elements |

Element types: "obj" for objects/subjects, "text" for in-image text (include a text field with the literal string to render).
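If you generate captions in code rather than writing them by hand, small builders keep the structure consistent. This sketch uses the field names from the schema above and enforces only the rules stated here: `compositional_deconstruction` is always present, element types are `"obj"` or `"text"`, and text elements carry the literal string to render. `element` and `caption` are hypothetical helpers. The official prompting guide may define more fields, such as `bbox`, which you can add to an element dict yourself.

```python
import json
from typing import Optional


def element(kind: str, desc: str, text: Optional[str] = None) -> dict:
    """One entry for compositional_deconstruction.elements."""
    if kind not in ("obj", "text"):
        raise ValueError(f"unknown element type {kind!r}; expected 'obj' or 'text'")
    if kind == "text" and not text:
        raise ValueError("text elements need the literal string to render")
    item = {"type": kind}
    if text is not None:
        item["text"] = text
    item["desc"] = desc
    return item


def caption(background: str, elements: list[dict],
            high_level_description: Optional[str] = None,
            style_description: Optional[dict] = None) -> str:
    """Serialize a caption with the three top-level fields described above."""
    doc: dict = {}
    if high_level_description:
        doc["high_level_description"] = high_level_description
    if style_description:
        doc["style_description"] = style_description
    doc["compositional_deconstruction"] = {"background": background, "elements": elements}
    return json.dumps(doc, ensure_ascii=False, indent=2)  # keep non-ASCII text literal
```

`ensure_ascii=False` keeps multilingual strings readable in the JSON rather than escaping them.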

Magic-prompt: JSON without writing JSON

Don't want to hand-write captions? Magic-prompt expands plain text into full structured JSON before generation.

Three backends ship in the repo:

| Config | Registry key | Backend |
|---|---|---|
| Ideogram4MagicPromptV1 | ideogram-4-v1 | Ideogram hosted API (free) |
| ClaudeOpusMagicPromptV1 | claude-opus-v1 | OpenRouter |
| ClaudeSonnetMagicPromptV1 | claude-sonnet-v1 | OpenRouter |

The hosted ideogram-4-v1 backend is the default in run_inference.py and only needs IDEOGRAM_API_KEY. The magic-prompt system prompts are open source in src/ideogram4/magic_prompt_system_prompts/.

Via the API, two endpoints scaffold the JSON workflow:

| Endpoint | Purpose |
|---|---|
| POST /v1/ideogram-v4/magic-prompt | Convert plain text → structured json_prompt |
| POST /v1/ideogram-v4/describe | Upload a reference image → structured JSON prompt (optionally preserving bboxes) |

Practical workflow: Start with text_prompt for fast ideation. Migrate to json_prompt once layout precision, brand hex colors, or multi-line typography matter.
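That migration can be scripted in two steps: expand an idea once with magic-prompt, edit the returned JSON if needed, then generate from it. The sketch below is built on the endpoint paths listed above. The request and response field names are assumptions to check against the API reference: it assumes magic-prompt takes `text_prompt` and returns `json_prompt`, and that generate accepts that object as `json_prompt`. The HTTP call is injected as a function so the flow can be tested without a network connection. All helper names are hypothetical.

```python
from typing import Any, Callable

BASE_URL = "https://api.ideogram.ai"
Post = Callable[[str, dict], dict]  # (url, json_body) -> parsed JSON response


def text_to_json_prompt(post: Post, idea: str) -> Any:
    # ASSUMPTION: request field "text_prompt", response field "json_prompt".
    resp = post(f"{BASE_URL}/v1/ideogram-v4/magic-prompt", {"text_prompt": idea})
    return resp["json_prompt"]


def generate_from_json(post: Post, json_prompt: Any, speed: str = "DEFAULT") -> str:
    # ASSUMPTION: generate accepts the magic-prompt output unchanged as "json_prompt".
    resp = post(f"{BASE_URL}/v1/ideogram-v4/generate",
                {"json_prompt": json_prompt, "rendering_speed": speed})
    return resp["data"][0]["url"]  # response shape as in the generate example


def make_requests_post(api_key: str) -> Post:
    """Real transport using requests (imported lazily so tests need no network)."""
    import requests

    def post(url: str, body: dict) -> dict:
        r = requests.post(url, headers={"Api-Key": api_key}, json=body, timeout=120)
        r.raise_for_status()
        return r.json()

    return post
```

Usage: `post = make_requests_post(os.environ["IDEOGRAM_API_KEY"])`, then `jp = text_to_json_prompt(post, "a summer sale poster")`. Adjust the palette or element text in `jp` before calling `generate_from_json(post, jp, "TURBO")`.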


Bounding-box layout and color palettes

Spatial control with bbox

Each element can include a bounding box in normalized 0–1000 coordinates (origin top-left):

```json
{
  "type": "text",
  "bbox": [100, 200, 300, 800],
  "text": "SUMMER SALE",
  "desc": "Large bold red headline across the upper center of the poster."
}
```

Format: [y_min, x_min, y_max, x_max]. This is native to the model — no ControlNet pipeline required.
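Design tools usually report boxes in pixels as (left, top, right, bottom). The sketch below converts a pixel box to the [y_min, x_min, y_max, x_max] order on the 0–1000 grid, with the origin at the top-left. `to_ideogram_bbox` is hypothetical, and rounding to whole numbers is an assumption based on the integer values in the guide's examples.

```python
def to_ideogram_bbox(left: float, top: float, right: float, bottom: float,
                     width: int, height: int) -> list[int]:
    """Map a pixel box on a width x height canvas to [y_min, x_min, y_max, x_max] in 0-1000."""
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError("box must lie inside the canvas and have positive size")

    def scale(value: float, extent: int) -> int:
        return round(value / extent * 1000)

    # y coordinates scale by canvas height, x coordinates by canvas width
    return [scale(top, height), scale(left, width), scale(bottom, height), scale(right, width)]
```

On a 2048×2048 canvas, a headline spanning x 410 to 1638 and y 205 to 614 maps to `[100, 200, 300, 800]`, which is the box in the example above.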

Color palette conditioning

Steer dominant colors with hex codes in style_description.color_palette:

```json
"color_palette": ["#1B1B2F", "#162447", "#1F4068", "#E43F5A", "#F5F5F5"]
```

Rules from the prompting guide:

  • Up to 16 colors in style_description.color_palette
  • Up to 5 colors per element
  • Uppercase hex only, in #RRGGBB form (not #fff or lowercase)
  • Include both highlight and shadow colors for controlled lighting
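Colors copied from design tools often arrive as `#fff` or in lowercase, so it helps to normalize them before they reach the prompt. This sketch applies the rules above: uppercase `#RRGGBB`, at most 16 colors in the global palette, and at most 5 per element. `normalize_hex` and `normalize_palette` are hypothetical helpers. Expanding three-digit shorthand is a convenience choice here, not something the guide specifies.

```python
import re

HEX_PATTERN = re.compile(r"#?([0-9A-Fa-f]{6}|[0-9A-Fa-f]{3})")
MAX_PALETTE_COLORS = 16  # style_description.color_palette
MAX_ELEMENT_COLORS = 5   # per element


def normalize_hex(color: str) -> str:
    """Convert '#fff', 'e43f5a', or '#E43F5A' to uppercase '#RRGGBB'."""
    match = HEX_PATTERN.fullmatch(color.strip())
    if match is None:
        raise ValueError(f"not a hex color: {color!r}")
    digits = match.group(1)
    if len(digits) == 3:  # expand shorthand: 'fa0' -> 'ffaa00'
        digits = "".join(ch * 2 for ch in digits)
    return "#" + digits.upper()


def normalize_palette(colors: list[str], limit: int = MAX_PALETTE_COLORS) -> list[str]:
    """Normalize each color and reject palettes longer than the given limit."""
    if len(colors) > limit:
        raise ValueError(f"{len(colors)} colors exceeds the limit of {limit}")
    return [normalize_hex(c) for c in colors]
```

Pass `MAX_ELEMENT_COLORS` as the limit when checking a single element's colors.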

On 7Bench (layout control), Ideogram 4.0 scored significantly better than all closed-source models tested — the bbox + palette system is the differentiator.


API endpoints beyond generate

The Ideogram API is not just text-to-image. Full capability list from ideogram.ai/api-learn:

| Capability | Endpoint family | Notes |
|---|---|---|
| Generate | /v1/ideogram-v4/generate | Text or JSON prompt → image |
| Transparent backgrounds | v4 endpoints | Native alpha cutouts |
| Edit with prompt | v3 endpoints | Describe changes in plain language |
| Remix | v3 endpoints | Reimagine with image_weight control |
| Reframe | v3 endpoints | Extend to a new aspect ratio |
| Remove background | v4 endpoints | Clean cutout in one call |
| Layerized text | v3 endpoints | Pull editable text layers |
| Custom models | Training + generate | Fine-tune on brand assets |
| Upscale | Upscale endpoint | Raise resolution for delivery |
| Magic-prompt | /v1/ideogram-v4/magic-prompt | Plain text → JSON caption |
| Describe | /v1/ideogram-v4/describe | Image → JSON caption |

Ideogram 4.0 also supports MCP for agent workflows — useful if you're wiring image generation into coding agents or design automation pipelines. For agent harness concepts, see our Agent Harness guide.


When to use API vs local vs the app

| Surface | Best for | Trade-off |
|---|---|---|
| Ideogram app | Hands-on creation, iteration, editing | Subscription credits; no programmatic access |
| API | Production pipelines, product integration, agents | Per-image cost; ephemeral URLs |
| Local (CLI) | Fine-tuning, research, air-gapped use, unlimited generation | 24GB GPU; magic-prompt still needs a (free) API key |
| ComfyUI | Node-based visual workflows | Requires ComfyUI 0.24.0+ and the image_ideogram4_t2i.json template |

For most developers building image generation into a product, start with the API (Turbo at $0.03/image for prototyping). Move to local inference when you need custom fine-tunes, synthetic data pipelines, or on-premise deployment.

For comparison with other 2026 image models, see our posts on ChatGPT Images 2.0 / gpt-image-2 and the diffusion fundamentals guide.


Enterprise and commercial licensing

Commercial use of the open weights is governed by Ideogram's commercial license. Key points from the press release:

  • Fine-tuning on brand data with weights, training data, and inference staying on customer infrastructure
  • Headquartered in Toronto and San Francisco — no embedded political alignment in weights
  • Commercial license tiers at ideogram.ai/licensing
  • Enterprise inquiries: contact Ideogram's enterprise sales team

The NF4/FP8 Hugging Face checkpoints use a non-commercial license for the open release. Commercial use through the API or enterprise licensing is the production path.


Summary

Ideogram 4.0 is the most significant open-weight image release of 2026 for anyone who ships visual assets — not hobbyists generating cats, but teams that need readable type, controlled layout, and 2K fidelity.

Three things to remember:

  1. JSON is the native prompt format. Use magic-prompt for casual input; write JSON when layout and typography matter.
  2. Three ways in: API for products, CLI for research/self-hosting, app for hands-on design.
  3. It closes the open-vs-closed gap on design benchmarks while staying at 9.3B parameters — a fraction of FLUX.2 [dev]'s 32B.

Related reading

Official sources: Ideogram 4.0 press release · GitHub repo · API docs · Prompting guide · Technical blog

Model specs, API pricing, and benchmark numbers in this post reflect publicly available information as of June 20, 2026. Verify current pricing and license terms at ideogram.ai before production deployment.
