Floma  ·  Design Engineering  ·  Solo Build  ·  2025 – 2026

Beyond image generation: engineering creative intelligence

Most AI image generation projects focus on the generation. I focused on the direction. This is a complete creative intelligence system — designed, architected, and built entirely solo — that encodes brand aesthetics and campaign strategy into reusable infrastructure, then executes them at scale.


20–30 Brand-specific assets per campaign
8 Enterprise brands
10 minutes To generate a full campaign set

Core Problem

The obvious solution wasn't the right one

Marketing campaign production breaks down into three components: content creation, creative development, and production design. Content and production design are largely linear and deterministic — with structured prompts and modular assembly, they can be automated reliably. Creative development is different. It translates strategy into emotionally resonant visual abstractions, requiring judgment, context, and taste.

In the pursuit of end-to-end campaign automation, the biggest bottleneck wasn't generating visuals. It was deciding what visuals to generate.

Creative development sits between linear, deterministic stages — and behaves nothing like them.

AI image generation seemed like the obvious move. The latest models could render high-quality visuals instantly and at almost no cost. I tested them against real production requirements — brand consistency, strategic relevance, production volume — and found the constraint wasn't rendering quality.

It was direction. Without structure, output styles varied wildly run to run. Models could render what you described but couldn't decide what to render — they had no way to understand the abstract, metaphorical nature of enterprise visual design. And scaling that process across a full campaign meant fragile, manual prompting for every single image.

Three failure modes, precisely defined:

Consistency is unreliable

Individual generations vary too much to maintain brand identity across dozens of assets. Without a systematic style definition, results aren't repeatable.

No strategic visual thinking

Models render what you specify. They don't know what to specify. The gap between "generate an image" and "generate the right image for this campaign" is exactly where human art direction lives.

Manual prompting doesn't scale

A single strong image often took multiple iterations. Multiply that across an entire campaign and the process collapses under its own weight.

Same prompt. Multiple generations. No two outputs consistent.

Research & Insights

Separating style from composition

The breakthrough wasn't a better prompt. It was a different architecture.

Testing revealed that JSON-structured prompts produced more consistent results than natural language — but the real unlock came from recognizing that style and composition are fundamentally separable concerns. Two halves of every image, treated as two independent modules.

Style as infrastructure

A brand's aesthetic DNA — color, material, lighting, shape, space — extracted once from reference imagery into a reusable JSON schema.

Composition as creative variation

Composition stays unique per image — broad concept variation explored inside the same brand world, without losing aesthetic cohesion.

By separating these two modules, brand definition becomes infrastructure. When combined in a single generation call, they produce on-brand, campaign-specific imagery at scale — without re-describing aesthetic intent or resending reference images every time.
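As a rough illustration of that separation, the sketch below combines one reusable style module with several per-image composition modules into JSON-structured prompts. The five style keys come from the list above; every value, the `composition` fields, and the `build_prompt` helper are hypothetical and not taken from the actual system.

```python
import json

# Hypothetical style module: extracted once per brand, reused for every image.
# The keys mirror the five aspects named in the text; the values are invented.
BRAND_STYLE = {
    "color": "deep navy base with a single warm accent",
    "material": "matte glass and brushed metal",
    "lighting": "soft top-down key light, low contrast",
    "shape": "rounded geometric primitives",
    "space": "generous negative space, centered subject",
}


def build_prompt(style: dict, composition: dict) -> str:
    """Combine the shared style module with one per-image composition module."""
    payload = {"style": style, "composition": composition}
    # sort_keys keeps the serialized prompt text stable from run to run.
    return json.dumps(payload, indent=2, sort_keys=True)


# Hypothetical composition modules: unique per image, same brand world.
compositions = [
    {"subject": "a shield assembled from interlocking blocks", "framing": "three-quarter view"},
    {"subject": "a bridge connecting two floating platforms", "framing": "wide shot"},
]

prompts = [build_prompt(BRAND_STYLE, c) for c in compositions]
```

Each prompt carries the identical `style` object, so only the `composition` half changes between images.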

System Architecture

Building an AI art director

With style and composition separated conceptually, the next step was turning that separation into a controllable system. The architecture has two interconnected layers: one that defines how a brand looks, and one that determines what it should show.

Step 01

Style extraction & refinement

I built a style extraction layer that converts reference imagery into structured, reusable brand definitions. Visual identity is encoded once and reused indefinitely. Injectable JSON media adapters modify extraction behavior across formats — photography, illustration, 3D, product UI — allowing format-specific nuance without changing the underlying architecture.
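One way to picture an injectable media adapter is as plain data merged into a shared extraction config. This is a minimal sketch under that assumption: the four format names come from the text, while `BASE_EXTRACTION`, the adapter contents, and `apply_adapter` are hypothetical.

```python
from copy import deepcopy

# Hypothetical base instructions shared by every extraction run.
BASE_EXTRACTION = {
    "analyze": ["color", "material", "lighting", "shape", "space"],
    "notes": [],
}

# Hypothetical adapters: each one adds format-specific nuance as data only.
MEDIA_ADAPTERS = {
    "photography": {
        "analyze": ["lens", "depth_of_field"],
        "notes": ["Describe lighting as a physical setup."],
    },
    "illustration": {
        "analyze": ["line_weight"],
        "notes": ["Describe strokes and fills rather than lenses."],
    },
    "3d": {
        "analyze": ["surface_finish"],
        "notes": ["Describe materials as render properties."],
    },
    "product_ui": {
        "analyze": ["layout_density"],
        "notes": ["Describe interface elements abstractly."],
    },
}


def apply_adapter(base: dict, adapter: dict) -> dict:
    """Return a new config with the adapter's entries appended to the base."""
    config = deepcopy(base)  # never mutate the shared base config
    for key, extra in adapter.items():
        config[key] = config.get(key, []) + list(extra)
    return config


photo_config = apply_adapter(BASE_EXTRACTION, MEDIA_ADAPTERS["photography"])
```

Because adapters are data rather than code paths, supporting a new format in this sketch means adding one dictionary entry and leaving `apply_adapter` unchanged.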

The system produces three structured artifacts:

Style definition

Encodes aesthetic behavior — color logic, material qualities, shape language, lighting tendencies, and compositional constraints.

Creative brief prompt

Guides downstream campaign planning by defining how cohesive image sets should be structured within this aesthetic world.

Composition guide

Defines how single-image prompts should be written in this style — subject types, visual devices, abstraction level, and complexity.

Reference imagery in. Style definition, creative brief, and composition guide out — a brand encoded once.
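The three artifacts could be modeled as one immutable record per brand. The sketch below assumes that shape; the class names, field names, and example values are hypothetical, chosen only to mirror the descriptions above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class StyleDefinition:
    """Aesthetic behavior, one field per aspect listed in the text."""
    color_logic: str
    material_qualities: str
    shape_language: str
    lighting_tendencies: str
    compositional_constraints: str


@dataclass(frozen=True)
class BrandEncoding:
    """Hypothetical container for the three extraction outputs."""
    style_definition: StyleDefinition
    creative_brief_prompt: str  # guides planning of cohesive image sets
    composition_guide: str      # guides how single-image prompts are written


# Invented example values for illustration only.
encoding = BrandEncoding(
    style_definition=StyleDefinition(
        color_logic="navy base, one warm accent per image",
        material_qualities="matte glass, brushed metal",
        shape_language="rounded geometric primitives",
        lighting_tendencies="soft top-down key light",
        compositional_constraints="single centered subject, wide margins",
    ),
    creative_brief_prompt="Plan image sets as one continuous environment.",
    composition_guide="Use abstract objects as metaphors; at most three elements.",
)

# asdict converts nested dataclasses into a plain dict ready for JSON storage.
encoding_dict = asdict(encoding)
```

Freezing the dataclasses reflects the "encoded once" idea: a refinement would produce a new record instead of editing the stored one in place.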

Human-in-the-loop review

Style extraction is iterative. Users validate definitions by generating preview outputs, then adjust tone, materiality, or composition using high-level conversational feedback.

For deeper corrections, a smart refine step compares generated outputs against the original references and proposes targeted updates to the underlying system prompts.

AI proposes adjustments. Human judgment decides. This preserves discernment while enabling speed.
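The propose-then-approve pattern can be sketched as a small gate between suggested changes and the stored definition. The comparison step that generates proposals is not shown here; the proposals below are hand-written stand-ins, and `ProposedUpdate` and `apply_approved` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ProposedUpdate:
    """One targeted change suggested by the refine step (hypothetical shape)."""
    field: str
    suggested: str
    reason: str


def apply_approved(style: dict, proposals: list, approved_fields: set) -> dict:
    """Apply only the proposals a human reviewer explicitly approved."""
    updated = dict(style)  # copy so the current definition stays intact
    for proposal in proposals:
        if proposal.field in approved_fields:
            updated[proposal.field] = proposal.suggested
    return updated


# Invented current definition and stand-in proposals.
style = {"lighting": "hard side light", "material": "glossy plastic"}
proposals = [
    ProposedUpdate("lighting", "soft top-down key light", "previews look harsher than references"),
    ProposedUpdate("material", "polished chrome", "previews look flatter than references"),
]

# The reviewer accepts the lighting change and rejects the material change.
refined = apply_approved(style, proposals, approved_fields={"lighting"})
```

Nothing changes unless a field appears in `approved_fields`, which keeps the final decision with the reviewer.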

Step 02

Creative execution at scale

With a reusable style defined, the workflow transitions from aesthetic infrastructure to campaign execution. Users initiate a new generation session by providing campaign context — messaging frameworks, ad concepts, landing page copy, or other strategic inputs. The system enriches that context with lightweight company research, incorporating relevant product, persona, and market signals.

Translating narrative into visual direction

This is the inflection point where narrative becomes visual direction. From the combined input, the system synthesizes a structured creative brief — translating messaging strategy into a cohesive visual plan that defines tone, themes, recurring objects, motifs, environmental logic, and a modular shot list for end-to-end coverage. Users review and refine the brief conversationally before moving forward.

Campaign context in. A structured creative brief out.
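A structured brief of this kind might look like the record below, with the shot list expanded into one composition per image. The field names follow the elements listed above (tone, themes, recurring objects, motifs, environmental logic, shot list); the classes, the `to_compositions` helper, and all values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    name: str
    subject: str


@dataclass
class CreativeBrief:
    tone: str
    themes: list
    recurring_objects: list
    motifs: list
    environmental_logic: str
    shot_list: list = field(default_factory=list)


def to_compositions(brief: CreativeBrief) -> list:
    """Expand the shot list into per-image compositions that share brief-level context."""
    return [
        {
            "shot": shot.name,
            "subject": shot.subject,
            "tone": brief.tone,
            "motifs": list(brief.motifs),
            "environment": brief.environmental_logic,
        }
        for shot in brief.shot_list
    ]


# Invented brief for illustration only.
brief = CreativeBrief(
    tone="calm, confident",
    themes=["protection", "clarity"],
    recurring_objects=["shield", "lens"],
    motifs=["interlocking blocks"],
    environmental_logic="objects float above a matte studio floor",
    shot_list=[
        Shot("hero", "a shield assembled from interlocking blocks"),
        Shot("detail", "a lens focusing scattered blocks into a line"),
    ],
)

compositions = to_compositions(brief)
```

Every composition inherits the same tone, motifs, and environment, which is what would keep a set cohesive while the subjects vary.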

Once the brief is approved, the system transitions into a dynamic workspace where concepts and images evolve together. Concept generation, visual previews, refinement, variation, and export all happen in one environment. Rather than moving through rigid steps, users iterate fluidly — adjusting strategy, refining visuals, generating variations, and downloading final assets as they're ready.

Concept review and refinement

Conversational feedback applies to one or many concepts at once, and previews refresh as ideas evolve.

Image review and refinement

Promote any concept to a polished image, then generate variations, compare them side by side, and export when it's right.

System extensions

The same core architecture supports a set of related capabilities — each one a different way of pointing the underlying style and composition engine at a new creative problem.

One extension produces a full creative brief and shot list for a cohesive lifestyle photoshoot, then generates dozens of complementary images in a single pass. Instead of isolated assets, campaigns get a unified visual world: scenes that feel intentional, varied, and brand-specific.

Outcome

Five clients. Dozens of assets. Under twenty dollars.

20–30

Brand-specific assets per campaign

<$20

Generation cost per full set

5

Clients

We used this system to generate campaign visuals for clients across the spectrum — from early-stage startups like Wingspan to enterprise brands like Checkmarx, SolarWinds, Salesforce, and R1.

For startups, it unlocked custom, campaign-specific illustration systems without the cost of traditional production. For enterprise teams, it enabled visual personalization at a scale that previously required a dedicated design team. Same system, two different value propositions.

Production time compressed from days per visual to hours per campaign. The breakthrough wasn't image generation itself — it was encoding art direction into infrastructure that could scale without losing intent.

The architecture behind this system isn't limited to image generation. The core model — separating aesthetic infrastructure from composition, translating strategy into directed concepts, scaling execution under human oversight — applies across formats, channels, and media. That's the principle Floma was built on, and it's what this system proved.

Checkmarx campaign — a grid of cybersecurity 3D icons generated in the brand's defined visual style

Same system, four distinct brand worlds — each generated from its own style definition.