Floma · Design Engineering · Solo Build · 2025 – 2026
Beyond image generation: engineering creative intelligence
Most AI image generation projects focus on the generation. I focused on the direction. This is a complete creative intelligence system — designed, architected, and built entirely solo — that encodes brand aesthetics and campaign strategy into reusable infrastructure, then executes them at scale.
Core Problem
The obvious solution wasn't the right one
Marketing campaign production breaks down into three components: content creation, creative development, and production design. Content and production design are largely linear and deterministic — with structured prompts and modular assembly, they can be automated reliably. Creative development is different. It translates strategy into emotionally resonant visual abstractions, requiring judgment, context, and taste.
In the pursuit of end-to-end campaign automation, the biggest bottleneck wasn't generating visuals. It was deciding what visuals to generate.
Creative development sits between linear, deterministic stages — and behaves nothing like them.
AI image generation seemed like the obvious move. The latest models could render high-quality visuals instantly and at almost no cost. I tested them against real production requirements — brand consistency, strategic relevance, production volume — and found the constraint wasn't rendering quality.
It was direction. Without structure, output styles varied wildly run to run. Models could render what you described but couldn't decide what to render — they had no way to understand the abstract, metaphorical nature of enterprise visual design. And scaling that process across a full campaign meant fragile, manual prompting for every single image.
Three failure modes, precisely defined: inconsistent style, no creative direction, and prompting that doesn't scale.
Same prompt. Multiple generations. No two outputs consistent.
Research & Insights
Separating style from composition
The breakthrough wasn't a better prompt. It was a different architecture.
Testing revealed that JSON-structured prompts produced more consistent results than natural-language prompts, but the real unlock came from recognizing that style and composition are fundamentally separable concerns: two halves of every image, treated as two independent modules.
Style as infrastructure
A brand's aesthetic DNA — color, material, lighting, shape, space — extracted once from reference imagery into a reusable JSON schema.
Composition as creative variation
Composition stays unique per image — broad concept variation explored inside the same brand world, without losing aesthetic cohesion.
Separating these two modules turns brand definition into infrastructure. Combined in a single generation call, they produce on-brand, campaign-specific imagery at scale, without re-describing aesthetic intent or resending reference images every time.
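A minimal sketch of how the two modules might be combined into one generation payload, assuming a JSON-structured prompt. The field names, the sample values, and the `build_prompt` helper are hypothetical illustrations, not the system's actual schema.

```python
import json

# Hypothetical style module: extracted once per brand, reused for every image.
STYLE = {
    "color": "deep navy base with a single warm coral accent",
    "material": "matte ceramic and frosted glass",
    "lighting": "soft, diffuse studio light from the upper left",
    "shape": "rounded geometric primitives",
    "space": "generous negative space",
}


def build_prompt(style: dict, composition: dict) -> str:
    """Merge the reusable style module with a per-image composition module."""
    payload = {"style": style, "composition": composition}
    return json.dumps(payload, indent=2, sort_keys=True)


# Hypothetical composition modules: unique per image, same brand world.
compositions = [
    {"subject": "a bridge assembled from floating blocks", "concept": "integration"},
    {"subject": "a lighthouse beam cutting through fog", "concept": "visibility"},
]

# One prompt string per image; the style block is identical in each.
prompts = [build_prompt(STYLE, c) for c in compositions]
```

Each string in `prompts` would be sent as the text of a single generation call. Only the `composition` block changes between images, which is what keeps the set aesthetically cohesive.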
System Architecture
Building an AI art director
With style and composition separated conceptually, the next step was turning that separation into a controllable system. The architecture has two interconnected layers: one that defines how a brand looks, and one that determines what it should show.
Style extraction & refinement
I built a style extraction layer that converts reference imagery into structured, reusable brand definitions. Visual identity is encoded once and reused indefinitely. Injectable JSON media adapters modify extraction behavior across formats — photography, illustration, 3D, product UI — allowing format-specific nuance without changing the underlying architecture.
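One way to picture an injectable media adapter is as a small JSON-like overlay applied to shared extraction instructions. The sketch below assumes that shape; the adapter keys, attribute names, and `apply_adapter` function are all hypothetical.

```python
from copy import deepcopy

# Hypothetical base instructions shared by every extraction run.
BASE_EXTRACTION = {
    "attributes": ["color", "material", "lighting", "shape", "space"],
    "notes": [],
}

# Hypothetical adapters: each adds format-specific nuance on top of the base.
MEDIA_ADAPTERS = {
    "photography": {
        "attributes": ["lens", "depth_of_field"],
        "notes": ["Describe camera behavior."],
    },
    "illustration": {
        "attributes": ["line_weight"],
        "notes": ["Describe stroke and fill treatment."],
    },
    "3d": {
        "attributes": ["render_finish"],
        "notes": ["Describe surface shading."],
    },
    "product_ui": {
        "attributes": ["layout_fidelity"],
        "notes": ["Preserve interface structure."],
    },
}


def apply_adapter(base: dict, media: str) -> dict:
    """Return extraction instructions for one format without mutating the base."""
    adapter = MEDIA_ADAPTERS[media]
    config = deepcopy(base)
    config["media"] = media
    config["attributes"] = base["attributes"] + adapter["attributes"]
    config["notes"] = base["notes"] + adapter["notes"]
    return config


photo_config = apply_adapter(BASE_EXTRACTION, "photography")
```

Because the base is copied rather than edited, adding a new format means adding one adapter entry; the extraction pipeline itself stays unchanged.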
The system produces three structured artifacts:
Style definition
Encodes aesthetic behavior — color logic, material qualities, shape language, lighting tendencies, and compositional constraints.
Creative brief prompt
Guides downstream campaign planning by defining how cohesive image sets should be structured within this aesthetic world.
Composition guide
Defines how single-image prompts should be written in this style — subject types, visual devices, abstraction level, and complexity.
Reference imagery in. Style definition, creative brief, and composition guide out — a brand encoded once.
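The three artifacts could be stored together as one record that is saved once and reloaded for every campaign. This is a sketch under that assumption; the `BrandEncoding` class, its field names, and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BrandEncoding:
    """Hypothetical bundle of the three artifacts produced by style extraction."""

    style_definition: dict       # aesthetic behavior: color, material, lighting, ...
    creative_brief_prompt: str   # how cohesive image sets should be planned
    composition_guide: str       # how single-image prompts should be written

    def to_json(self) -> str:
        """Serialize the bundle so it can be stored and reused later."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "BrandEncoding":
        """Rebuild the bundle from its stored JSON form."""
        return cls(**json.loads(raw))


encoding = BrandEncoding(
    style_definition={"color": "navy with coral accent", "lighting": "soft, diffuse"},
    creative_brief_prompt="Plan sets as one hero image plus supporting details.",
    composition_guide="One abstract subject per image, low complexity.",
)

# Round trip: what is stored is exactly what later sessions load.
restored = BrandEncoding.from_json(encoding.to_json())
```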
Human-in-the-loop review
Style extraction is iterative. Users validate definitions by generating preview outputs, then adjust tone, materiality, or composition using high-level conversational feedback.
For deeper corrections, a smart refine step compares generated outputs against the original references and proposes targeted updates to the underlying system prompts.
AI proposes adjustments. Human judgment decides. This preserves discernment while enabling speed.
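The approval gate can be sketched as a function that applies only the proposals a reviewer has accepted. The `Proposal` type, the section names, and `apply_approved` are hypothetical, and the proposals here are hard-coded rather than produced by a model.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """Hypothetical targeted update suggested by the refine step."""

    section: str    # which part of the prompt the change targets
    suggested: str  # replacement text
    reason: str     # why the change was proposed


def apply_approved(prompt: dict, proposals: list[Proposal], approved: set[str]) -> dict:
    """Apply only human-approved proposals; everything else stays untouched."""
    updated = dict(prompt)  # copy, so the original remains available for comparison
    for proposal in proposals:
        if proposal.section in approved:
            updated[proposal.section] = proposal.suggested
    return updated


prompt_sections = {
    "lighting": "hard directional light",
    "material": "matte ceramic",
}
proposals = [
    Proposal("lighting", "soft diffuse light", "previews look harsher than the references"),
    Proposal("material", "glossy plastic", "previews show stronger highlights"),
]

# The reviewer accepts the lighting change and rejects the material change.
result = apply_approved(prompt_sections, proposals, approved={"lighting"})
```

Nothing changes unless a section name appears in `approved`, so the automated step can suggest freely while the final decision stays with the reviewer.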
Creative execution at scale
With a reusable style defined, the workflow transitions from aesthetic infrastructure to campaign execution. Users initiate a new generation session by providing campaign context — messaging frameworks, ad concepts, landing page copy, or other strategic inputs. The system enriches that context with lightweight company research, incorporating relevant product, persona, and market signals.
Translating narrative into visual direction
This is the inflection point where narrative becomes visual direction. From the combined input, the system synthesizes a structured creative brief — translating messaging strategy into a cohesive visual plan that defines tone, themes, recurring objects, motifs, environmental logic, and a modular shot list for end-to-end coverage. Users review and refine the brief conversationally before moving forward.
Campaign context in. A structured creative brief out.
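A structured brief of this kind might look like the following: shared direction at the top, plus a shot list that expands into one composition per image. The `CreativeBrief` and `Shot` types and all sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    """Hypothetical entry in the modular shot list."""

    name: str
    subject: str
    purpose: str  # for example "hero" or "supporting"


@dataclass
class CreativeBrief:
    """Hypothetical structured brief synthesized from campaign context."""

    tone: str
    themes: list[str]
    motifs: list[str]
    environment: str
    shots: list[Shot] = field(default_factory=list)

    def compositions(self) -> list[dict]:
        """Expand the shot list into one composition module per image."""
        return [
            {
                "subject": shot.subject,
                "purpose": shot.purpose,
                "tone": self.tone,
                "motifs": self.motifs,
                "environment": self.environment,
            }
            for shot in self.shots
        ]


brief = CreativeBrief(
    tone="calm, confident",
    themes=["visibility", "control"],
    motifs=["beams of light", "modular blocks"],
    environment="open coastal landscape at dusk",
    shots=[
        Shot("hero", "a lighthouse beam cutting through fog", "hero"),
        Shot("detail-1", "blocks locking into a seawall", "supporting"),
    ],
)
comps = brief.compositions()
```

Every composition inherits the same tone, motifs, and environment, so each shot varies in subject while the set still reads as one campaign.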
Once the brief is approved, the system transitions into a dynamic workspace where concepts and images evolve together. Concept generation, visual previews, refinement, variation, and export all happen in one environment. Rather than moving through rigid steps, users iterate fluidly — adjusting strategy, refining visuals, generating variations, and downloading final assets as they're ready.
Concept review and refinement
Conversational feedback applied across one or many concepts at once — previews refresh as ideas evolve.
Image review and refinement
Promote any concept to a polished image — generate variations, compare side-by-side, and export when it's right.
System extensions
The same core architecture supports a set of related capabilities — each one a different way of pointing the underlying style and composition engine at a new creative problem.
Full creative brief and shot list for a cohesive lifestyle photoshoot, then dozens of complementary images in a single pass. Instead of isolated assets, campaigns get a unified visual world — scenes that feel intentional, varied, and brand-specific.
Any image becomes a launch point for exploring alternate poses, color systems, compositional shifts, or environmental changes under defined parameters.
Existing images, such as product UI screenshots or outdated graphics, are ingested and reinterpreted in a defined style. The result can be a stylized 3D render, a contextualized product visual, or an aesthetic transformation, all without losing structural fidelity.
Static visuals transformed into 4–8 second animations using the same style definitions and compositional logic. The system generates animation concepts, keyframes, and motion prompts to extend brand consistency into video.
Outcome
Five clients. Dozens of assets. Under twenty dollars.
20–30
Brand-specific assets per campaign
<$20
Generation cost per full set
5
Clients
We used this system to generate campaign visuals for clients across the spectrum — from early-stage startups like Wingspan to enterprise brands like Checkmarx, SolarWinds, Salesforce, and R1.
For startups, it unlocked custom, campaign-specific illustration systems without the cost of traditional production. For enterprise teams, it enabled visual personalization at a scale that previously required a dedicated design team. Same system, two different value propositions.
Production time compressed from days per visual to hours per campaign. The breakthrough wasn't image generation itself — it was encoding art direction into infrastructure that could scale without losing intent.
The architecture behind this system isn't limited to image generation. The core model — separating aesthetic infrastructure from composition, translating strategy into directed concepts, scaling execution under human oversight — applies across formats, channels, and media. That's the principle Floma was built on, and it's what this system proved.
Same system, four distinct brand worlds — each generated from its own style definition.