Spec-Driven Creative: How Structured Specs Transformed My AI Workflow

Mar 12, 2026

A few months ago, I was spending more time correcting AI output than I enjoyed. I’d give Claude a clear description of what I wanted (a feature, an article outline, a marketing brief) and get back something technically competent but fundamentally off. Not wrong, exactly. Just not what I had in mind. The kind of output that takes twenty minutes of back-and-forth to get right, erasing whatever time I thought I was saving.

The frustrating part was the inconsistency. Some sessions were great. Others felt like starting over with someone who’d never heard of my project. I’d find myself saying “why did you do that?” more often than I’d like to admit — staring at a feature implementation that ignored constraints I thought were obvious, or a draft that sounded like it was written for a completely different audience.

Then I made a specific change, and the dynamic shifted. Not a better prompt template or a new model — a system of documents. A hierarchy of specs that give the AI structured intent to work against, instead of asking it to guess what I mean. I went from constant course-correction to something much closer to “approved, let’s do the next thing.”

I didn’t invent it. It’s called spec-driven development. But as I think you’ll soon see, spec-driven creative might be a better way to think about it.

The core principle

Spec-driven development is straightforward: instead of describing what you want in a single prompt or conversation, you build a hierarchy of documents that define your project at multiple levels. The AI references these documents as it works, and — critically — it can evaluate its own output against them. Instead of guessing at your intent, it has a north star for every decision.

The hierarchy looks like this:

PRD (Product Requirements Document) — defines the market, the user, the KPIs (key performance indicators), and the key features. This is the “what and why.”
Design spec — a foundational spec derived from the PRD that prevents drift across features. This is “how it should feel” — the consistent anchor that keeps everything cohesive as the project grows.
Feature specs — one per deliverable, each pointing back to the PRD and design spec. Each one is a focused reference point for a single piece of work.
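It can help to see this hierarchy as a data structure. Here’s a minimal Python sketch: the `Spec` class, its `kind` labels, and `missing_anchors` are hypothetical names for illustration, not part of any tool. Each spec holds references to the specs above it, and the check flags a feature spec that can’t trace back to both a PRD and a design spec.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Spec:
    """One document in the hierarchy: a PRD, a design spec, or a feature spec."""

    name: str
    kind: str  # "prd", "design", or "feature" (labels chosen for this sketch)
    body: str
    references: list[Spec] = field(default_factory=list)  # specs this one points back to

    def walk(self) -> list[Spec]:
        """Return this spec plus every spec it traces back to, depth-first, no repeats."""
        visited: list[Spec] = []
        seen_ids: set[int] = set()
        stack = [self]
        while stack:
            spec = stack.pop()
            if id(spec) in seen_ids:
                continue
            seen_ids.add(id(spec))
            visited.append(spec)
            # Push in reverse so references are visited in the order they were declared.
            stack.extend(reversed(spec.references))
        return visited


def missing_anchors(feature: Spec) -> set[str]:
    """For a feature spec: which upstream layers (PRD, design spec) it can't trace back to."""
    reachable = {s.kind for s in feature.walk()}
    return {"prd", "design"} - reachable


prd = Spec("PRD", "prd", "Market, user, KPIs, key features.")
design = Spec("Design spec", "design", "Visual language and interaction patterns.", [prd])
login = Spec("Login", "feature", "Email/password login.", [prd, design])
archive = Spec("Archive", "feature", "Archived tasks.")  # references forgotten

print([s.name for s in login.walk()])  # ['Login', 'PRD', 'Design spec']
print(sorted(missing_anchors(archive)))  # ['design', 'prd']
```

A spec that points nowhere is exactly the kind that invites the AI to guess, which is why the check is worth running before work starts.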

Each layer flows from the one above. The feature specs reference the design spec (and the PRD directly). The design spec references the PRD. And the PRD captures your original thinking about what you’re building and why.

This isn’t a new idea. It’s a well-established one applied to a new domain.

Bertrand Meyer’s Design by Contract, introduced in the 1980s, formalized the principle that defining the contract before building the implementation produces more reliable software. Every module interaction has explicit obligations — preconditions, postconditions, invariants — and ambiguity in interfaces is what causes defects.

Dan North’s Behavior-Driven Development took a similar path in 2003: write specifications of desired behavior before code exists. The shift from thinking in tests to thinking in behavior changed how entire teams approached development. And the contract-first API development movement proved the same thing in a modern context — define the interface before the implementation, and you get better results with less rework.

IEEE’s requirements standards (IEEE 830, since superseded by ISO/IEC/IEEE 29148) codified decades of this kind of industry learning into engineering consensus: the quality of specifications directly determines the quality of what gets built.

The principle is consistent across all of these: specify first, build second.

What makes this particularly relevant now is that AI tools respond to structured context the same way software responds to clear contracts. Anthropic’s work on context engineering makes the case directly: the model is already smart enough — intelligence is not the bottleneck, context is. Building effective AI systems is mainly about designing the system so the model understands tasks clearly, accesses the right information, and responds consistently.

A well-written spec is context engineering in practice. You’re building the “world” the AI operates in — and the richer and more structured that world, the better the output. I wrote about this principle in The Synthetic Persona Protocol: before you create anything, you need to create the universe it lives in. Specs are that universe, made concrete and reusable. It’s the same idea I explored in Standing at the Edge — you can’t automate quality you haven’t defined. Specs are literally how you define it.

There’s a balance to strike, though. Specs need to be concise enough to point in the right direction, but not so dense that they fill up context windows and leave no room for the AI to reason. Over-specify — loading every interaction with pages of rules — and the AI gets buried in instructions. The sweet spot is enough structure to prevent drift, with enough openness for the AI to contribute its own reasoning.

The evidence points the same way. Industry analysis on structured prompting suggests that vague prompts leave roughly 42% of output quality on the table, and that structured approaches reduce output variability by about 35%. And a randomized controlled trial by METR found something worth sitting with: experienced open-source developers who were allowed to use AI tools actually took 19% longer to complete tasks than when working without AI. These were people with an average of five years in their codebases, and the AI made them slower, not faster. Before starting, they predicted a 24% time savings. After finishing, they still believed AI had sped them up (by about 20%), despite the data showing the opposite. The study didn’t test structured workflows, but the lesson is hard to miss: the tool alone isn’t enough. How you work with it determines whether it helps or hurts.

Building an app with specs

Let me make this concrete with an example from my own workflow.

Say you want to build a to-do app. The instinct with AI tools is to jump straight in: “Build me a to-do app with user accounts, task management, and an archive.” You’ll get something. It’ll probably run. But it won’t be what you had in mind — and the further you get, the more the drift compounds. Every new feature is a guess built on a previous guess.

Spec-driven development changes the starting point. Before any code gets written, you do the thinking.

First, the PRD — a clear, concise capture of what you’re building and why. Not a 30-page enterprise document. Just the market, the user, the success metrics, and the key features.

Next, the design spec: the visual language, interaction patterns, and design principles that hold consistent across every feature. I create this early because it prevents drift. When Claude builds the login screen and, later, the archive view, both feel like they belong to the same application.

Then, feature specs. One for each piece: login, account maintenance, to-do management, archive. Each spec points back to the PRD for the “why” and the design spec for the “how it should feel,” while defining its own requirements, acceptance criteria (what “done” looks like), and edge cases (what happens when things go wrong).

Here’s what a condensed feature spec actually looks like — a few lines from the to-do app’s login spec:

Feature: User Login
References: PRD Section 2 (target user), Design Spec (form patterns, error states)
Purpose: Returning users authenticate to access their task list.
Requirements: Email/password login; session persists for 30 days; failed attempts show inline error with retry guidance.
Acceptance criteria: User can log in within two taps from launch; incorrect credentials display a specific error message (not generic “login failed”); session token refreshes silently.
Edge cases: Expired session mid-task saves draft locally; account locked after five failed attempts triggers email recovery flow.

A few lines, but every requirement traces back to the PRD and design spec, the acceptance criteria are testable, and the edge cases prevent the AI from guessing at failure states. Without this spec, Claude builds a login screen that technically works but handles errors generically and invents its own visual patterns. With it, the output matches intent on the first pass.
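Because the acceptance criteria and edge cases are concrete, they translate almost directly into tests. Here’s a minimal Python sketch of the lockout edge case, assuming a hypothetical `LoginAttemptTracker` that stands in for whatever auth code the AI generates; the class, its messages, and the recovery flag are illustrative only. The point is that “locked after five failed attempts” and “specific error message” become assertions you can run rather than judgment calls.

```python
from dataclasses import dataclass

MAX_FAILED_ATTEMPTS = 5  # from the spec: "account locked after five failed attempts"
LOCKED_MESSAGE = "Account locked. Check your email to recover access."


@dataclass
class LoginAttemptTracker:
    """Hypothetical per-account tracker; not a real auth library."""

    failed_attempts: int = 0
    locked: bool = False
    recovery_email_sent: bool = False

    def record_failure(self) -> str:
        """Register a failed login and return the message the UI should show."""
        if self.locked:
            return LOCKED_MESSAGE
        self.failed_attempts += 1
        if self.failed_attempts >= MAX_FAILED_ATTEMPTS:
            self.locked = True
            self.recovery_email_sent = True  # stand-in for triggering the email recovery flow
            return LOCKED_MESSAGE
        remaining = MAX_FAILED_ATTEMPTS - self.failed_attempts
        # A specific, actionable error instead of a generic "login failed"
        return f"Incorrect email or password. {remaining} attempt(s) left before lockout."

    def record_success(self) -> bool:
        """Return True if login is allowed; a locked account stays locked."""
        if self.locked:
            return False
        self.failed_attempts = 0
        return True


def test_lockout_after_five_failures() -> None:
    tracker = LoginAttemptTracker()
    for _ in range(MAX_FAILED_ATTEMPTS - 1):
        tracker.record_failure()
    assert not tracker.locked
    tracker.record_failure()  # the fifth failure triggers the lock
    assert tracker.locked and tracker.recovery_email_sent
    assert tracker.record_success() is False


def test_error_is_specific() -> None:
    message = LoginAttemptTracker().record_failure()
    assert "login failed" not in message.lower()
    assert "4 attempt(s) left" in message


test_lockout_after_five_failures()
test_error_is_specific()
print("spec checks passed")
```

Each test names the spec line it protects, so when a generated implementation drifts, the failure points straight back to the requirement it broke.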

Here’s the part that changed my relationship with the work: you don’t have to come up with all of this yourself. I use Claude’s best practices documentation as a starting point — Anthropic explicitly recommends researching and planning before coding, noting that it “significantly improves performance for problems requiring deeper thinking upfront.” I bring that documentation into the project, provide my PRD, and tell Claude to work in a spec-driven way. It identifies features I hadn’t considered. It drafts specs for each one.

My job becomes reviewing and aligning — making sure everything matches my vision. Instead of writing code or directing every move, I’m thinking about what I’m building, capturing that thinking in specs, then evaluating AI output against my intent. This also works at team scale — shared specs create consistency that no amount of individual prompting can match.

The shift is from doing to defining and reviewing.

One thing I didn’t expect: this changes your weekly cadence. Specs need time to breathe. I capture thinking when I think best — early morning, during a walk — and use those notes as starter material for spec creation. Then I review what Claude generates. It’s not slower, but the rhythm is different, and the review time is what keeps quality high.

Creating content with specs

The same hierarchy applies to content creation — and for people who don’t write code, this is where spec-driven development might matter most.

Each article is a feature. Each marketing campaign is a feature. The spec cascade is identical: build the world first, define the specific piece, then let the AI work within those boundaries.

For content, the hierarchy maps like this:

PRD — who’s the audience, what’s the goal, what are you trying to convey. This captures the strategic intent behind the piece.
Voice and style spec — tone constraints, language guidelines, voice patterns, stylistic preferences. This is the content creator’s equivalent of the design spec. It prevents your AI-assisted writing from sounding generic or drifting off-brand across pieces.
Article spec — the specific piece’s blueprint: thesis, key points, structure, sources needed, call to action. This is the feature spec.

Let me walk through how this works in practice. Say you want to write a long-form Substack post about lessons learned from migrating your team to a new project management tool. You’ve lived through the experience, you have opinions, and you know other mid-level managers would benefit from hearing what actually happened.

Start with the PRD. Your audience is mid-level managers facing their own migration. Your goal is practical insights — not a product review, but an honest account of what worked and what didn’t. Your key points: the resistance you encountered, the rollout mistakes you made, and the specific things that got the team on board.

Next, the voice and style spec. Maybe you write conversationally, avoid corporate jargon, and lead with honest failures over polished success stories. Whatever your voice is, capture it. This spec persists across every piece you create — the anchor that keeps your writing sounding like you, whether you wrote it at 6 AM or the AI drafted it at your request.

Then, the article spec. Here’s what a condensed version looks like:

Article: Lessons From a Project Management Migration
References: PRD (mid-level manager audience), Voice Spec (first-person, honest about failures)
Thesis: Tool migrations fail when teams focus on features instead of workflows — here’s what actually worked.
Structure: Open with the disaster of week one (the specific Slack message from your PM saying “nobody is using this”); walk through the three changes that turned it around (workflow mapping session, champion network, 30-day parallel run); close with what you’d tell someone about to start their own migration.
Sources: Slack messages from weeks 1 and 6, adoption metrics (23% week one to 89% week eight), feedback quotes from two team leads.
Constraints: No product recommendations. Focus on process, not tools. Under 2,000 words.
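Some of these lines are concrete enough to check mechanically before you even read the draft. Here’s a minimal Python sketch under stated assumptions: `ArticleSpec` and `review_draft` are hypothetical names, word count is approximated by splitting on whitespace, and the banned phrases are illustrative examples of the generic advice a voice spec might rule out. A constraint like “no product recommendations” still needs a human read.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ArticleSpec:
    """Hypothetical machine-checkable slice of an article spec."""

    max_words: int
    required_details: list[str] = field(default_factory=list)  # specifics the draft must use
    banned_phrases: list[str] = field(default_factory=list)  # generic phrasing to keep out


def review_draft(draft: str, spec: ArticleSpec) -> list[str]:
    """Return a list of problems; an empty list means the draft passes these checks."""
    problems: list[str] = []
    word_count = len(re.findall(r"\S+", draft))  # rough: whitespace-separated tokens
    if word_count > spec.max_words:
        problems.append(f"{word_count} words exceeds the {spec.max_words}-word limit")
    lowered = draft.lower()
    for detail in spec.required_details:
        if detail.lower() not in lowered:
            problems.append(f"missing required detail: {detail!r}")
    for phrase in spec.banned_phrases:
        if phrase.lower() in lowered:
            problems.append(f"uses phrase the voice spec rules out: {phrase!r}")
    return problems


spec = ArticleSpec(
    max_words=2000,
    required_details=["nobody is using this", "23%", "89%"],
    banned_phrases=["communicate early and often", "executive buy-in"],
)

generic = "Communicate early and often and get executive buy-in before you roll out."
specific = 'Week one, our PM posted "nobody is using this." Adoption went from 23% to 89% by week eight.'

for problem in review_draft(generic, spec):
    print(problem)
print(review_draft(specific, spec))  # []
```

The generic draft fails on every required detail and trips both banned phrases; the specific one passes, which mirrors the difference between the two outputs described next.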

You hand Claude the PRD, the voice and style spec, and the article spec. The output lands differently than if you’d just said “write me a post about migrating to a new project management tool.” Without specs, Claude writes generic migration advice — “communicate early and often, get executive buy-in, start with a pilot group.” With specs, it opens with the specific Slack message, cites the adoption numbers, and structures the piece around your actual three turning points. It sounds like you. It hits the points that matter.

This is building the world before asking the AI to work in it — the same principle from The Synthetic Persona Protocol, applied to content instead of personas. The domain changes. The hierarchy doesn’t.

Getting started

The cascade you’ve seen throughout this article — idea, PRD, design spec, feature specs — is the whole system. Here’s how to start.

Start with your idea. It can be rough. A voice memo from your commute or a paragraph scribbled during a meeting. You need a starting point, not a polished concept.

Hand it to the AI and build from there. Describe what you’re thinking, who it’s for, what problem it solves. Claude will structure that into a PRD — identifying gaps, suggesting considerations you missed, organizing rough thoughts into something coherent. From the PRD, it generates the design spec. From the design spec, it breaks work into feature specs. Anthropic’s prompting best practices recommend structuring inputs with clear instructions, context, task definitions, and output formats — the PRD creation step is where that structure takes shape.
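To make the “structure your inputs” advice concrete, here’s a minimal Python sketch that stitches the cascade into a single prompt, broadest context first and the specific ask last. It assumes each spec is already a plain string; `build_context` and the tag names are hypothetical choices (Anthropic’s prompting docs suggest XML-style tags for separating parts of a prompt but don’t prescribe these particular ones), the sample spec text is placeholder content, and the function sends nothing to any API.

```python
def build_context(
    prd: str,
    design_spec: str,
    feature_spec: str,
    task: str,
    output_format: str,
) -> str:
    """Assemble one prompt from the spec cascade, broad context first, specific ask last."""
    sections = [
        ("prd", prd),  # the what and why
        ("design_spec", design_spec),  # how it should feel
        ("feature_spec", feature_spec),  # this deliverable's requirements and edge cases
        ("task", task),  # the instruction for this session
        ("output_format", output_format),  # the shape the answer should take
    ]
    # Each layer gets its own tag so the model can tell the documents apart.
    return "\n\n".join(f"<{tag}>\n{text.strip()}\n</{tag}>" for tag, text in sections)


prompt = build_context(
    prd="To-do app with user accounts, task management, and an archive.",
    design_spec="Consistent form patterns and error states across every screen.",
    feature_spec="Feature: User Login. Session persists for 30 days; lock after five failed attempts.",
    task="Implement the login screen and check it against the acceptance criteria.",
    output_format="A short plan, then the code, then any conflicts with the specs.",
)
print(prompt)
```

Keeping the specs as separate, labeled sections (rather than pasting them into one paragraph) also makes it easy to swap in a different feature spec while the PRD and design spec stay fixed.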

Your job is reviewing and aligning. At every step, you’re shaping AI-generated specs until they match your intent. You don’t need to know how to write a PRD. You don’t need experience with design specs. The specs are the AI’s job. The vision is yours.

There’s an honest tradeoff. The upfront investment in specs feels slower than just jumping in and prompting. That’s real. It follows the same pattern I’ve noticed in every new skill I’ve picked up: initial excitement, a stretch of tedious setup that feels unproductive, and then the moment when the first spec-driven output comes back noticeably closer to what you actually wanted. After that, the workflow becomes natural.

The output is more reliable. Not every session is perfect, but the baseline quality went up and stayed up — because the specs persist, the context is structured, and the AI has something to evaluate its own work against.

That’s the shift. From correcting to reviewing. From “why did you do that?” to “approved, let’s do the next thing.”

Join me on a listening tour

Two years ago, I talked to leaders, builders, and operators about how AI was starting to show up in their work. Those conversations shaped a lot of my thinking about where things were headed.

Now I’m doing a follow-up — same core questions, two years later. I’m curious what actually happened, which predictions landed, and what’s still frustrating. The conversation is casual, about 45 minutes, and I’ll share a summary of what I learn across all the interviews (anonymized, of course).

No prep needed — just your honest perspective. And if you know someone who’d have a sharp take on this, I’d welcome the introduction.

Book A Listening Tour Interview
