ai, design, engineering

May 21, 2026 · 10 min read

Design Before Code: A Discipline for AI-Assisted Engineering

When AI writes the code, the reasoning that produced it never exists anywhere except in a transient context window. A two-session pattern — a long Workshop to build the design, and a series of fresh Cold Reads to test it — moves accountability back where it belongs.

By Pallav

A pull request lands. Eight hundred lines. The diff is clean, the tests pass, the AI wrote most of it. In review someone asks why a particular pattern was used in three places — why a singleton here and an injected factory there? — and the author shrugs. The AI made that choice. The author skimmed it, approved it, and pushed it.

This is the failure mode. The system shipped. The author owns it. But the design — the actual reasoning that produced the code — never existed anywhere except inside the model's transient state. The author is now responsible for a thing they cannot defend.

The thing you'll be held responsible for is the design, not the lines. Code can be regenerated from a strong design doc in an afternoon. A bad design buried inside good-looking code costs months to unwind. If you let an AI write the code without first writing the design, you've outsourced the part that actually matters.

What follows is a pattern I've been using for spec-driven work: a long session to build the design, a series of fresh sessions to interrogate it, and a rule that no code gets written until the design survives both.

Code is the wrong artifact for the AI to produce first

Four reasons to prefer English design docs over generated source as the primary AI output. Three are practical, one is structural.

Economics. Asking an AI to reason about a few pages of English is cheap and reliable. Asking it to reason coherently across a hundred thousand lines of source is expensive, slow, and frequently wrong. Tokens spent reading the design pay off; tokens spent re-reading generated code are mostly waste.
Inspectability. A design doc is for humans. You can read every line, debate it, catch contradictions in a thread. Generated code at scale gets skimmed — the assumptions live in places nobody looks until something breaks.
Reversibility. A doc revision costs minutes. A code revision that pulls one assumption out of the spine of a hundred files costs days. Cheap-to-change beats hard-to-change every time when you're iterating fast.

The structural reason is responsibility. When a human writes code, the small why-did-I-do-this decisions get baked into structure choices, comments, commit messages — leaving fingerprints you can follow. When an AI writes code, those decisions vanish: the model made them, then forgot them, and the only place they ever lived was a transient context window. If you don't extract them before the code is written, you'll own a system you can't defend.

Two sessions, two jobs

The pattern uses two distinct AI modes. They look like the same tool but they play opposite roles. Call them the Workshop and the Cold Read.

FIG_01: WORKSHOP → DESIGN DOC → COLD READ LOOP

The Workshop session

The Workshop is one long-running conversation. You and the AI carry the same context for as long as it takes — sometimes hours, sometimes days. Three rules:

No code. Tell the AI explicitly: produce no code in this session. If it offers, push back. The output is exclusively English describing intent, structure, and consequences.
Adversarial framing. Direct the AI to interrogate your assumptions rather than agree with them: "Tell me where this design is weak." "List three alternatives and explain why I'm picking this one anyway."
Files and reasons. Every change must land in the doc with two things: which file or files will be touched, and why this approach over the obvious alternatives.

The output is a markdown doc that lists every file the change will affect, every decision made, every alternative considered and rejected. This document becomes the source of truth. It outlives the session.

The Cold Read session

A Cold Read is a fresh AI session — no prior context, no memory of the Workshop, no access to the codebase beyond what you paste in. You hand it the design doc and nothing else. Then you ask three things, in sequence:

"Explain this system back to me." If the explanation misses pieces, your doc is leaning on context that only lived in the Workshop. Fix the doc, not the AI.
"What would you implement first, and what's the riskiest part?" A good doc lets a fresh reader plan an implementation. If the AI flounders, the structure is wrong.
"Tear this apart as a hostile critic." Adversarial prompting catches the assumptions you've quietly committed to. The doc has to survive an aggressive read, not a charitable one.

Each failure of the Cold Read sends you back to the Workshop with a specific gap to fill. The doc gets revised. You repeat. Only when a Cold Read consistently passes — and a human has signed off — does code actually get written.
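The three questions are easy to standardise so every Cold Read starts from the same inputs. What follows is a minimal sketch under assumptions, not part of any tool: `buildColdReadPrompts` and the framing sentence it prepends are hypothetical, and sending each prompt to a brand-new session is left to whatever client you use.

```typescript
// Hypothetical helper: builds the three Cold Read prompts from the design doc alone.
// Each prompt is meant to open a fresh session, so each one embeds the full doc
// and makes no reference to the Workshop conversation.

const COLD_READ_QUESTIONS = [
  "Explain this system back to me.",
  "What would you implement first, and what's the riskiest part?",
  "Tear this apart as a hostile critic.",
] as const;

function buildColdReadPrompts(designDoc: string): string[] {
  const doc = designDoc.trim();
  if (doc.length === 0) {
    throw new Error("A Cold Read needs a design doc; got an empty string");
  }
  // One self-contained prompt per question, in the order they should be asked.
  return COLD_READ_QUESTIONS.map(
    (question) =>
      `You have no context beyond the design doc below.\n\n${doc}\n\n${question}`,
  );
}
```

Because the function takes only the doc as input, there is no way for Workshop context to leak into the review by accident.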

WHY THE COLD READ HAS TO START FRESH

A new session can't fall back on the running thread for context. If the doc is missing something, the gap becomes obvious instead of being papered over by ambient memory. Continuing the original session for review is the most common mistake — the AI fills in the holes from prior turns and you never notice.

What the design doc actually contains

A passable design doc has four sections. None of them are optional.

Section	What it answers
Files & touch points	Exactly which files change. Names. Paths. No vague descriptions.
Decisions, with reasons	Each non-obvious choice, the alternatives considered, why the chosen path won.
Edge cases & failure modes	What goes wrong, how it's handled, what gets silently swallowed and why that's acceptable.
What this is NOT doing	Scope cuts you deliberately made. The next reader needs to know why something obvious was left out.

The "files & touch points" section is the test that the design is real. A doc that says "add caching layer to the API" is not a design; that's a wish. A doc that says "add lib/cache.ts with these three exports; modify routes/users.ts:42-68 to call them; add vitest tests in lib/cache.test.ts" is a design.

The "what this is NOT doing" section is the one most engineers skip and most often regret skipping. Without it, the next person reading the doc, or the next Cold Read, silently expands the scope: "Oh, this should also handle X." Now you have a different feature.

A useful skeleton:

docs/design/feature-name.md

```markdown
# Feature: <short name>

## Scope
- **In:** what this delivers
- **Out:** what this explicitly does NOT deliver

## Files & touch points
- `src/<path>.ts` — what changes, and why
- `tests/<path>.test.ts` — new coverage

## Decisions
### D1: <decision title>
- **Chosen:** <approach>
- **Rejected:** <alternative> — <reason>
- **Cost we're accepting:** <the downside of the chosen path>

## Edge cases
| Scenario | Behaviour | Why |
|---|---|---|
| ... | ... | ... |

## Risks
- What could go wrong
- How we'd catch it
- What we won't catch

## Sign-off
- [ ] Cold Read 1 — explain-back
- [ ] Cold Read 2 — adversarial critique
- [ ] Human review
```
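The "no code until sign-off" rule can be backed by a small mechanical check. The sketch below assumes a doc written against the skeleton above; `checkDesignDoc`, `DocCheck`, and the `REQUIRED_SECTIONS` list are hypothetical names chosen for illustration. It only looks at headings and checkboxes, so it says nothing about whether the content is any good.

```typescript
// Hypothetical lint for a design doc that follows the skeleton's headings.
// It verifies structure and sign-off state only; judging the content stays a human job.

const REQUIRED_SECTIONS = [
  "Scope",
  "Files & touch points",
  "Decisions",
  "Edge cases",
  "Risks",
  "Sign-off",
];

interface DocCheck {
  missingSections: string[];
  uncheckedSignOffs: string[];
  readyForCode: boolean;
}

function checkDesignDoc(markdown: string): DocCheck {
  const lines = markdown.split(/\r?\n/);

  // Collect the text of every level-2 heading ("## Scope" -> "Scope").
  const headings = new Set(
    lines
      .filter((line) => /^## /.test(line))
      .map((line) => line.slice(3).trim()),
  );
  const missingSections = REQUIRED_SECTIONS.filter((name) => !headings.has(name));

  // Task-list items: "- [ ] ..." is open, "- [x] ..." is done.
  const uncheckedSignOffs = lines
    .filter((line) => /^- \[ \] /.test(line))
    .map((line) => line.replace(/^- \[ \] /, "").trim());
  const hasSignOffItems = lines.some((line) => /^- \[[ xX]\] /.test(line));

  return {
    missingSections,
    uncheckedSignOffs,
    // Ready only when every section exists and every sign-off box is ticked.
    readyForCode:
      missingSections.length === 0 &&
      hasSignOffItems &&
      uncheckedSignOffs.length === 0,
  };
}
```

A check like this could run in CI against `docs/design/` so that an implementation branch fails fast while a sign-off box is still open.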

A worked example

Take a concrete one. The feature: add regex-based URL path matchers to a small API mocking library I maintain on the side. Today the library only matches exact path strings. A request like /users/42 needs its own mock; /users/43 needs another.

In the Workshop session, we don't write matchURL(pattern: RegExp) and call it done. We argue:

Should the regex test against the path only, or the full URL? Decided: path only — fewer security footguns when users paste hostile patterns.
What's the precedence between regex and exact matchers when both are registered? Decided: exact wins; document this loudly so users can't be surprised.
What happens if the regex is invalid? Decided: throw at registration time, not at match time, so failures surface during test setup rather than mid-request.
How does this interact with existing fixtures using wildcards? Decided: add migration notes; fixtures using wildcards must be ported manually. We're not doing automatic translation.

None of those four answers lives in code, because no code has been written yet. They live in the doc. The doc names the files (src/matchers/url.ts, src/matchers/registry.ts, tests/matchers/url.test.ts, docs/migration.md), the four decisions above with rejected alternatives, the error cases, and the scope cuts: header-level regex is explicitly out of this pass.

The Cold Read tears at it. The first fresh session asks: "What happens if two regex matchers both match the same URL?" The Workshop didn't cover that. Back to the Workshop. The doc grows a fifth decision (most-recently-registered wins, with a warning logged). The next Cold Read asks: "What's the perf cost of running regex tests against every incoming request?" Another gap. Another revision (lazy-compile on first use; cap compiled-regex count per registry).

After three Cold Read iterations the doc covers everything any fresh reader asks. At that point — and only at that point — does the code get written. Often the AI writes it in one sitting from the doc alone. Sometimes a human does. Either way, the system is defensible from the moment it ships.
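To make "code written from the doc" concrete, here is a minimal sketch of how the decisions above might translate. Every name in it (`MatcherRegistry`, `registerExact`, `registerPattern`, `pathOf`, `MockResponse`) is hypothetical and is not the library's real API. It covers path-only matching, exact-wins precedence, registration-time failure, and most-recently-registered-wins; the fixture migration notes and the lazy-compile and cap revision are left out to keep it short.

```typescript
// Sketch only: hypothetical names, not the library's actual API.

type MockResponse = { status: number; body: unknown };

// Path only: strip any scheme and host, then the query string and fragment.
function pathOf(urlOrPath: string): string {
  const withoutOrigin = urlOrPath.replace(/^[a-z][a-z0-9+.-]*:\/\/[^\/?#]*/i, "");
  const path = withoutOrigin.replace(/[?#].*$/, "");
  return path === "" ? "/" : path;
}

class MatcherRegistry {
  private readonly exact = new Map<string, MockResponse>();
  private readonly patterns: Array<{ regex: RegExp; response: MockResponse }> = [];
  // Warnings are collected rather than printed so the caller decides how to log them.
  readonly warnings: string[] = [];

  registerExact(path: string, response: MockResponse): void {
    this.exact.set(path, response);
  }

  // `new RegExp` throws a SyntaxError on an invalid pattern, so a bad pattern
  // fails here, at registration time, rather than in the middle of a request.
  registerPattern(source: string, response: MockResponse): void {
    this.patterns.push({ regex: new RegExp(source), response });
  }

  match(urlOrPath: string): MockResponse | undefined {
    const path = pathOf(urlOrPath);

    // An exact matcher wins over any pattern.
    const exactHit = this.exact.get(path);
    if (exactHit !== undefined) return exactHit;

    // Among patterns, the most recently registered wins; overlap is recorded.
    const hits = this.patterns.filter((p) => p.regex.test(path));
    if (hits.length > 1) {
      this.warnings.push(`${hits.length} patterns match ${path}; using the newest`);
    }
    return hits.pop()?.response;
  }
}

const registry = new MatcherRegistry();
registry.registerPattern("^/users/\\d+$", { status: 200, body: "any user" });
registry.registerExact("/users/42", { status: 200, body: "user 42" });
registry.match("/users/42"); // the exact matcher wins
registry.match("https://example.test/users/43?page=2"); // the pattern sees only /users/43
```

Each comment in the class maps to a decision in the doc, which is what lets a reviewer ask "why" and get an answer that does not depend on anyone's memory of the session.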

When this pattern doesn't fit

Honest concession: this assumes the design can be made solid before any code is written. That assumption holds for well-understood features, structural refactors, and spec-driven work. It breaks for genuinely exploratory work — research code, novel UI patterns, anything where you need to prototype to discover what you're building.

For exploratory work, invert the rule: build the prototype fast, then throw it away and write the doc. The prototype is the research; the doc is the production design. Treating exploration as a separate phase is fine — just be honest with yourself that you're in exploration, and that the prototype is not the thing you're shipping.

The pattern fits "we know what we want, we just need to design how": migrations, new endpoints, adding capabilities to mature systems, anything where the shape is clear and the question is the implementation. That's most production work.

Adversarial prompting beats supportive prompting

Most people ask AI to validate their work: "Does this look right?" The model says yes. You get nothing.

The actually-useful prompts are the mean ones:

"Find three things that are wrong with this design." Forces enumeration rather than agreement.
"What would a skeptical staff engineer push back on in code review?" Pushes the model toward hostile-review patterns from its training data.
"What scenario would break this system silently: no error, just wrong behaviour?" Surfaces the failure modes nobody planned for.
"Rate this design one to ten and list everything missing from a ten." The number is meaningless; the gap analysis is everything.

Adversarial prompts in the Cold Read session catch what charitable prompts in the Workshop missed. The Workshop is where the design gets built; the Cold Read is where it gets tested. Mixing the modes is how bad designs survive.

Working a legacy codebase

Greenfield is the easy case. Most real engineering happens against an existing codebase nobody on the current team designed. The pattern still works — you just need a different starter.

Before any Workshop session begins, generate a system map: a markdown summary of the relevant slice of the codebase. Not everything — just the modules the change will touch, their public interfaces, and the contracts they uphold. Three to five pages. This becomes the substrate the Workshop reasons against.

Without a system map, every Workshop session starts by re-discovering the architecture from scratch, which is exhausting and unreliable. With one, the AI can quote contracts at you when you propose something that violates them.

GENERATE THE MAP FROM THE CODE, NOT FROM MEMORY

Don't write the system map from what you think the codebase does. Have an AI read the source and produce the map; then a human edits it for accuracy. What you think the codebase does and what it actually does diverge constantly — the map is most valuable when it surfaces the divergence.
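One way to keep the map anchored to the code is to seed it mechanically before the AI and the human editor get involved. The sketch below is an assumption about how that first pass could look, not a description of any existing tool: `exportedNames` and `systemMap` are hypothetical, the input is a plain record of file path to source text (reading the files is left to the caller), and the regex is deliberately naive.

```typescript
// Hypothetical seed for a system map: list each module's named exports.
// Naive by design: it misses default exports, re-exports, and `export { a, b }` lists.
// The contracts each module upholds still have to be written in and reviewed by people.

function exportedNames(source: string): string[] {
  const pattern =
    /^export\s+(?:async\s+)?(?:function|class|const|let|interface|type|enum)\s+([A-Za-z_$][\w$]*)/gm;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = match[1];
    if (name !== undefined) names.push(name);
  }
  return names;
}

// `modules` maps a file path to that file's source text.
function systemMap(modules: Record<string, string>): string {
  return Object.keys(modules)
    .sort()
    .map((path) => {
      const names = exportedNames(modules[path] ?? "");
      const list =
        names.length > 0
          ? names.map((n) => `- \`${n}\``).join("\n")
          : "- (no named exports found)";
      return `## ${path}\n${list}`;
    })
    .join("\n\n");
}
```

The output is only the skeleton of a map: one heading per module and the names it exposes. Where that list disagrees with what you expected the module to export, you have found a divergence before the Workshop even starts.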

The version that fits on one line

Going fast with AI without this discipline does not make you faster. It lets you mass-produce mistakes you can't see. The Workshop is where you find the mistakes early, when they're cheap. The Cold Read is where you find the ones you missed. The code, when it finally gets written, is the part that's least at risk.

Treat the design as the artifact. The code follows from it. That's the whole rule.