noah jobse

Where should the research live: spec time or build time

#spec-driven development #AI agents #engineering

This is the thing I keep running into with spec driven dev and haven't really seen anyone name.

External context can enter your build at two points. Spec time or build time.

Spec time is when you do a research pass before you write the spec. You go read the docs, figure out the API, make decisions, and write them down. Build time is when you hand the agent a thinner spec and let it go find the docs itself, then act on what it finds.

Both work. The interesting part is that almost everyone frames the choice as a speed tradeoff, and I don't think speed is the real difference.

It's not about speed, it's about the review gate

The usual framing is that front loading is slower and fetching on the fly is faster. That's true as far as it goes. But it misses what actually changes when you move research from one side to the other.

Front loading research into a spec gives you a review gate. The research becomes an artifact. It sits in the spec where you can actually look at it, catch the wrong assumption, and correct it before a single line of code gets written. You are reviewing the inputs.

Build time is find it and do it. The agent reads the docs and acts in one motion. You never see the research that drove the implementation, you only see the result. There is no gate on the finding, only on the output.

So the real question isn't "which is faster." It's "do I need to verify the inputs before they harden into code." That reframes it from an efficiency decision into a delegation decision, which is a much more useful way to think about it. Front loading is "decide, then delegate the build." Build time is "delegate the deciding too."

What actually decides it: how much I trust the docs

Once you see it as a delegation call, the criterion gets simple. I delegate the research when I trust the agent to get it right, and I front load when I don't.

And what that really comes down to is how much I trust the docs, plus my own familiarity with them.

Take Stripe. I know the API, the docs are good, and the model has seen a huge amount of correct Stripe code. The priors are current. I'll happily let the agent fetch what it needs at build time, because the odds it gets it wrong unsupervised are low.

Now take something new, or something I know just had a breaking change. There I take the delay and read it myself first. The Supabase anon key getting killed off is a good example. An agent will still confidently write the old anon key well after the change, because it was trained on years of implementations that used it. The docs moved and the priors didn't.

That example points at something worth saying plainly: "trust the docs" really means "trust the agent's priors about the docs." Those are not the same thing, and the gap between them is exactly where you get burned.

The two ways build time fails

If you look closely, delegating research to build time has two distinct failure modes, and they're worth separating because they call for the same fix.

The first is that the agent doesn't fetch at all. It thinks it already knows, leans on stale training, and writes the deprecated pattern. The Supabase keys are this.

The second is that the agent does fetch, but misreads an unfamiliar or badly organized API and wires it up wrong.

Front loading with a review gate catches both, because in both cases you are putting a human check on the current truth before it becomes code. That's the whole value of the gate. It's insurance against the agent being confidently wrong, whether the confidence came from training or from a bad read.

How I actually handle it

The trap people fall into is doing the research at spec time to make a decision, then leaving the findings in their own head, so the build agent re-fetches everything anyway. That's the duplication that made me want to write this in the first place. You did the work twice and gated nothing.

The fix is to make the spec absorb the research. Concretely, two places:

Per-task findings go directly into the spec file. The endpoints, the payload shapes, the gotcha I just verified, the current key format. They sit inline so the build agent consumes them instead of re-deriving them. The research I did becomes context the agent reads, not work it repeats.

Anything worth persisting graduates up to CLAUDE.md. If a finding isn't specific to this one task but is a durable fact about how we build, it belongs in the project rules, not buried in one spec. "We use the new Supabase publishable keys, never the anon key" is a CLAUDE.md line, because it's true for every task forever.

Once you do this, the duplication mostly goes away. Per-task truth lives in the spec, durable truth lives in the rules, and the agent reads both instead of re-fetching. You only repeat yourself when a finding falls through both nets.

A concrete case: Gmail inbound and outbound

I'm building a Gmail inbound/outbound integration right now, and it splits cleanly along this line.

The things I front load into the spec, because trust is low:

  • The exact API shapes. Request and response structures I want pinned, not guessed.
  • The newer features. The parts of the API the model hasn't seen much correct code for, where its priors are thin.
  • The Cloud setup nuances. The Pub/Sub topic, the watch registration and its expiry, the IAM bindings. This one is interesting because it's a case where even build-time fetching struggles. The knowledge isn't sitting in one doc page the agent can pull. It's spread across consoles and permissions and setup steps that live outside the code entirely. If I don't capture it at spec time, there's often nothing clean for the agent to fetch.

The things I happily leave to build time, because they're routine:

  • Parsing a payload once the shape is known.
  • Wiring a handler.
  • Standard CRUD and the ordinary glue around it.

The pattern underneath is the same one from the Stripe vs Supabase split. Front load where the priors are stale or thin or scattered, delegate where the work is well-trodden.

Where this leaves me

I don't think this is fully solved, but the shape of the answer feels right. Treat the spec-time vs build-time choice as a delegation decision, not a speed one. Front load the research wherever you don't trust the agent's priors, and when you do front load, write the findings into the spec so the gate you just created doesn't get thrown away at build time.

Mostly that works for me now. Curious how other people are drawing the line.