I’ve been thinking about a problem that keeps surfacing on engineering teams. Two developers, same codebase, same AI tool, completely different output quality. The junior asks the AI to “create a notification service” and gets generic boilerplate that ignores half the team’s conventions. The senior asks for the same thing and specifies functional style, existing middleware, proper directory structure, explicit types, the logging utility everyone actually uses. The gap isn’t in what the AI knows about the project. It’s in what each person knows to ask for.
This plays out across every interaction with AI, not just code generation. Refactoring requests, security checks, code reviews. The senior engineer has internalized years of pull request feedback, production incidents, architectural debates. She doesn’t consult a checklist when reviewing code or prompting an AI. She just knows. That knowledge is the team’s most valuable asset and it lives almost entirely in people’s heads.
The usual response is to write better documentation, do more pairing, train the juniors. These help but they’re slow and they don’t scale. What’s needed is a way to extract what the senior applies instinctively and make it execute consistently for everyone, not as advice to remember but as infrastructure that runs.
From Tacit Knowledge to Executable Instructions
I’ve described techniques before for improving individual collaboration with AI: curated context documents, structured design conversations, living decision records. Each helps one person get better results. None of them address the consistency problem. When team standards exist only as tribal knowledge or Slack tips, AI-assisted development quality depends entirely on who’s at the keyboard. Seniors become bottlenecks not because they write the code but because they’re the only ones who know what to ask for.
The shift I’m proposing is treating the instructions that govern AI interactions as versioned artifacts. Not documentation that informs, but configuration that executes. Linting rules aren’t wiki pages describing code style, they’re config files that enforce it. CI/CD pipelines aren’t documents explaining deployment, they’re executable definitions. AI instructions belong in the same category.
When a generation instruction lives in the repository, reviewed through pull requests, it encodes “how we do things here” in a form that runs the same way for everyone. The developer doesn’t need to carry the full set of team standards in their head. They invoke the instruction. The team’s judgment applies consistently because the infrastructure encodes it.
The Anatomy of an Executable Instruction
A well-structured instruction has four distinct elements, regardless of its purpose. First, the objective: what the AI is being asked to do, stated precisely. Not “check security” but “identify security vulnerabilities in this code against our threat model.” Second, the constraints: the architectural boundaries, conventions, and patterns that must be preserved. These are the non-negotiables, the things that separate acceptable output from work that gets rejected in review.
Third, the checks: prioritized items the AI should verify, ordered by criticality. In a security instruction this might be SQL injection, authorization on every endpoint, hardcoded secrets. In a generation instruction it’s adherence to the team’s functional style, proper error handling, correct module structure. Fourth, the outputs: what the instruction should produce. A list of issues with severity and location, suggested fixes with explanations, areas flagged for human review.
The most interesting part isn’t the structure itself but the extraction process. It amounts to interviewing senior engineers with pointed questions: What architectural decisions should never be left to individual judgment? Which conventions get corrected most often in generated code? What triggers an immediate rejection in review? The answers map directly to instruction elements. Non-negotiable patterns become constraints. Frequent corrections become convention checks. Review rejections become critical items.
On one project, this extraction revealed that two senior engineers had quietly different thresholds for what counted as a critical security concern versus merely important. The disagreement had never surfaced because they reviewed different pull requests. Writing a shared instruction forced the distinction to become explicit. Once a junior developer started using that instruction, they immediately caught a missing authorization check on a new endpoint, exactly the kind of issue that had previously only been caught when a senior happened to be the reviewer.
Where Instructions Apply in the Workflow
These instructions aren’t a single-purpose tool. They apply at different points and the timing shapes their value. At generation time, when creating a new service or implementing a feature, a generation instruction ensures output follows team conventions from the start. This is where encoded standards have the most leverage: they prevent misalignment rather than catching it later.
During development, refactoring instructions keep improvements aligned with team norms. Security instructions apply the team’s actual threat model, not a generic checklist. The standards are present throughout development, not bolted on at the end.
At review time, when a developer finishes work (AI-generated or manual), a review instruction applies the team’s quality gate. But review is the last opportunity to catch misalignment. The earlier standards are applied, the fewer issues reach that stage.
Some teams extend instructions into CI as automated consistency checks. CI-level instructions must be fast enough not to slow the pipeline and predictable enough to avoid noisy false positives. They require the same maintenance discipline as any other CI gate.
Instructions as Shared Infrastructure
A prompt on an individual machine is a personal productivity hack. The same prompt in the team’s repository is infrastructure. The distinction matters. When it lives in the repo, it inherits all the properties of versioned artifacts: changes tracked, standards collectively owned, everyone working from the same version.
Different tools implement this differently. Custom commands, skills, rules files, project instructions. The underlying property is the same: a versioned, shared artifact that the AI executes consistently.
This enables a different kind of collaboration. A developer notices the generation instruction doesn’t specify the team’s new error-handling pattern and submits a PR to update it. A security incident reveals a gap in the security instruction; the fix is a commit, reviewed and merged like any other infrastructure change. The standards aren’t static rules frozen in place. They’re living artifacts the whole team maintains, sharpened by practice, improved through the same pull request workflow used for code.
I’ve written before about treating context as infrastructure rather than habit. The same principle extends here. The priming document tells the AI how the project works. The executable instruction tells it how the team works.
When This Actually Matters
There’s a real risk these instructions become another documentation graveyard, created with enthusiasm and abandoned within months. Repository placement and pull request review mitigate this. An instruction in the repo is visible. It appears in diffs. It can be referenced in PR templates. When it drifts from actual practice, the drift is visible the same way a stale test is visible, not because someone audits it but because it’s encountered in normal work.
This approach is most valuable when a team is large enough that consistency can’t be maintained through conversation alone. If AI-assisted output quality visibly varies depending on who’s prompting, or if generation and review work routes through a handful of people because they’re the only ones who know how to prompt effectively, that’s the signal. Teams of five may not need this. Teams of fifteen almost certainly do.
The costs are real. Creating good instructions requires effort: extraction interviews, drafting, iteration. Overly prescriptive instructions become brittle, producing false positives on edge cases or fighting against legitimate variations. There’s a maintenance burden as standards evolve. And there’s a risk of over-engineering: not every AI interaction needs a dedicated instruction.
The right starting point is one instruction. A generation or review instruction usually offers the highest value. It addresses the most common workflow, the widest quality gap, the most visible inconsistency. Additional instructions should follow adoption, not precede it.
The Mechanism Isn’t New, The Scope Is
Teams already encode judgment into infrastructure. Linting rules capture style preferences. CI pipelines embody deployment knowledge. Infrastructure as code makes operations reproducible. What changes with AI is the scope of what can be encoded. Linting catches syntax and style. Executable team standards can encode architectural judgment, security awareness, refactoring philosophy, review rigor, the kind of knowledge that previously transferred only through pairing and years of shared experience.
The most interesting property is that these standards are team-owned. They live in the repository. They evolve through pull requests. They improve when practice reveals gaps. Every instruction that misses something is an instruction waiting to be updated, and the update is a commit the whole team reviews. The standards aren’t just output of team knowledge; they’re the mechanism through which that knowledge gets codified, shared, and refined.
What we’re really doing is taking judgment that lives in people’s heads and making it execute as shared infrastructure, turning the instincts that separate senior engineers from juniors into artifacts that anyone on the team can invoke.