The Five-Room House

A common instinct, when an AI gives a mediocre answer, is to feed it more context. The intuition makes sense: if context is what the model lacks, more context should help. It usually doesn't. Research from MorphLLM and Augment Code over the past year keeps finding the same result: as you load more context, performance gets worse, not better.

Augment's framing is the one that sticks. In their 2025 piece "Your agent's context is a junk drawer," they describe the failure mode bluntly: an unstructured AI context window is exactly what the title says it is. Files thrown in over time, conversation history piling up, reference docs from a previous task that nobody cleaned up. The model spends its attention sifting through irrelevant material, looking for the things that apply to the current step. Every irrelevant file in context is a distraction tax. The model that could have nailed the current step on a clean prompt produces a confused mid-quality answer because most of its attention went to material that didn't matter.

MorphLLM's research put numbers on it. Performance plateaus well before the model's nominal context window fills. By the time you're at 30,000 to 50,000 tokens of unstructured context, output quality is measurably lower than at 5,000 tokens of targeted context. The performance ceiling is set by how cleanly you can route the model to just what it needs for the current step, not by how much you can stuff in. Less context produces better output, provided the context that's there is the right context.

The implication is uncomfortable for anyone whose default move is to upload more files, and for anyone building a product that gives the user a "drop a folder of everything you've got" button. The AI handling that input is, by default, going to do worse than the version that received a curated subset. The fix is structured context, which is a design problem, not a hardware problem.

Progressive disclosure, with structure

The pattern that solves the junk-drawer problem is progressive disclosure. Anthropic uses the term in both their Skill authoring best practices doc and their Effective Context Engineering for AI Agents essay. The idea: an agent maintains lightweight references (file paths, table-of-contents pointers, link snippets) and only loads the actual content into context when it's needed for the current step. The agent works through a filesystem the way a human uses an inbox or a bookmark folder. You don't read every email or open every bookmark. You fetch what's relevant, then put it back when you're done.

Anthropic frames it as how human cognition handles large information stores: "we generally don't memorize entire corpuses of information, but rather introduce external organization and indexing systems like file systems, inboxes, and bookmarks to retrieve relevant information on demand." The agent works the same way. The folder structure is its inbox. The contents of any one file aren't in the agent's head until it reads that file, and the agent only reads what the current task asks for.
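Here is a minimal Python sketch of the pattern, assuming references are plain file paths. `ContextIndex`, `fetch`, `release`, and the in-memory stand-in for the disk are hypothetical illustrations, not an API from Anthropic or ICM. The index of names is always visible. A file's contents are read only when `fetch` asks for them, and `release` puts them back.

```python
from pathlib import Path
from typing import Callable, Dict


def read_file(path: str) -> str:
    """Default reader: pull a file's full text from disk."""
    return Path(path).read_text(encoding="utf-8")


class ContextIndex:
    """Lightweight references that are always visible; content loaded on demand.

    Hypothetical illustration of progressive disclosure, not a real library API.
    """

    def __init__(self, refs: Dict[str, str], reader: Callable[[str], str] = read_file):
        self.refs = refs                  # name -> path: cheap pointers, always "in view"
        self._reader = reader             # how full content is fetched when needed
        self.loaded: Dict[str, str] = {}  # full content currently held in context

    def fetch(self, name: str) -> str:
        # Read the full content only when the current step asks for it.
        if name not in self.loaded:
            self.loaded[name] = self._reader(self.refs[name])
        return self.loaded[name]

    def release(self, name: str) -> None:
        # "Put it back": drop the content; the reference itself stays in the index.
        self.loaded.pop(name, None)


# Usage with an in-memory stand-in for the file system (assumption, for illustration).
fake_disk = {"refs/voice.md": "Write plainly.", "refs/design.md": "Use an 8px grid."}
index = ContextIndex({"voice": "refs/voice.md", "design": "refs/design.md"}, fake_disk.__getitem__)
print(index.fetch("voice"))   # Write plainly.
print(sorted(index.loaded))   # ['voice']  (design.md was never read)
index.release("voice")
print(index.loaded)           # {}
```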

ICM is one specific implementation of progressive disclosure, applied to multi-stage workflows. Anthropic's framing is general: keep references lightweight, load on demand. ICM adds opinions about how: how many layers there should be, what each layer is for, and the hard rule that routing files contain only routing (no instructions, no reference material, just navigation). The result is a structure tight enough that two different agents reading the same workspace produce structurally similar work, even if the prose underneath is different.

The ICM methodology teaches the structure with a simpler analogy: the map, the rooms, and the workspace. The map (root CLAUDE.md) tells the agent how the building is laid out: what folders exist, what each one is for, which keywords trigger which workflow. The rooms (per-stage CONTEXT.md files) hold task-specific instructions; when the agent enters a stage, it reads that stage's contract and works from it. The workspace is the file system itself: the source materials, the references, the outputs of previous stages. The agent is never asked to hold the whole building in its head. It walks to the right room and reads the room's contract. Everything else stays on disk until it's needed.
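The map-and-room lookup can be sketched in a few lines of Python. This sketch assumes the root CLAUDE.md's routing table has been reduced to a dict that maps each trigger keyword to a workspace folder. It also assumes that stages live at `<workspace>/stages/<stage>/CONTEXT.md`, which is the layout of the meeting-summarizer workspace. `ROUTING_TABLE`, `route`, and `room_for` are hypothetical names. For brevity, the sketch skips the intermediate file that lists a workspace's stages.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Optional

# Hypothetical routing table, standing in for what a root CLAUDE.md declares.
ROUTING_TABLE: Dict[str, str] = {
    "summarize": "meeting-summarizer",
    "guide": "content-to-guide",
}


def route(request: str, table: Dict[str, str] = ROUTING_TABLE) -> Optional[str]:
    """The map: return the workspace whose trigger keyword appears in the request."""
    words = set(re.findall(r"[a-z0-9-]+", request.lower()))
    for keyword, workspace in table.items():
        if keyword in words:
            return workspace
    return None  # no trigger matched; the agent should ask rather than guess


def room_for(workspace: str, stage: str) -> PurePosixPath:
    """A room: the stage contract the agent reads on entering that stage."""
    return PurePosixPath(workspace) / "stages" / stage / "CONTEXT.md"


workspace = route("Please summarize this transcript.")
print(workspace)                                # meeting-summarizer
print(room_for(workspace, "01-meeting-notes"))  # meeting-summarizer/stages/01-meeting-notes/CONTEXT.md
```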

The five layers, spelled out

Formally, ICM organizes context into five layers with explicit token budgets. Each layer answers a different question for the agent.

Layer 0: Root CLAUDE.md ("Where am I?"). Always loaded automatically when the workspace opens. Contains the workspace map, the routing table, and trigger keywords. Budget: about 800 tokens. This is the only file the agent always sees, so it has to be lean. It tells the agent what kind of workspace this is, what folders matter, and what keywords map to which workflow. It does not contain the actual instructions for any of those workflows; those live in the stage files, a navigation hop or two away.

Layer 1: Workspace CONTEXT.md ("Where do I go?"). Read on entry to a workspace. Lists the stages and what each is for. Budget: about 300 tokens. Pure navigation. No process steps, no audit checks, no reference material. Just "here are the stages, in order, with one sentence each."

Layer 2: Stage CONTEXT.md ("What do I do?"). Read when the agent enters a specific stage. Budget: 200 to 500 tokens. This is where the actual work instructions live: the inputs the stage reads, the process it follows, the audit checks it runs before output, and the format and location of the output it produces. It is the contract for the stage: specific enough that two different agents would produce structurally similar work, loose enough that the agent has creative latitude inside the contract.

Layer 3: Reference material ("What rules apply?"). Voice rules, design systems, templates, style guides, brand vault. Loaded selectively per stage; the inputs table at the top of the stage's CONTEXT.md says exactly which reference files this stage needs. Files outside that list stay on disk. Budget varies per stage, but stays lean through selective routing.

Layer 4: Working artifacts ("What am I working with?"). The actual material the work flows through: outputs from previous stages, source material the user provided, the in-progress draft. Also loaded selectively; the agent reads only the artifacts the current stage's inputs table calls for.
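Writing the layers down as data makes their budgets checkable. The Python sketch below does that. The `Layer` record and the `over_budget` helper are hypothetical. Layer 2 uses the top of its 200 to 500 range. Layers 3 and 4 have no fixed number in the source, so they carry `None`. The four-characters-per-token estimate is a rough assumption, not a real tokenizer.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    number: int
    source: str
    question: str
    always_loaded: bool
    max_tokens: Optional[int]  # None: no fixed budget, kept lean by selective routing


LAYERS = [
    Layer(0, "root CLAUDE.md", "Where am I?", True, 800),
    Layer(1, "workspace CONTEXT.md", "Where do I go?", False, 300),
    Layer(2, "stage CONTEXT.md", "What do I do?", False, 500),
    Layer(3, "reference material", "What rules apply?", False, None),
    Layer(4, "working artifacts", "What am I working with?", False, None),
]


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token of English prose.
    return math.ceil(len(text) / 4)


def over_budget(layer: Layer, text: str) -> bool:
    """True if a file meant for this layer is larger than the layer's budget."""
    return layer.max_tokens is not None and estimate_tokens(text) > layer.max_tokens


draft = "x" * 2000                    # about 500 tokens by the heuristic
print(over_budget(LAYERS[0], draft))  # False: fits the ~800-token root budget
print(over_budget(LAYERS[1], draft))  # True: too big for a ~300-token workspace map
print(over_budget(LAYERS[3], draft))  # False: no fixed budget for reference material
```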

The total context per stage typically lands at 2,000 to 8,000 tokens. The monolithic everything-at-once approach lands at 30,000 to 50,000. The smaller number consistently produces better output, both because the model isn't distracted and because the structure forces the workspace author to declare exactly what each stage needs. The declaration is the value; the limited context is the consequence.
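To make the per-stage versus monolithic contrast concrete, the Python sketch below assembles one stage's context from its declared inputs and compares it with loading everything. Several pieces are assumptions: the dict standing in for the workspace, the `--- path ---` separators, the `assemble` helper, and the four-characters-per-token estimate. The toy file sizes are chosen for illustration, not measured.

```python
import math
from typing import Dict, List, Tuple


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return math.ceil(len(text) / 4)


def assemble(workspace: Dict[str, str], paths: List[str]) -> Tuple[str, int]:
    """Concatenate the given files into one context string and estimate its size."""
    missing = [p for p in paths if p not in workspace]
    if missing:
        # A declared input that doesn't exist is a contract error; fail loudly.
        raise FileNotFoundError(f"declared inputs not found: {missing}")
    text = "\n\n".join(f"--- {p} ---\n{workspace[p]}" for p in paths)
    return text, estimate_tokens(text)


# Toy workspace (assumption): two files the stage needs, two leftovers it doesn't.
workspace = {
    "transcript.txt": "t" * 16000,
    "shared/summarization-tips.md": "s" * 2000,
    "old-task/brand-vault.md": "b" * 60000,
    "old-task/draft-v3.md": "d" * 40000,
}
declared = ["transcript.txt", "shared/summarization-tips.md"]

_, stage_tokens = assemble(workspace, declared)
_, everything_tokens = assemble(workspace, list(workspace))
print(stage_tokens, everything_tokens)  # roughly 4.5k vs 29.5k estimated tokens
```

Raising on a missing declared input is a deliberate choice. The declaration is the value, so a stage that can't find what it declared should stop rather than improvise.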

One rule keeps this honest: routing files (the root CLAUDE.md and the workspace CONTEXT.md) contain routing only, never instructions or reference material. Routing tells the agent where to find what it needs. Instructions and reference material live in dedicated files the routing points to. Mix the two and your context budget bloats: the routing file gets read on entry, so anything in it is paid for whether or not it applies. Keep them separate and the system stays lean: the agent reads the routing, then reads only the specific files the routing pointed to.
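One way to keep the rule honest mechanically is to run a lint pass over the routing files. The Python sketch below rests on two assumptions. First, headings such as Process or Audit signal instructions that have leaked into a routing file; the `NON_ROUTING_HEADINGS` set is a guess, not an ICM list. Second, a rough four-characters-per-token estimate is close enough to check the layer budget. `lint_routing_file` is a hypothetical helper.

```python
import math
import re
from typing import List

# Assumption: headings that signal instructions or reference material, not navigation.
NON_ROUTING_HEADINGS = {"process", "audit", "checkpoints", "voice rules", "style guide"}


def lint_routing_file(text: str, budget_tokens: int) -> List[str]:
    """Flag content that doesn't belong in a routing file."""
    problems = []
    for match in re.finditer(r"^#+[ \t]*(.+?)[ \t]*$", text, flags=re.MULTILINE):
        if match.group(1).lower() in NON_ROUTING_HEADINGS:
            problems.append(f"non-routing section: {match.group(1)!r}")
    tokens = math.ceil(len(text) / 4)  # rough ~4 characters per token (assumption)
    if tokens > budget_tokens:
        problems.append(f"~{tokens} tokens, over the {budget_tokens}-token budget")
    return problems


workspace_map = (
    "# meeting-summarizer\n"
    "## Stages\n"
    "1. 01-meeting-notes: turn a transcript into detailed notes\n"
)
print(lint_routing_file(workspace_map, budget_tokens=300))  # []
bloated = workspace_map + "## Process\nRead the whole transcript before writing.\n"
print(lint_routing_file(bloated, budget_tokens=300))  # ["non-routing section: 'Process'"]
```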

What a stage contract looks like

Here is a real stage contract from a workspace I use: meeting-summarizer/stages/01-meeting-notes/CONTEXT.md. The structure is the contract; the prose is filled in for one specific job (capture detailed meeting notes from a transcript).

```markdown
# Stage 01: Meeting Notes

## Inputs
| Source | File/Location | Why |
|--------|--------------|-----|
| Transcript file | User-provided path | Source material |
| Large transcript handling | ../../shared/large-transcript-handling.md | Required if transcript exceeds 100KB |
| Summarization tips | ../../shared/summarization-tips.md | Voice/quality guidance throughout |

## Process
The host is David. The notes should comprehensively document:
- Main topics discussed
- Key decisions made
- Important context shared
- Technical details or specifics mentioned
- Any agreements or conclusions reached

Read the entire transcript before writing — context from later in the
meeting often clarifies earlier points.

## Audit
| Check | Pass Condition |
|-------|----------------|
| Full transcript read | Entire transcript processed, not just partial reads |
| Coverage | All major topics from the transcript represented in the notes |
| Output saved | output/notes.md exists |

## Outputs
| Artifact | Location | Format |
|----------|----------|--------|
| Meeting notes | output/notes.md | Markdown — narrative prose grouped by topic |
```
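Because every contract uses the same table shapes, the Inputs table is easy to read mechanically. The Python sketch below assumes contracts are plain markdown with `## Section` headings and pipe tables, as in the example. `parse_section_table` is a hypothetical helper. The parser keeps the conditional row (large-transcript handling, needed only above 100KB) in its results and leaves the decision to load it to the agent.

```python
from typing import Dict, List

CONTRACT = """# Stage 01: Meeting Notes

## Inputs
| Source | File/Location | Why |
|--------|--------------|-----|
| Transcript file | User-provided path | Source material |
| Large transcript handling | ../../shared/large-transcript-handling.md | Required if transcript exceeds 100KB |
| Summarization tips | ../../shared/summarization-tips.md | Voice/quality guidance throughout |

## Outputs
| Artifact | Location | Format |
|----------|----------|--------|
| Meeting notes | output/notes.md | Markdown |
"""


def parse_section_table(markdown: str, section: str) -> List[Dict[str, str]]:
    """Return the rows of the pipe table under '## <section>' as dicts keyed by header."""
    rows: List[Dict[str, str]] = []
    header: List[str] = []
    in_section = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # A new level-2 heading starts (or ends) the section we care about.
            in_section = stripped[3:].strip().lower() == section.lower()
            header = []
            continue
        if not in_section or not stripped.startswith("|"):
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        if not header:
            header = cells
        elif all(set(cell) <= set("-: ") for cell in cells):
            continue  # the |---|---| separator row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


inputs = parse_section_table(CONTRACT, "Inputs")
# Only relative paths are workspace files; "User-provided path" comes from the user.
to_load = [row["File/Location"] for row in inputs if row["File/Location"].startswith(("./", "../"))]
outputs = parse_section_table(CONTRACT, "Outputs")
print(to_load)
print(outputs[0]["Location"])  # output/notes.md
```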

Five possible sections, four of them shown here. The vocabulary is the same in every stage of every workspace, and that's the point. Inputs declares the exact files the stage will read, with a one-line why per file. The agent doesn't have to guess what's relevant; it loads what the table says and ignores the rest. Process gives the steps a different agent could re-execute and arrive at a structurally similar output: loose enough to leave room for judgment, specific enough that two independent runs produce comparable work. Audit defines unambiguous pass/fail checks the agent runs before handing the output to the next stage, so a stage doesn't ship a half-baked artifact downstream. Outputs declares where the result lands and in what format, so the next stage's Inputs table has a stable thing to reference. Checkpoints (not shown here because this stage doesn't have one) would mark moments where the agent pauses for human review before proceeding.
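The Audit section reads naturally as a gate. The Python sketch below expresses Stage 01's three checks as predicates over a hypothetical `StageRun` record. The record's fields, the `failed_checks` helper, and the substring test for coverage are all assumptions made for illustration. What matters is the shape: every check is pass/fail, and the artifact moves on only when none fail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StageRun:
    """Hypothetical record of what one run of Stage 01 did, for the audit to inspect."""
    transcript_chars: int
    chars_read: int
    topics: List[str]          # major topics in the transcript (assumed known here)
    notes: str
    saved_paths: List[str] = field(default_factory=list)


# One predicate per row of Stage 01's Audit table. The substring match for
# "Coverage" is a crude stand-in for whatever judgment the agent actually applies.
AUDIT: Dict[str, Callable[[StageRun], bool]] = {
    "Full transcript read": lambda r: r.chars_read >= r.transcript_chars,
    "Coverage": lambda r: all(t.lower() in r.notes.lower() for t in r.topics),
    "Output saved": lambda r: "output/notes.md" in r.saved_paths,
}


def failed_checks(run: StageRun) -> List[str]:
    """Names of failing checks; an empty list means the stage may hand off."""
    return [name for name, check in AUDIT.items() if not check(run)]


run = StageRun(
    transcript_chars=120_000,
    chars_read=60_000,  # stopped halfway: a partial read
    topics=["pricing", "hiring"],
    notes="Pricing: the team agreed to hold prices. Hiring: two roles approved.",
    saved_paths=["output/notes.md"],
)
print(failed_checks(run))  # ['Full transcript read']
run.chars_read = 120_000
print(failed_checks(run))  # []
```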

That's the entire vocabulary you need to write any stage in any workspace. Every stage in meeting-summarizer follows it. Every stage in content-to-guide follows it. Every stage in any workspace built from the ICM source material follows it. That uniformity is what lets the agent move through unfamiliar workspaces without confusion: the shape of every stage is the same, only the prose changes.
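The uniformity is checkable too. The Python sketch below checks the shape of a stage contract by looking at its level-2 headings. `check_stage_shape` is hypothetical. Two of its rules are assumptions rather than anything the source states: Checkpoints is the only optional section, and the other four must appear in the order Stage 01 uses.

```python
import re
from typing import List

# The stage vocabulary from the contract above. Treating Checkpoints as the only
# optional section, and requiring this order, are assumptions based on Stage 01.
REQUIRED = ["Inputs", "Process", "Audit", "Outputs"]
OPTIONAL = ["Checkpoints"]


def check_stage_shape(markdown: str) -> List[str]:
    """Report unknown, missing, or out-of-order level-2 sections in a stage contract."""
    found = re.findall(r"^##[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    problems = [f"unknown section: {name}" for name in found if name not in REQUIRED + OPTIONAL]
    problems += [f"missing section: {name}" for name in REQUIRED if name not in found]
    present = [name for name in found if name in REQUIRED]
    if present != [name for name in REQUIRED if name in present]:
        problems.append("sections out of order: " + ", ".join(present))
    return problems


stage_01 = "# Stage 01: Meeting Notes\n## Inputs\n## Process\n## Audit\n## Outputs\n"
print(check_stage_shape(stage_01))  # []
print(check_stage_shape("## Process\n## Inputs\n## Notes\n"))
```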

Once you have the mechanics (five layers, lean routing, stage contracts), the next question is when to actually reach for ICM. Not every task needs this much structure: sometimes a skill is enough, and sometimes a framework is the right answer. Part 3 walks through the full stack and shows where ICM fits.

Three takeaways

Less context, better output. The performance ceiling is set by what's missing from the prompt and what's misplaced in it, not by how much you can stuff in. Lean wins; selective lean wins more.
Routing files are routing only. Never put instructions or reference material in the root CLAUDE.md or a workspace CONTEXT.md. Mix and you bloat your context budget on every entry, before the work has even started.
Stage contracts make agents interchangeable. Specific-enough Inputs / Process / Audit / Outputs sections mean two different agents, or two different model versions six months apart, produce structurally similar work. That's the durability the file structure buys you.

Sources

Anthropic, Effective Context Engineering for AI Agents: the canonical "progressive disclosure" framing, including the human-cognition analogy
Anthropic, Skill authoring best practices: progressive disclosure patterns applied to Skills
MorphLLM, Context Engineering — Why More Tokens Makes Agents Worse: research on the performance ceiling
Augment Code, Your agent's context is a junk drawer: the framing that captures the failure mode
The five-layer model and the map/rooms/workspace analogy come from ICM's source material. See the About page for full attribution.