Beyond ICM: Where We're Going
What ICM doesn't yet handle, and the direction worth moving toward: durability, observability, restartability.
A note before we start
This part gets a notch more technical because we're talking about how to actually run ICM at production reliability, and the vocabulary for that lives in the formal workflow-engineering literature. The terms have specific meanings, and I'll explain the jargon as we go.
It's also the most honest part of the series. I'm not running any of this in production yet. The four prior parts describe ICM as it works today; this part describes the direction I'm planning to move toward, and why I think it's the right next layer. If the gap between "what we have" and "what we want" makes you skeptical of the destination, that's a reasonable read. The gap is real. The point of writing this part is to name the destination clearly enough that the next person working with ICM knows what we're building toward, and so we can talk about whether the proposal holds up under scrutiny before it gets baked in.
The gap, in one scenario
ICM as it stands today is a workflow specification with no formal runtime semantics. That phrasing sounds technical, but the consequence is concrete.
Imagine an ICM stage that sends an approval email to a client. Say, the deliver stage of a content pipeline that emails a draft to a stakeholder for sign-off. The agent runs the stage. The email goes out. Then the next stage fails: a network blip, a malformed input file, a model hiccup, anything that throws an exception mid-execution. The workflow halts. You restart the run. The agent re-runs the failed stage. But it also re-runs the previous stage, because there's no record of which stages already completed. The email goes out a second time. The client gets two approval emails an hour apart.
Now imagine the failure happens at stage seven of a ten-stage workflow that has external side effects at stages three, five, and seven. Restart, and stages three and five run again. Whatever they did to the outside world (database writes, Slack posts, file uploads) happens again. There is no rollback story. There is no resume-from-where-you-stopped. The folder structure works as documentation, but it doesn't execute reliably under failure. That gap is what Part 5 names. The further you push ICM into work that touches external systems, the more this gap matters, because every external touch is a place where a retry can do harm.
This gap has been studied for decades in workflow engineering. ICM doesn't currently borrow from that body of work. Closing the gap is mostly a matter of adopting concepts the field has had since the 1980s, packaged in a way that survives ICM's folders-as-source-of-truth conviction.
What's missing, and how to add it
The discipline that solves this gap is called durable execution, and it has a real lineage. The 1987 saga paper from Garcia-Molina and Salem is the foundational work on multi-step transactions you can't wrap in one ACID block: when a workflow has steps that span multiple databases or external systems, you decompose the work into local transactions, each with a compensating action that undoes its effect if a later step fails. The Workflow Patterns Initiative cataloged 40-ish control-flow patterns that show up across workflow engines: sequence, parallel split, synchronizing merge, deferred choice, milestone, cancel region. Modern engines like Temporal, AWS Step Functions, DBOS, LangGraph, and Vercel WDK all run a variant of the same trick: serialize execution state at step boundaries, replay deterministically on restart. Different syntax, different storage backends, same core idea.
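To make the saga idea concrete, here is a minimal sketch in plain Python. It is not any engine's API: `run_saga`, the step names, and the `log` list are all hypothetical, and a real runtime would persist the list of completed steps instead of holding it in memory.

```python
from typing import Callable, List, Tuple

# Each saga step pairs a forward action with the compensation that undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        _name, action, _compensate = step
        try:
            action()
        except Exception:
            # Undo only what actually finished, newest first.
            for _, _, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []  # stands in for the outside world


def fail() -> None:
    raise RuntimeError("stage crashed")


steps: List[Step] = [
    ("send_email", lambda: log.append("email sent"),
     lambda: log.append("correction email sent")),
    ("post_slack", lambda: log.append("slack posted"),
     lambda: log.append("slack post deleted")),
    ("upload", fail, lambda: log.append("upload removed")),
]

ok = run_saga(steps)
print(ok, log)
```

The third step fails, so the two completed steps are compensated in reverse order and the failed step's own compensation never runs, because it had nothing to undo.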
There are five concepts ICM is missing, mapped to the runtime gaps. Each one has a concrete failure mode that ICM can't currently handle.
1. Deterministic replay. The killer idea behind modern durable execution. When a workflow process dies and restarts, the engine replays the work from line one, but each side-effecting call returns its previously recorded result instead of running for real. The replay catches up to where the failure happened, then executes for real from that point forward. The implication: workflow code must be pure orchestration, with all non-determinism (clocks, random numbers, network calls, database writes) wrapped in activities that get checkpointed. ICM today has no event log and no replay mechanism. A re-run is a from-scratch run.
2. Step-boundary checkpoints. State is durable between steps, not inside them. The design unit becomes "what do I want to atomically commit." A 10-line step you're confident about is better than a 100-line step that might crash in the middle. ICM's only checkpoint today is the stage's output/ folder; once a stage writes its output, the next stage can read it. But there's no checkpoint within a stage. A stage that crashes halfway re-runs from the start.
3. Idempotency. Every step must be safe to retry, because at-least-once delivery is the only honest guarantee in distributed systems. Either the step is naturally idempotent (a PUT request, an upsert on a primary key, a "create if not exists" check) or you wrap it in a deduplication key. "Send Slack message" is the canonical footgun: without an idempotency key, every retry double-posts. ICM's file writes are naturally idempotent when a stage overwrites its output rather than appending to it (writing the same file twice produces the same file), but stages with side effects (Slack, email, Supabase, git push) have no formal contract for idempotency.
4. Compensations. Sagas, in workflow-engineering vocabulary. If step five fails after step three sent the email, step three's compensation might be "send a correction email," or "log a warning, alert a human," or "no-op, but flag the stage as needing manual review." Workflows that touch external systems need an explicit compensation per side-effecting step. ICM has no compensation concept today; if a workflow aborts after a side effect, you're hand-cleaning.
5. Signals, timers, and queries. Three primitives that don't exist in normal code and that durable runtimes provide as first-class features. Wait for an external event for seven days: a webhook, an approval click, a payment confirmation. Wake up tomorrow morning: a durable timer that survives process restarts. Let an outsider check workflow state without disrupting it: a query that returns the current state without affecting the run. ICM's substitute for the first two is "stop and ask the human in chat," which works for synchronous workflows but not for the kind that sit waiting for external events for hours or days.
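The first three concepts fit together in a few lines. The sketch below is a toy, not how Temporal or DBOS implement replay: `ReplayLog`, `activity`, and the step keys are hypothetical names, and the log lives in an in-memory dict where a real runtime would use durable storage such as Postgres.

```python
from typing import Any, Callable, Dict, List


class ReplayLog:
    """Records each activity result so a restarted run can reuse it."""

    def __init__(self) -> None:
        self.results: Dict[str, Any] = {}

    def activity(self, key: str, fn: Callable[[], Any]) -> Any:
        # On replay, return the recorded result instead of running again.
        if key in self.results:
            return self.results[key]
        result = fn()
        self.results[key] = result  # checkpoint at the step boundary
        return result


sent: List[str] = []  # stands in for the outside world


def send_email() -> str:
    sent.append("approval email")
    return "message-id-1"


def workflow(log: ReplayLog, crash: bool) -> str:
    # Pure orchestration: every side effect goes through log.activity.
    message_id = log.activity("deliver/send_email", send_email)
    if crash:
        raise RuntimeError("next stage failed")
    return log.activity("publish/render", lambda: f"published after {message_id}")


log = ReplayLog()
try:
    workflow(log, crash=True)  # first attempt dies after the email went out
except RuntimeError:
    pass
result = workflow(log, crash=False)  # the restart replays from line one
print(result, sent)
```

The restart runs the workflow function from the top, but the email step returns its recorded message id, so `sent` holds one email, not two. A crash between `fn()` returning and the result being recorded would still repeat the side effect, which is why the external call needs its own idempotency key as well.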
Now the path to add these to ICM without abandoning the folder-as-source-of-truth conviction. Two structural additions, both surgical:
A workflow.yaml at the workspace root. This file declares the state schema (what artifacts flow between stages, with their types), lists stages in order, marks branches and parallel fanout, names interrupt points (human-in-the-loop), and declares external signals the workflow can wait on. One file. The compiler reads it. The existing CLAUDE.md and stage prose stay exactly as they are; workflow.yaml is purely additive.
YAML frontmatter on every stage CONTEXT.md. The bits a runtime needs to check (stage_id, reads, writes, side_effects with idempotency keys, retry, timeout, interrupt_before, compensations) get pulled into machine-parseable frontmatter. Existing prose stays as the body of the file, untouched. Most stages have small frontmatter. Stages that touch external systems get richer frontmatter; the side-effect declaration is where the idempotency key and compensation reference go.
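One way to see why the frontmatter pays off before any runtime exists: once the fields are structured, a linter can check them. The sketch below models a subset of the frontmatter as Python dataclasses. The field names follow the list above, but the exact shape (one `compensation` reference per side effect, a `system` name) is my assumption, and `lint` is a hypothetical helper. Parsing the YAML itself is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SideEffect:
    system: str
    idempotency_key: Optional[str] = None
    compensation: Optional[str] = None


@dataclass
class StageFrontmatter:
    stage_id: str
    reads: List[str] = field(default_factory=list)
    writes: List[str] = field(default_factory=list)
    side_effects: List[SideEffect] = field(default_factory=list)


def lint(stage: StageFrontmatter) -> List[str]:
    """Return a list of problems; an empty list means the contract is complete."""
    problems: List[str] = []
    for effect in stage.side_effects:
        if not effect.idempotency_key:
            problems.append(f"{stage.stage_id}: {effect.system} has no idempotency key")
        if not effect.compensation:
            problems.append(f"{stage.stage_id}: {effect.system} has no compensation")
    return problems


# A hypothetical deliver stage that declares a key but forgets the compensation.
deliver = StageFrontmatter(
    stage_id="deliver",
    reads=["draft"],
    side_effects=[SideEffect(system="email", idempotency_key="run_id")],
)
problems = lint(deliver)
print(problems)
```

A stage with no side effects passes trivially, which matches the expectation that most stages carry small frontmatter.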
The payoff: any durable runtime (DBOS Transact, LangGraph, Temporal, whatever ships next) should be a 200-300 line compiler away. The compiler reads workflow.yaml and the stage frontmatter, generates the runtime-specific wiring, and executes the workflow. The ICM workspace itself never knows which runtime it's running on. Swapping the runtime means rewriting the compiler, which I estimate at a weekend's work, without touching any of the workspaces. The folders don't change. The prose doesn't change. The contract does the same job; the runtime under it is replaceable.
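The runtime-neutral half of such a compiler is small. As a rough sketch, assuming the `stages` list has already been parsed into plain dicts with `id` and `next` keys and that `END` marks the terminal stage, the first pass just walks the `next` links into a run order. `compile_plan` and the three stage names are hypothetical, and branches and parallel fanout are ignored here.

```python
from typing import Dict, List

# Assumed shape: the parsed `stages` list from workflow.yaml as plain dicts.
stages: List[Dict[str, str]] = [
    {"id": "draft", "next": "review"},
    {"id": "review", "next": "deliver"},
    {"id": "deliver", "next": "END"},
]


def compile_plan(stages: List[Dict[str, str]]) -> List[str]:
    """Follow `next` links from the first stage and return the run order."""
    by_id = {stage["id"]: stage for stage in stages}
    order: List[str] = []
    current = stages[0]["id"]
    while current != "END":
        if current in order:
            raise ValueError(f"cycle at stage {current}")
        if current not in by_id:
            raise ValueError(f"unknown stage {current}")
        order.append(current)
        current = by_id[current]["next"]
    return order


plan = compile_plan(stages)
print(plan)
```

Everything runtime-specific (wrapping each stage as a DBOS step, a LangGraph node, or a Temporal activity) would hang off this ordered plan, which is the part that gets rewritten when the runtime changes.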
What this would look like for meeting-summarizer
Aspirational, not existing. This is a sketch of what the next layer of writing would look like for the workspace shown in Part 4:
name: meeting-summarizer
version: 1
state:
  transcript: { type: file, location: shared/transcript.md }
  meeting_notes: { type: file, written_by: 01-meeting-notes }
  action_items: { type: file, written_by: 02-action-items }
  key_insights: { type: file, written_by: 03-key-insights }
  extended_notes: { type: file, written_by: 04-extended-notes, optional: true }
  delivery_target: { type: signal, schema: { folder_path: string } }
stages:
  - { id: 01-meeting-notes, next: 02-action-items,
      reads: [transcript], writes: [meeting_notes] }
  - { id: 02-action-items, next: 03-key-insights,
      reads: [transcript, meeting_notes], writes: [action_items] }
  - { id: 03-key-insights, next: 04-extended-notes,
      reads: [transcript, meeting_notes], writes: [key_insights] }
  - { id: 04-extended-notes, next: 05-assemble-and-deliver,
      reads: [transcript, meeting_notes], writes: [extended_notes],
      gate: explicit_request }
  - { id: 05-assemble-and-deliver, next: END,
      reads: [meeting_notes, action_items, key_insights, extended_notes],
      side_effects: [
        { system: filesystem, idempotency_key: "delivery_target.folder_path" }
      ],
      interrupt_before: true }
That's the durable-execution layer for one workspace. The state schema declares what flows between stages by name, not by file path, so the runtime knows what's in scope at each step. The stage list spells out the order, the reads, the writes, the optional gating. The single side-effect declaration on stage 05 includes an idempotency key, delivery_target.folder_path, so a retry of the final write goes to the same folder, never to two folders. The interrupt_before: true on stage 05 maps to a runtime interrupt (in DBOS this is a recv call that waits on an external signal; in LangGraph it's the interrupt_before graph configuration).
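Declaring reads and writes by name also makes the workflow checkable without running it. The sketch below hand-transcribes the file artifacts and the read lists from the workflow.yaml above into Python dicts and verifies that no stage reads an artifact before the stage that writes it has run. `check_reads` is a hypothetical helper, and treating list order as execution order is a simplification.

```python
from typing import Dict, List

# Hand-transcribed from the workflow.yaml sketch (file artifacts only).
state: Dict[str, Dict[str, str]] = {
    "transcript": {"location": "shared/transcript.md"},
    "meeting_notes": {"written_by": "01-meeting-notes"},
    "action_items": {"written_by": "02-action-items"},
    "key_insights": {"written_by": "03-key-insights"},
    "extended_notes": {"written_by": "04-extended-notes"},
}
stages: List[dict] = [
    {"id": "01-meeting-notes", "reads": ["transcript"]},
    {"id": "02-action-items", "reads": ["transcript", "meeting_notes"]},
    {"id": "03-key-insights", "reads": ["transcript", "meeting_notes"]},
    {"id": "04-extended-notes", "reads": ["transcript", "meeting_notes"]},
    {"id": "05-assemble-and-deliver",
     "reads": ["meeting_notes", "action_items", "key_insights", "extended_notes"]},
]


def check_reads(state: Dict[str, Dict[str, str]], stages: List[dict]) -> List[str]:
    """Flag reads that are undeclared or produced by a stage that has not run yet."""
    errors: List[str] = []
    finished = set()
    for stage in stages:
        for name in stage["reads"]:
            artifact = state.get(name)
            if artifact is None:
                errors.append(f"{stage['id']} reads undeclared artifact {name}")
            elif "written_by" in artifact and artifact["written_by"] not in finished:
                errors.append(f"{stage['id']} reads {name} before it is written")
        finished.add(stage["id"])
    return errors


errors = check_reads(state, stages)
print(errors)
```

The declared order passes cleanly. Reordering the stages so that 05 runs first would produce one error per artifact it reads, which is the kind of mistake that is cheap to catch at this layer and expensive to catch mid-run.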
The prose in each stage's CONTEXT.md stays exactly the same; it's the human-readable contract. The frontmatter and the workflow file are what the runtime reads. The two layers don't interfere with each other: a human reading the workspace sees the prose; a compiler reading the workspace sees the structured fields.
For workspaces with richer side effects (sending emails, posting to Slack, writing to Supabase), the frontmatter on the side-effecting stages grows accordingly, with one entry per external system, each with its own idempotency key and compensation reference. Most workspaces I run today wouldn't need this richness. The ones that would need it are the ones I'd be most worried about running unattended, exactly the workspaces where durable execution matters most.
Closing the series
There's no Part 6. ICM as a methodology lands here, as a clean workflow specification with a clear path to add runtime semantics when you outgrow the manual run. The phased path I'm planning runs in this order. First, write workflow.yaml and frontmatter for one workspace as a documentation exercise. No runtime is needed, and this alone prevents whole categories of incident, because writing down the side effects forces you to think about the idempotency story. Then evaluate DBOS Transact as the smallest viable runtime: MIT-licensed, library-shaped, runs in GitHub Actions cron with Postgres state, no separate infrastructure to babysit. Then LangGraph if I hit agent-specific feature gaps that DBOS doesn't fill, like streaming token output, time-travel debugging, or agent loops within a stage. Then Temporal if the work ever scales past one user and starts wanting the kind of reliability primitives its predecessor, Cadence, was built for at Uber. Most likely I stop at DBOS. The rest is contingency, available when needed.
The claim the whole series rests on is that the workspace itself never has to change as the runtime under it changes. The folders are the source of truth. The runtime is a renderer. Every layer of escalation costs you a weekend of compiler work and zero workspace rewrites. That's what the bet against framework lock-in looks like in practice: when the next general advance arrives, your work is portable to it because it was never coupled to the previous one.
That's the same bet The Bitter Lesson names from a different angle. Don't build the bespoke layer that gets eaten by the next general advance.
Three takeaways
- ICM is the spec; the runtime is a renderer. This separation is the durability promise. Pick the runtime that fits the moment; the folders survive the swap. Workspaces don't change when the runtime under them changes.
- You don't need a runtime to start writing the contracts. workflow.yaml and frontmatter are documentation that prevents incidents on its own. Writing down the side effects forces you to think about the idempotency story; the runtime makes the contracts executable, but the contracts are auditable on their own.
- The phased path: workflow.yaml → DBOS → LangGraph → Temporal. Promote only when forced. Most workspaces stop at DBOS or earlier. Don't pre-build the bespoke layer The Bitter Lesson warns about.
Cross-references
- Garcia-Molina & Salem, Sagas (1987): the foundational paper on multi-step transactions
- Workflow Patterns Initiative: the closest thing to a CS curriculum for workflows
- Temporal docs: Determinism: the canonical primer on deterministic replay
- DBOS Transact (Python): MIT, library + your Postgres
- LangGraph: MIT, agent-shaped graphs
Sources
- The five missing concepts and the path to add them come from my private working notes on durable execution, distilled from the Latent Space podcast with Vercel CTO Malte Ubl, the saga paper, the Workflow Patterns Initiative catalog, and the Temporal/DBOS/LangGraph documentation
- The phased path (workflow.yaml → DBOS → LangGraph → Temporal) is mine; see the About page for the scope of this contribution