How It Works
What runs where when you execute a Kitaru flow — server, runner, execution targets, and the contract between them.
When you call .run() on a flow, three things work together to make it
durable: the Kitaru server (shared metadata, auth, deployment registry), the
runner (per-run durable control flow), and one or more execution targets
(where each checkpoint's code actually executes). During local development all
three collapse into a single Python process. In production they separate across
your infrastructure.
[Architecture diagram: clients (SDK, CLI, UI, MCP) talk to the Kitaru server, which holds execution metadata, checkpoint state, log metadata, and auth + credentials; the runner runs checkpoints, calls kitaru.llm(), writes artifacts, and can wait + resume; artifacts land in S3, GCS, Azure Blob, or the local filesystem in dev.]
Kitaru separates durable control flow from code execution:
- The Kitaru server stores shared metadata, deployment snapshots, checkpoint state, execution logs, and control-plane data.
- For each run, a runner (the durable brain of an execution) executes the selected flow snapshot, manages checkpoint order, persists state, and handles retry, replay, resume, and wait.
- Individual checkpoints can run inline in the runner or in an isolated runtime (a separate container, Kubernetes job, or cloud job on the configured stack). The runner/target split is also where sandboxes, external tools, and custom compute backends conceptually plug in; the two shipped execution targets today are inline and isolated.
Key idea. The runner owns the durable run: checkpoint order, state, retry, replay, resume, and wait. Execution targets do the work. Checkpoints are the contract between the two.
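The contract can be sketched in plain Python. The snippet below is an illustrative stub, not the Kitaru SDK: the `checkpoint` decorator and the `state` dict are invented here to show the shape of the boundary, where the execution target does the work and the runner persists the output before advancing.

```python
# Illustrative stub, not the Kitaru SDK: a checkpoint is a function
# boundary where the runner persists the output before moving on.
state = {}  # stands in for the durable state store

def checkpoint(fn):
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)   # execution target performs the work
        state[fn.__name__] = result    # runner persists before advancing
        return result
    return wrapper

@checkpoint
def plan(goal):
    return ["retrieve", "synthesize"]

@checkpoint
def synthesize(steps):
    return f"report covering {len(steps)} phases"

report = synthesize(plan("quarterly summary"))
```

Everything on either side of that boundary — what persists the result, and what computes it — can change independently, which is the point of the contract.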
[Diagram: the runner loads the selected flow snapshot, controls checkpoint order, persists state after every checkpoint, handles retry · replay · resume · wait, and can wait for days without burning compute; the artifact/state store holds checkpoint outputs, files · errors · logs, and replay lineage.]
Control / orchestration / execution
Kitaru splits runtime responsibilities into three planes. (This is separate from the harness / runtime / platform split, which is about where Kitaru sits in the broader agent stack — not about how a single run executes.)
| Plane | What lives here | Responsibility |
|---|---|---|
| Control plane | Kitaru server, UI, metadata DB, deployment registry, CLI/SDK/MCP APIs, auth and credential brokering | Knows what exists and who can call what |
| Orchestration plane | The runner for a single execution | Owns durable control flow for one run |
| Execution plane | Inline process or isolated container (shipped today); sandboxes, external tools, and custom backends are conceptual extensions of the same contract | Performs work |
The control plane is long-lived and shared. The orchestration plane is per-run and durable. The execution plane is where your code (and your agent's code) actually executes.
What runs where?
The most common confusion is which component runs user code. This table makes it explicit.
| Component | What it does | Runs user code? |
|---|---|---|
| Kitaru server (control plane) | Stores deployment registry, execution metadata, checkpoint state, log metadata, auth and session state | No |
| Runner (orchestration plane) | Runs the selected flow snapshot, controls checkpoint order, persists durable state, handles retry / replay / resume / wait | Yes, for inline checkpoints |
| Inline execution | Runs a checkpoint inside the runner process/pod | Yes |
| Isolated runtime | Runs a checkpoint in a separate container, job, pod, or remote compute backend | Yes |
| Sandbox (conceptual) | The same contract as isolated, tightened with stronger isolation or restricted egress. Not a shipped Kitaru execution target today — provided via adapters / your platform. | Yes, where integrated |
| External tool / MCP server | Performs work through a remote API or capability | Outside Kitaru |
| Metadata store | Stores runs, versions, checkpoint statuses, replay lineage | No |
| Artifact / state store | Stores checkpoint outputs, files, logs, replay lineage | No |
The run, step by step
Here is what actually happens when a consumer invokes a flow.
1. Request arrives. A user, service, or upstream agent calls the Kitaru invocation API (via CLI, SDK, MCP, or HTTP).
2. Server resolves the flow. The server authenticates the caller, resolves the target flow (and optionally a version or tag), validates the input schema, and creates a run record plus a FlowHandle.
3. Runner starts. The control plane schedules a runner on your configured stack — a Kubernetes pod, a cloud job, or the local process in dev. The runner loads the selected flow snapshot.
4. Runner executes checkpoints in order. For each checkpoint, the runner either executes inline or delegates to an isolated target. It waits for the result, persists the output to the artifact/state store, and advances.
5. State is durable the entire time. If a checkpoint fails, if the runner dies, or if a kitaru.wait() suspends the run, the server retains everything needed to retry, replay, or resume later.
6. Consumer observes results. The caller uses the returned FlowHandle (or the UI / CLI / SDK / MCP) to tail logs, inspect checkpoints, provide human input, replay, or cancel.
Runner vs sandbox
This is the idea that tends to click last and matter most.
The runner is the durable brain of a run.
The sandbox (or isolated runtime) is the hands that perform work.
If a sandbox dies mid-execution — a container evicted, a network partition, a pod OOM — the runner still holds durable checkpoint state and can retry that single checkpoint, resume from the last known boundary, or replay the run with a modified input or code version. The sandbox's failure is localized to the checkpoint that was executing, not the whole agent.
This is why platform teams should not confuse "I have a sandbox provider" with "I have durable execution". A sandbox is a bounded execution environment. Durable execution is a property of the surrounding runner — and of the checkpoints it persists.
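A toy simulation makes the distinction concrete. This is not Kitaru code — `durable`, `run_checkpoint`, and the flaky `act` checkpoint are invented for illustration — but it shows the property: completed checkpoint outputs survive a sandbox death, so a retry re-executes only the checkpoint that was running when the sandbox died.

```python
# Toy simulation, not the Kitaru SDK: durable state lets a retry skip
# completed checkpoints and re-run only the one whose sandbox died.
durable = {}            # persisted checkpoint outputs (the runner's state)
attempts = {"act": 0}   # count executions of the flaky checkpoint

def run_checkpoint(name, fn):
    if name in durable:            # already durable: do not re-execute
        return durable[name]
    out = fn()                     # dispatch to the execution target
    durable[name] = out            # persist before advancing
    return out

def act():
    attempts["act"] += 1
    if attempts["act"] == 1:
        raise RuntimeError("sandbox evicted mid-checkpoint")
    return "action result"

def run_flow():
    run_checkpoint("retrieve", lambda: "docs")
    run_checkpoint("act", act)

try:
    run_flow()    # first attempt: the sandbox dies inside act()
except RuntimeError:
    pass
run_flow()        # retry: retrieve is skipped, only act re-runs
```

With a sandbox provider alone, the first failure would have cost the whole run, including the completed retrieval work.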
Inline vs isolated checkpoints
Every checkpoint picks an execution target. Two are built in today: inline
(same process as the runner) and isolated (a separate container or job on the
configured stack). Code examples, decision rules for when to reach for
isolated, and the interaction with .submit() live on the
Checkpoints page.
A failed checkpoint is durable context
In classical pipelines, a failed step is a crash. In Kitaru, a failed checkpoint is durable context — something the runner, the agent loop, a human, or a retry policy can reason about.
Consider a document-synthesis agent:
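As a sketch (with invented names, simulating rather than calling the SDK): the retrieval checkpoint fails on a bad document id, and instead of crashing the run, the error is persisted as a typed artifact alongside the input that produced it.

```python
# Sketch with invented names -- simulates how a failed checkpoint can be
# persisted as a typed error artifact instead of crashing the run.
artifacts = {}

def run_checkpoint(name, fn, *args):
    try:
        artifacts[name] = {"status": "ok", "output": fn(*args)}
    except Exception as exc:
        # Failure becomes durable, typed data the run can reason about.
        artifacts[name] = {
            "status": "failed",
            "error_type": type(exc).__name__,
            "message": str(exc),
            "input": args,
        }

def retrieve(doc_id):
    raise KeyError(f"document {doc_id!r} not found")

run_checkpoint("retrieve", retrieve, "doc-42")
failure = artifacts["retrieve"]   # downstream consumers read this artifact
```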
Because the retrieval checkpoint's failure is persisted as a typed artifact, a downstream consumer has several real options:
- Retry the same checkpoint with the same input
- Replay with a modified input (e.g. a corrected document id)
- Replay with modified code (e.g. a new retrieval strategy)
- Feed the error artifact back into the agent loop so it can self-correct
- Wait for a human to provide a correction via kitaru.wait(), then resume
This is what "agent-native error handling" means in practice: failures become data, and durable state survives them.
How deep do you integrate?
You don't have to restructure your agent to get value. Pick the depth that fits.
Level 0 — Black-box harness
Wrap the entire agent run as one checkpoint.
```
flow
└── checkpoint: run_agent()
    └── PydanticAI / LangGraph / Claude Agent SDK / custom loop
```
- Fastest integration
- Minimal code changes
- Framework-agnostic
The tradeoff: replay boundary is coarse (one per agent run) and you see less of the agent's internal state.
Level 1 — Coarse workflow checkpoints
Add checkpoints around the phases that matter to your team.
```
flow
├── checkpoint: plan()
├── checkpoint: retrieve()
├── checkpoint: act()
├── checkpoint: synthesize()
└── checkpoint: validate()
```
- Useful replay points
- Better audit trail
- Good balance of portability and durability
The tradeoff: you (not the framework) decide where the boundaries go.
Level 2 — Framework-aware adapter
Use a Kitaru adapter that tracks the framework's internals (model calls, tool calls, intermediate state) as child events under the enclosing checkpoint.
```
flow
└── checkpointed framework runtime
    ├── model calls
    ├── tool calls
    ├── intermediate state
    └── final output
```
- Richer introspection
- Better debugging
- Tighter developer experience
The tradeoff: adapters are per-framework and need maintenance. The Pydantic AI adapter is the first one shipped.
Framework-agnostic by construction
Kitaru does not require your agent to be written as a graph. @checkpoint wraps ordinary Python function boundaries, independent of the harness.
That means a platform team supporting multiple harnesses — PydanticAI here, LangGraph there, Claude Agent SDK for one team, a raw-Python loop for another — can still standardize durability, replay, and execution metadata on a single runtime primitive. The harness choice stays a per-team decision.
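A sketch of what "framework-agnostic by construction" means in code. The decorator below is a stub standing in for @checkpoint (the real signature may differ), and the two "harness" functions are invented; the point is only that the same boundary wraps an ordinary function regardless of what runs inside it.

```python
# Stub standing in for Kitaru's @checkpoint -- it wraps ordinary Python
# function boundaries, whatever harness executes inside the body.
def checkpoint(fn):
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)   # a real runner would persist this
        return result
    return wrapper

@checkpoint
def langgraph_phase(query):
    # imagine a LangGraph graph invocation inside this body
    return f"langgraph handled: {query}"

@checkpoint
def raw_loop_phase(query):
    # ...or a hand-rolled agent loop; the boundary is identical
    return f"raw loop handled: {query}"

results = [langgraph_phase("q1"), raw_loop_phase("q2")]
```

Because the boundary is a function call, not a graph node, two teams on different harnesses get the same durability and replay semantics without sharing any framework code.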
Fits behind your platform
Kitaru can be used directly through its invocation API, or placed behind your existing platform/gateway:
[Diagram: your platform handles auth · entitlements, rate limits · policy, interceptors · guardrails, and product-specific endpoints; the Kitaru server handles version / tag resolution, schema validation, run record + FlowHandle, and credential brokering; the runner handles checkpoint order, replay · resume · wait, artifacts + state, and retry + isolation.]
You keep your auth, entitlements, interceptors, observability, and UI. Kitaru handles the durable execution layer underneath. This is how Kitaru drops into a finance- or regulated-industry-style internal agent platform without asking you to rebuild the surrounding system.
Local development
When you are developing locally, all three components run inside a single Python process on your machine. The server is embedded — no separate service to start, no database to configure. Checkpoint outputs are written to your local filesystem.
This means you can install Kitaru, run kitaru init, and have a fully working
durable execution environment in under a minute. Your flows behave exactly the
same as they will in production — same checkpointing, same replay, same
observability — just without the cloud infrastructure underneath.
Optionally, you can run a local server and UI to browse executions in your browser.
Production
In production, the three components separate across your infrastructure:
- The server runs as a long-lived Kubernetes pod (deployed via Helm). It stores execution state in a database and serves the UI. Your whole team connects to it.
- The runner runs on the compute backend defined by your stack — Kubernetes, Vertex AI, SageMaker, AzureML. When you call .run(), the client fetches short-lived credentials from the server and dispatches the execution directly to the compute backend. The runner executes your checkpoints and writes outputs to cloud storage. If the execution crashes, replay picks up from the last completed checkpoint.
- Artifacts and state live in your own S3 / GCS / Azure Blob bucket. The server tracks metadata but does not access storage directly; when a client needs to read files, it fetches temporary credentials brokered by the server.
There is no mandatory SaaS control plane in the path of your agent's data.
Related
Harness, Runtime, Platform
Where Kitaru fits in the broader agent stack.
Flows
The outer durable boundary of a Kitaru run.
Checkpoints
Durable work units. The contract between the runner and execution targets.
Wait and Input
Pause a run, release compute, resume when input arrives.
Stacks
Compute, storage, and container registry, bundled as a named runtime.